The DNA of every gene is formed as a particular sequence of four possible bases, labelled A, C, G and T, respectively. I
Posted: Fri May 06, 2022 7:00 am
The DNA of every gene is formed as a particular sequence of four possible bases, labelled A, C, G and T, respectively. In a particular gene sub-sequence of length 1572, the following cooccurrence matrix was recorded:

$$
C = \begin{bmatrix}
185 & 101 & 69 & 161 \\
74 & 41 & 45 & 103 \\
86 & 6 & 34 & 100 \\
171 & 115 & 78 & 202
\end{bmatrix}
$$

Here, the (i, j)th count, $C_{ij}$, $1 \le i, j \le 4$, is the number of transitions from base j to base i (with the bases ordered as listed above) that occur at adjacent locations along the sub-sequence.

(a) A machine learning algorithm estimates the parameters of a homogeneous Markov chain (HMC) model of the gene DNA by consistently processing C. Using these estimates, address the following:

(i) if all four bases are equiprobable at a particular position along the DNA, what is the probability that the base at the 3rd-next position is equal to the base at the 7th-next position? [25 %]

(ii) if base A is found at a position along the DNA in the long-run, what is the probability that either base A or C occurs four positions earlier? [25 %]
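
One possible way to set up the computation from the counts above is sketched below in plain Python. It rests on assumptions that the question does not state outright: that "consistently processing C" means the relative-frequency estimate (each column of C divided by its column sum, so that `T[i][j]` approximates P(next = i | current = j)), that "3rd-next" and "7th-next" mean 3 and 7 steps ahead of the equiprobable position, and that "in the long-run" means the chain has reached its stationary distribution. The names `T`, `pi`, `prob_i` and `prob_ii` are illustrative only.

```python
# Counts C[i][j]: transitions from base j (column) to base i (row), order A, C, G, T.
BASES = "ACGT"
C = [
    [185, 101, 69, 161],
    [74, 41, 45, 103],
    [86, 6, 34, 100],
    [171, 115, 78, 202],
]
N = len(BASES)


def matmul(X, Y):
    """Product of two N x N matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def matpow(X, n):
    """X raised to the non-negative integer power n by repeated multiplication."""
    R = [[float(i == j) for j in range(N)] for i in range(N)]
    for _ in range(n):
        R = matmul(R, X)
    return R


def matvec(X, v):
    """Matrix-vector product X v."""
    return [sum(X[i][j] * v[j] for j in range(N)) for i in range(N)]


# Assumed estimator: normalise each column, giving a column-stochastic matrix
# with T[i][j] ~ P(next base = i | current base = j).
col_sums = [sum(C[i][j] for i in range(N)) for j in range(N)]
T = [[C[i][j] / col_sums[j] for j in range(N)] for i in range(N)]

# (i) Start from the uniform distribution, move 3 steps, then ask for the
# chain to return to the same base 4 steps later:
# P(X3 = X7) = sum_i P(X3 = i) * (T^4)[i][i].
p0 = [1.0 / N] * N
p3 = matvec(matpow(T, 3), p0)
T4 = matpow(T, 4)
prob_i = sum(p3[i] * T4[i][i] for i in range(N))

# (ii) Stationary distribution by power iteration (all entries of T are
# positive, so the iteration converges), then Bayes' rule to look backwards:
# P(X_{n-4} = k | X_n = A) = (T^4)[A][k] * pi[k] / pi[A].
pi = p0[:]
for _ in range(1000):
    pi = matvec(T, pi)
a, c = BASES.index("A"), BASES.index("C")
prob_ii = (T4[a][a] * pi[a] + T4[a][c] * pi[c]) / pi[a]

print("(i)  P(base at +3 equals base at +7):", prob_i)
print("(ii) P(A or C four positions earlier | A now):", prob_ii)
```

Under these assumptions, the row sums and column sums of C coincide (516, 263, 226, 566, totalling 1571 transitions), so the stationary distribution found by the iteration should agree with the column sums divided by 1571, which gives a quick sanity check on the estimate.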