Bayesian Approach to DNA Segmentation into Regions with Different Average Nucleotide Composition
We present a new method for segmenting nucleotide sequences into regions with different average composition. The sequence is modelled as a series of segments; within each segment it is treated as a sequence of independent and identically distributed random variables. The partition algorithm has two stages. In the first stage, the optimal partition is found, which maximises the overall product of the marginal likelihoods calculated for each segment. To prevent segmentation into short segments, a border insertion penalty may be introduced. In the second stage, segments with close compositions are merged. This filtration is performed using the partition function calculated over all possible subsets of the boundaries that belong to the optimal partition. Long sequences can be segmented by dividing them into parts and segmenting those parts separately. The contextual effects of repeats, genes and other genomic elements are readily visualised.
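The first stage described above can be illustrated with a short dynamic-programming sketch. This is not the authors' implementation: it assumes a symmetric Dirichlet prior on the nucleotide frequencies of each segment (the abstract does not state the prior), works in log space so that the product of marginal likelihoods becomes a sum, and subtracts a hypothetical `penalty` for every inserted border. The names `log_marginal` and `optimal_partition` are illustrative only.

```python
from math import lgamma

ALPHABET = "ACGT"


def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of an i.i.d. segment with the given letter
    counts, integrating the letter frequencies over a symmetric
    Dirichlet(alpha) prior (an assumption, not taken from the source)."""
    k = len(counts)
    n = sum(counts)
    out = lgamma(k * alpha) - lgamma(n + k * alpha)
    for c in counts:
        out += lgamma(c + alpha) - lgamma(alpha)
    return out


def optimal_partition(seq, penalty=0.0):
    """Return (borders, log_score) for the partition of seq that maximises
    the sum of segment log marginal likelihoods minus penalty per border.
    A border b means one segment ends at seq[b-1] and the next starts at seq[b].
    The double loop is quadratic in len(seq)."""
    n = len(seq)
    # prefix[i][a] = number of occurrences of ALPHABET[a] in seq[:i]
    prefix = [[0] * len(ALPHABET)]
    for ch in seq:
        row = prefix[-1][:]
        row[ALPHABET.index(ch)] += 1
        prefix.append(row)

    best = [0.0] + [float("-inf")] * n  # best[j]: best score for seq[:j]
    back = [0] * (n + 1)                # back[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            counts = [prefix[j][a] - prefix[i][a] for a in range(len(ALPHABET))]
            # A border is paid for whenever the last segment does not start at 0.
            score = best[i] + log_marginal(counts) - (penalty if i > 0 else 0.0)
            if score > best[j]:
                best[j] = score
                back[j] = i

    # Trace back the starts of the segments to recover the borders.
    borders = []
    j = n
    while back[j] > 0:
        j = back[j]
        borders.append(j)
    return sorted(borders), best[n]


if __name__ == "__main__":
    borders, score = optimal_partition("A" * 30 + "C" * 30, penalty=2.0)
    print(borders, score)
```

Because the search is quadratic in sequence length, a sketch like this is only practical for short sequences or for the separately segmented parts of a long sequence mentioned above. Raising `penalty` suppresses borders whose gain in marginal likelihood is small, which is the role the abstract assigns to the border insertion penalty.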
Keywords: Partition Function, Hidden Markov Model, Random Sequence, Bayesian Approach, Marginal Likelihood