Bayesian Approach to DNA Segmentation into Regions with Different Average Nucleotide Composition

Makeev, Vsevolod; Ramensky, Vasily; Gelfand, Mikhail; Roytberg, Mikhail; Tumanyan, Vladimir

doi:10.1007/3-540-45727-5_6

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2066))

Included in the following conference series:

International Conference on Biology, Informatics, and Mathematics

Abstract

We present a new method for segmenting nucleotide sequences into regions with different average composition. The sequence is modelled as a series of segments; within each segment the sequence is treated as a random sequence of independent and identically distributed variables. The partition algorithm has two stages. In the first stage the optimal partition is found, that is, the partition that maximises the product of the marginal likelihoods calculated for each segment. To prevent segmentation into short segments, a border insertion penalty may be introduced. In the second stage segments with similar compositions are merged. This filtration uses a partition function calculated over all possible subsets of the boundaries that belong to the optimal partition. Long sequences can be segmented by dividing them into parts and segmenting those parts separately. The contextual effects of repeats, genes and other genomic elements are readily visualised.
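
The first stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the prior, so a uniform Dirichlet prior on the four letter probabilities is assumed here, and the names `prefix_counts`, `log_marginal`, `segment` and the `penalty` parameter are hypothetical. The sketch uses a simple quadratic-time dynamic programme and does not cover the second (merging) stage.

```python
from math import lgamma

ALPHABET = "ACGT"


def prefix_counts(seq):
    """counts[k][a] = number of occurrences of letter a in seq[:k]."""
    counts = [[0] * len(ALPHABET)]
    for ch in seq:
        row = counts[-1][:]
        row[ALPHABET.index(ch)] += 1
        counts.append(row)
    return counts


def log_marginal(counts, i, j):
    """Log marginal likelihood of seq[i:j] for an i.i.d. model.

    Assumption (not from the abstract): uniform Dirichlet prior, giving
    (k-1)! * prod(n_a!) / (N + k - 1)! for letter counts n_a, length N.
    """
    k = len(ALPHABET)
    n = [counts[j][a] - counts[i][a] for a in range(k)]
    return lgamma(k) + sum(lgamma(x + 1) for x in n) - lgamma(j - i + k)


def segment(seq, penalty=0.0):
    """Find the partition maximising the sum of segment log marginal
    likelihoods, minus `penalty` for every internal border."""
    counts = prefix_counts(seq)
    n = len(seq)
    best = [0.0] + [float("-inf")] * n   # best[j]: best score for seq[:j]
    back = [0] * (n + 1)                 # back[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = penalty if i > 0 else 0.0  # no border at position 0
            score = best[i] + log_marginal(counts, i, j) - cost
            if score > best[j]:
                best[j], back[j] = score, i
    # Trace back the internal borders of the optimal partition.
    borders = []
    j = n
    while j > 0:
        j = back[j]
        if j > 0:
            borders.append(j)
    return sorted(borders), best[n]


if __name__ == "__main__":
    borders, score = segment("A" * 30 + "C" * 30)
    print(borders, round(score, 2))
```

Working in log space turns the product of marginal likelihoods into a sum, which keeps the dynamic programme numerically stable; raising `penalty` makes each additional border more costly and so discourages short segments.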

Author information

Authors and Affiliations

Engelhardt Institute of Molecular Biology, Moscow, 117984, Russia
Vsevolod Makeev, Vasily Ramensky & Vladimir Tumanyan
VNIIGENETIKA, Moscow, Russia
Mikhail Gelfand
Institute of Mathematical Problems of Biology, Puschino, Moscow Region, Russia
Mikhail Roytberg

Editor information

Editors and Affiliations

Laboratoire d’Informatique, de Robotique et de Microelectronique de Montpellier, 161 rue Ada, 34392, Montpellier Cedex 5, France
Olivier Gascuel
Laboratoire d’Algorithmique Combinatoire, Institut Pasteur, 28, rue du Dr. Roux, 75724, Paris Cedex 15, France
Marie-France Sagot

About this paper

Cite this paper

Makeev, V., Ramensky, V., Gelfand, M., Roytberg, M., Tumanyan, V. (2001). Bayesian Approach to DNA Segmentation into Regions with Different Average Nucleotide Composition. In: Gascuel, O., Sagot, MF. (eds) Computational Biology. JOBIM 2000. Lecture Notes in Computer Science, vol 2066. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45727-5_6

DOI: https://doi.org/10.1007/3-540-45727-5_6
Published: 28 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42242-6
Online ISBN: 978-3-540-45727-5
eBook Packages: Springer Book Archive
