Bayesian Approach to DNA Segmentation into Regions with Different Average Nucleotide Composition

  • Conference paper
Computational Biology (JOBIM 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2066)

Abstract

We present a new method for segmenting nucleotide sequences into regions with different average composition. The sequence is modelled as a series of segments; within each segment, the nucleotides are treated as independent, identically distributed random variables. The partition algorithm consists of two stages. In the first stage, the optimal partition is found, i.e. the one that maximises the product of the marginal likelihoods calculated for the individual segments. To prevent fragmentation into short segments, a border insertion penalty may be introduced. In the second stage, segments with similar compositions are merged. This filtration uses a partition function calculated over all possible subsets of the boundaries that belong to the optimal partition. Long sequences can be segmented by dividing them into parts and segmenting each part separately. The contextual effects of repeats, genes, and other genomic elements are readily visualised.
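The two stages described in the abstract can be sketched in Python. This is a minimal illustration, not the authors' code, and it rests on assumptions the abstract does not state: each segment's marginal likelihood uses a uniform Dirichlet(1, 1, 1, 1) prior over nucleotide frequencies, the border insertion penalty is a constant subtracted in log space per border, and filtration keeps a boundary when its posterior probability under the partition function reaches a hypothetical `threshold`. All function names are illustrative.

```python
import math

ALPHABET = "ACGT"


def log_marginal_likelihood(counts):
    """Log marginal likelihood of one i.i.d. segment with nucleotide counts
    `counts`, ASSUMING a uniform Dirichlet(1, 1, 1, 1) prior on frequencies:
    P = (k - 1)! * prod(n_a!) / (N + k - 1)!"""
    k, n = len(counts), sum(counts)
    return (math.lgamma(k) + sum(math.lgamma(c + 1) for c in counts)
            - math.lgamma(n + k))


def prefix_counts(seq):
    """pref[i][a] = number of occurrences of ALPHABET[a] in seq[:i]."""
    pref = [[0] * len(ALPHABET)]
    for ch in seq:
        row = pref[-1][:]
        row[ALPHABET.index(ch)] += 1
        pref.append(row)
    return pref


def segment_score(pref, i, j):
    """Log marginal likelihood of the segment seq[i:j]."""
    return log_marginal_likelihood(
        [pref[j][a] - pref[i][a] for a in range(len(ALPHABET))])


def optimal_partition(seq, penalty=0.0):
    """Stage 1 (sketch): O(L^2) dynamic programming for the partition that
    maximises the sum of segment log marginal likelihoods minus `penalty`
    per inserted border. Returns the sorted interior border positions."""
    L, pref = len(seq), prefix_counts(seq)
    best = [0.0] + [-math.inf] * L
    back = [0] * (L + 1)
    for j in range(1, L + 1):
        for i in range(j):
            s = best[i] + segment_score(pref, i, j) - (penalty if i > 0 else 0.0)
            if s > best[j]:
                best[j], back[j] = s, i
    cuts, j = [], L
    while j > 0:  # trace back the chosen segment starts
        j = back[j]
        if j > 0:
            cuts.append(j)
    return sorted(cuts)


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def boundary_probabilities(seq, cuts):
    """Stage 2 (sketch): posterior probability of each border in `cuts`, from a
    partition function summed over every subset of those borders
    (forward-backward over candidate positions; no penalty applied here)."""
    pref = prefix_counts(seq)
    pts = [0] + sorted(cuts) + [len(seq)]
    n = len(pts)
    fwd = [0.0] * n  # fwd[m]: log-sum over partitions of seq[:pts[m]]
    for m in range(1, n):
        fwd[m] = logsumexp([fwd[l] + segment_score(pref, pts[l], pts[m])
                            for l in range(m)])
    bwd = [0.0] * n  # bwd[m]: log-sum over partitions of seq[pts[m]:]
    for m in range(n - 2, -1, -1):
        bwd[m] = logsumexp([segment_score(pref, pts[m], pts[r]) + bwd[r]
                            for r in range(m + 1, n)])
    log_z = fwd[-1]
    return {pts[m]: math.exp(fwd[m] + bwd[m] - log_z) for m in range(1, n - 1)}


def filter_boundaries(seq, cuts, threshold=0.5):
    """Keep borders whose posterior probability reaches `threshold`
    (a HYPOTHETICAL single-pass filtration rule)."""
    probs = boundary_probabilities(seq, cuts)
    return [c for c in sorted(cuts) if probs[c] >= threshold]
```

For example, `optimal_partition("A" * 50 + "G" * 50, penalty=2.0)` places a single border at position 50, and `filter_boundaries` keeps it. Under this prior a single nucleotide already has marginal likelihood 1/4, so with `penalty=0` a compositionally uniform stretch can score slightly higher when cut into very short pieces; this is one way to see why a border insertion penalty is useful.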

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Makeev, V., Ramensky, V., Gelfand, M., Roytberg, M., Tumanyan, V. (2001). Bayesian Approach to DNA Segmentation into Regions with Different Average Nucleotide Composition. In: Gascuel, O., Sagot, MF. (eds) Computational Biology. JOBIM 2000. Lecture Notes in Computer Science, vol 2066. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45727-5_6

  • DOI: https://doi.org/10.1007/3-540-45727-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42242-6

  • Online ISBN: 978-3-540-45727-5

  • eBook Packages: Springer Book Archive
