Sequence Segmentation

  • Protocol
Bioinformatics

Part of the book series: Methods in Molecular Biology™ ((MIMB,volume 452))

Abstract

Whole-genome comparisons among mammalian and other eukaryotic organisms have revealed that they contain large quantities of conserved non-protein-coding sequence. Although some functions of this non-coding DNA have been identified, much of the conserved genomic sequence still has no known function. Moreover, delineating the conserved sequences is non-trivial, particularly when some sequences are conserved in only a small number of lineages. Sequence segmentation is a statistical technique for identifying putative functional elements in genomes based on atypical sequence characteristics, such as conservation levels relative to other genomes, GC content, SNP frequency, and potentially many others. The publicly available program changept and its associated programs use Bayesian multiple change-point analysis to delineate classes of genomic segments with similar characteristics, potentially representing new classes of non-coding RNAs (contact web site: http://silmaril.math.sci.qut.edu.au/~keith/).
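To make the idea of change-point analysis concrete, the following is a minimal sketch and not the changept implementation: it handles only a single change-point in a binary sequence (1 = site conserved, 0 = site not conserved), whereas changept treats multiple change-points and segment classes. The function names, the uniform Beta(1, 1) prior on each segment's conservation rate, the uniform prior on the change-point position, and the toy data are all assumptions made for illustration.

```python
import math


def log_marginal(ones, n):
    """Log marginal likelihood of a segment with `ones` 1s among `n` sites,
    integrating the Bernoulli rate over a uniform Beta(1, 1) prior
    (assumed prior, chosen for simplicity)."""
    return math.lgamma(ones + 1) + math.lgamma(n - ones + 1) - math.lgamma(n + 2)


def change_point_posterior(seq):
    """Posterior over one change-point position k (1 <= k < len(seq)),
    where seq[:k] and seq[k:] have independent conservation rates.
    Entry i of the result corresponds to k = i + 1."""
    n = len(seq)
    total = sum(seq)
    log_post = []
    left_ones = 0
    for k in range(1, n):
        left_ones += seq[k - 1]
        log_post.append(
            log_marginal(left_ones, k) + log_marginal(total - left_ones, n - k)
        )
    # Normalise in log space to avoid underflow on long sequences.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical toy data: a fully conserved stretch followed by a weakly
# conserved one.
seq = [1] * 30 + [0, 1, 0, 0, 1, 0, 0, 0, 1, 0] * 3
post = change_point_posterior(seq)
best_k = max(range(len(post)), key=post.__getitem__) + 1
print("most probable change-point after site", best_k)
```

A full multiple change-point analysis would replace the exhaustive scan over k with sampling over the number and positions of change-points, but the per-segment marginal likelihood plays the same role.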

Acknowledgments

The author thanks Peter Adams for assistance in running simulations; Stuart Stephen for assisting in the development of much of the code; Benjamin Goursaud and Rachel Crehange for assisting in the generalization of the code for multiple data types; and John Mattick, Kerrie Mengersen, Chris Ponting, and Mark Borodovski for helpful discussions. This work was partially funded by Australian Research Council (ARC) Discovery Grants DP0452412 and DP0556631 and a National Health and Medical Research Council (NHMRC) grant entitled “Statistical methods and algorithms for analysis of high-throughput genetics and genomics platforms” (389892).

Copyright information

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Keith, J.M. (2008). Sequence Segmentation. In: Keith, J.M. (eds) Bioinformatics. Methods in Molecular Biology™, vol 452. Humana Press. https://doi.org/10.1007/978-1-60327-159-2_11

  • DOI: https://doi.org/10.1007/978-1-60327-159-2_11

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-707-5

  • Online ISBN: 978-1-60327-159-2

  • eBook Packages: Springer Protocols
