Hidden Markov Models in Population Genomics

  • Protocol
Hidden Markov Models

Part of the book series: Methods in Molecular Biology (MIMB, volume 1552)

Abstract

With the advent of sequencing techniques, population genomics underwent a major shift. Data sets have evolved from samples of a few loci, sequenced in dozens of individuals, to collections of complete genomes comprising virtually all available loci. Initially sequenced in only a few individuals, such genomic data sets now match and even exceed traditional data sets in the number of haplotypes sequenced. Because loci in a genome are not independent, this evolution of data sets is mirrored by a methodological change: the evolutionary processes that generate the observed sequences are now modeled spatially along genomes, whereas they were previously described temporally (either forward or backward in time). Although the spatial process of sequence evolution is complex, approximations to the model have Markovian properties, which permit efficient inference. In this chapter, we introduce these recent developments, which enable modeling the evolutionary history of a sample of several individual genomes. Because such models assume the occurrence of meiotic recombination, they have so far been dedicated to the analysis of eukaryotic species.

Acknowledgments

The author would like to thank Asger Hobolth for discussing the SMC model, and Yun Song for clarifying some aspects related to the implementation of the CSD. This publication is contribution no. 2015-048 of the Institut des Sciences de l’Évolution de Montpellier (ISE-M).

Author information

Corresponding author

Correspondence to Julien Y. Dutheil.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Dutheil, J.Y. (2017). Hidden Markov Models in Population Genomics. In: Westhead, D., Vijayabaskar, M. (eds) Hidden Markov Models. Methods in Molecular Biology, vol 1552. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6753-7_11

  • DOI: https://doi.org/10.1007/978-1-4939-6753-7_11

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6751-3

  • Online ISBN: 978-1-4939-6753-7

  • eBook Packages: Springer Protocols
