Hidden Markov Models in Population Genomics

Dutheil, Julien Y.

doi:10.1007/978-1-4939-6753-7_11

Part of the book series: Methods in Molecular Biology (MIMB, volume 1552)

Abstract

With the advent of sequencing techniques, population genomics underwent a major shift. The structure of data sets has evolved from samples of a few loci in the genome, sequenced in dozens of individuals, to collections of complete genomes comprising virtually all available loci. Initially sequenced in a few individuals, such genomic data sets are now reaching and even exceeding the size of traditional data sets in the number of haplotypes sequenced. Because not all loci in a genome are independent, this evolution of data sets is mirrored by a methodological change. The evolutionary processes that generate the observed sequences are now modeled spatially along genomes, whereas they were previously described temporally (either in a forward or backward manner). Although the spatial process of sequence evolution is complex, approximations to the model feature Markovian properties, permitting efficient inference. In this chapter, we introduce these recent developments, which enable the modeling of the evolutionary history of a sample of several individual genomes. Such models assume the occurrence of meiotic recombination, and therefore, to date, they are dedicated to the analysis of eukaryotic species.

Acknowledgments

The author would like to thank Asger Hobolth for discussing the SMC model, and Yun Song for clarifying some aspects related to the implementation of the CSD. This publication is contribution no. 2015-048 of the Institut des Sciences de l’Évolution de Montpellier (ISE-M).

Author information

Authors and Affiliations

Department of Evolutionary Genetics, Molecular Systems Evolution, Max Planck Institute for Evolutionary Biology, August-Thienemann-Straße 2, 24306, Plön, Germany
Julien Y. Dutheil

Corresponding author

Correspondence to Julien Y. Dutheil.

Editor information

Editors and Affiliations

University of Leeds School of Molecular and Cellular Biology, Leeds, United Kingdom
David R. Westhead
University of Leeds School of Cellular and Molecular Biology, Leeds, United Kingdom
M. S. Vijayabaskar

Cite this protocol

Dutheil, J.Y. (2017). Hidden Markov Models in Population Genomics. In: Westhead, D., Vijayabaskar, M. (eds) Hidden Markov Models. Methods in Molecular Biology, vol 1552. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6753-7_11

DOI: https://doi.org/10.1007/978-1-4939-6753-7_11
Published: 22 February 2017
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6751-3
Online ISBN: 978-1-4939-6753-7
eBook Packages: Springer Protocols
