Abstract
With the advent of sequencing techniques, population genomics underwent a major shift. Data sets have evolved from samples of a few loci in the genome, sequenced in dozens of individuals, to collections of complete genomes comprising virtually all available loci. Initially sequenced in only a few individuals, such genomic data sets now match and even exceed traditional data sets in the number of haplotypes sequenced. Because the loci in a genome are not all independent, this evolution of data sets is mirrored by a methodological change. The evolutionary processes that generate the observed sequences are now modeled spatially along genomes, whereas they were previously described temporally (either forward or backward in time). Although the spatial process of sequence evolution is complex, approximations to the model have Markovian properties, which permit efficient inference. In this chapter, we introduce these recent developments, which enable modeling the evolutionary history of a sample of several individual genomes. Such models assume the occurrence of meiotic recombination and are therefore, to date, dedicated to the analysis of eukaryotic species.
Acknowledgments
The author would like to thank Asger Hobolth for discussing the SMC model, and Yun Song for clarifying some aspects related to the implementation of the CSD. This publication is contribution no. 2015-048 of the Institut des Sciences de l’Évolution de Montpellier (ISE-M).
Copyright information
© 2017 Springer Science+Business Media LLC
About this protocol
Cite this protocol
Dutheil, J.Y. (2017). Hidden Markov Models in Population Genomics. In: Westhead, D., Vijayabaskar, M. (eds) Hidden Markov Models. Methods in Molecular Biology, vol 1552. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6753-7_11
DOI: https://doi.org/10.1007/978-1-4939-6753-7_11
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6751-3
Online ISBN: 978-1-4939-6753-7
eBook Packages: Springer Protocols