Inferring Demographic History Using Genomic Data
Characterizing population histories has been a major focus in evolutionary and conservation biology for decades, and researchers have been modeling simple demographic scenarios with genetic data since the 1970s. In the last decade, the availability of genomic data and the number of demographic inference methods have increased dramatically, and demographic inference now constitutes a continuously evolving sub-discipline within population genetics. Genome sequences (from reduced-representation sequencing as well as whole-genome sequencing and re-sequencing) contain a trove of information about population histories and permit the reconstruction of complex demographic scenarios. In combination with powerful and flexible new analytical methods, demographic inference from genomic data has revealed surprising, dynamic, and conservation-relevant histories. This chapter discusses recent advances in demographic inference made possible by genome sequence data and new analytical tools. As the theory and models of demographic inference have matured and data sets have grown, so has the recognition of their limitations and confounding effects. We caution that the increasing sophistication of methods should not override the critical evaluation of the researcher. Demographic inferences from genomic data offer powerful windows into the past, but we encourage users to recognize the inherent limitations of model assumptions, use simulations to identify potential biases, and include complementary and supporting analyses.
Keywords: Approximate Bayesian computation, Coalescent, Effective population size, Genealogy, Haplotypes, Migration
Approximate Bayesian computation (ABC): a method that compares summary statistics from observed and simulated data to make demographic and statistical inferences. ABC does not rely on computing a likelihood function.
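The comparison of observed and simulated summary statistics can be illustrated with a minimal rejection-ABC sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the single summary statistic is the number of segregating sites S, data are simulated under a standard neutral coalescent with constant population size, the parameter is the population-scaled mutation rate (theta), the prior is uniform, and the function names are hypothetical.

```python
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) variate by counting unit-rate arrivals in [0, lam)."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(theta, n_samples, rng):
    """Simulate S for n_samples chromosomes under the neutral coalescent.

    Time is measured in units of 2Ne generations, so k lineages coalesce
    at rate k(k-1)/2 and mutations fall on each lineage at rate theta/2.
    """
    total_length = 0.0
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2.0
        total_length += k * rng.expovariate(rate)
    return poisson(theta / 2.0 * total_length, rng)


def abc_rejection(observed_s, n_samples, n_draws, tolerance, prior_max, seed=0):
    """Keep prior draws of theta whose simulated S is close to the observed S."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, prior_max)  # draw from the (assumed) uniform prior
        simulated_s = simulate_segregating_sites(theta, n_samples, rng)
        if abs(simulated_s - observed_s) <= tolerance:
            accepted.append(theta)  # accepted draws approximate the posterior
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(observed_s=20, n_samples=10, n_draws=20000,
                              tolerance=1, prior_max=30.0)
    print(len(posterior), sum(posterior) / len(posterior))
```

No likelihood is evaluated at any point: the accepted draws stand in for the posterior. Practical ABC analyses typically use several summary statistics and more refined acceptance or regression steps than this sketch.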
Bottleneck: a massive and temporary reduction in (effective) population size that results in an associated reduction of genetic diversity.
Genetic drift: random changes in allele frequencies from one generation to the next, caused by chance in which gametes contribute to the next generation (random mating and allele segregation in diploids). Changes are more pronounced in small populations.
Coalescent: a mathematical model governing the expected distribution of coalescence times back to a common ancestor in a population sample.
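As a worked illustration of such expected coalescence times, consider the standard neutral coalescent for a sample of n chromosomes from a diploid, randomly mating population of constant effective size N_e. These are textbook results stated under those assumptions and are not derived in this text. While k lineages remain, the waiting time T_k to the next coalescence is approximately exponentially distributed, and the expected time to the most recent common ancestor follows by summing over k:

```latex
\mathbb{E}[T_k] = \frac{4N_e}{k(k-1)} \ \text{generations},
\qquad
\mathbb{E}[T_{\mathrm{MRCA}}] = \sum_{k=2}^{n} \mathbb{E}[T_k]
  = 4N_e \sum_{k=2}^{n} \left( \frac{1}{k-1} - \frac{1}{k} \right)
  = 4N_e \left( 1 - \frac{1}{n} \right).
```

For n = 2 this gives 2N_e generations, and the last two lineages alone account for at least half of the expected total depth of the genealogy.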
Diffusion approximation: an approximation of the Wright-Fisher (WF) model that leads to a continuous-time stochastic process that is easier to study mathematically. It is used to derive useful formulas such as the expected time to fixation of a mutation.
Divergence time: the estimated time since two populations diverged, measured in generations and typically scaled by dividing by 2Ne.
Effective population size (Ne): the size of an idealized (Wright-Fisher) population with the same amount of genetic drift as the given real population. In most organisms, effective size is less than census size because of factors such as overlapping generations, reproductive inequality, and sex bias.
Genealogy: the ancestral relationship, for a particular segment of the genome, among sampled chromosomes. This takes the form of a branching tree for non-recombining data, but becomes a tangled graph (the “ancestral recombination graph”) with recombination.
Generation time: the average interval between identical life history stages across successive generations. Generation time is often expressed in years.
Migration rate (M): the average number of migrants entering each population per generation, defined as 4Nem (4 × Ne × m), where m is the proportion of individuals in each population per generation that are immigrants.
Recombination: the process of exchanging genetic material between homologous chromosomes during meiosis, resulting in new combinations of alleles in the resulting gametes.
Rho (ρ): the population-scaled recombination rate, defined as 4Ner (4 × Ne × r) in diploid organisms, where r is the recombination rate per generation.
Panmictic population: a population in which all pairs of individuals are equally likely to mate.
Site frequency spectrum (SFS): also called the allele frequency spectrum, the distribution of the allele frequencies of a given set of loci in a sample, often visualized as a histogram.
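A short sketch of how such a spectrum can be tabulated from data. The input layout is an assumption for illustration: a list of sampled chromosomes, each a list of 0/1 values per site, with 1 marking the derived allele. The function name is hypothetical.

```python
def site_frequency_spectrum(haplotypes):
    """Return the unfolded spectrum as a list of site counts.

    Entry j is the number of sites at which exactly j of the sampled
    chromosomes carry the derived allele (j runs from 0 to n).
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    sfs = [0] * (n + 1)
    for site in range(n_sites):
        derived_count = sum(row[site] for row in haplotypes)
        sfs[derived_count] += 1
    return sfs


if __name__ == "__main__":
    sample = [
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    print(site_frequency_spectrum(sample))  # [1, 2, 0, 1, 0]
```

Plotting the entries for j = 1 to n - 1 as bars gives the histogram view mentioned in the definition. Entries 0 and n count sites that do not vary within the sample.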
Tajima's D: a summary statistic that compares two estimators of the population-scaled mutation rate Θ to detect departures from the standard coalescent model. Departures can reflect demography or selection.
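The comparison of two estimators can be made concrete with a small sketch of the statistic as commonly defined (Tajima 1989): the difference between pairwise nucleotide diversity and Watterson's estimator, divided by an estimate of its standard deviation. The input format (a list of derived-allele counts, one per site, plus the number of sampled chromosomes n) and the function name are assumptions for illustration.

```python
import math


def tajimas_d(derived_counts, n):
    """Return Tajima's D, or None when the statistic is undefined."""
    # Keep only segregating sites (derived allele neither absent nor fixed).
    counts = [j for j in derived_counts if 0 < j < n]
    s = len(counts)
    if n < 4 or s == 0:
        return None  # the variance term is zero for n < 4
    # Estimator 1: mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * j * (n - j) / (n * (n - 1)) for j in counts)
    # Estimator 2: Watterson's estimator, S / a1.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    # Constants for the variance of (pi - S / a1).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    return (pi - s / a1) / math.sqrt(variance)


if __name__ == "__main__":
    print(tajimas_d([1, 1, 1], 4))  # excess of rare variants: negative
    print(tajimas_d([2, 2, 2], 4))  # excess of intermediate variants: positive
```

An excess of rare variants gives negative values and an excess of intermediate-frequency variants gives positive values. Either pattern can arise from demography or from selection, so the sign alone does not identify the cause.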
Theta (Θ): the population-scaled mutation rate, equal to 4Neμ (4 × Ne × μ) in diploid organisms. It is proportional to the product of Ne and the mutation rate μ and measures the capacity of a population to maintain genetic variability. Among organisms with similar μ, it functions as a measure of relative effective population size.
Wright-Fisher model: a discrete-time model of stochastic reproduction (see also genetic drift) that assumes a population of size N, random mating, and non-overlapping generations.
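A minimal simulation sketch of this kind of discrete-time model, assuming a diploid population of N individuals (2N gene copies), one biallelic locus, and no mutation, selection, or migration. The function and parameter names are hypothetical.

```python
import random


def wright_fisher(n_individuals, p0, generations, seed=None):
    """Simulate the frequency of one allele under Wright-Fisher reproduction.

    Each generation, all 2N gene copies are drawn independently from the
    previous generation's allele frequency (binomial sampling), and the
    parental generation is replaced entirely.
    """
    rng = random.Random(seed)
    n_copies = 2 * n_individuals
    freq = p0
    trajectory = [freq]
    for _ in range(generations):
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    for n in (10, 1000):
        final = wright_fisher(n, p0=0.5, generations=50, seed=1)[-1]
        print(n, final)
```

Running the sketch with a small and a large N illustrates why drift is more pronounced in small populations: the per-generation sampling variance of the allele frequency is p(1 - p)/(2N).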