How Many Subpopulations Is Too Many? Exponential Lower Bounds for Inferring Population Histories

  • Younhun Kim
  • Frederic Koehler (corresponding author)
  • Ankur Moitra
  • Elchanan Mossel
  • Govind Ramnarayan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11467)

Abstract

Reconstruction of population histories is a central problem in population genetics. Existing coalescent-based methods, like the seminal work of Li and Durbin (Nature, 2011), attempt to solve this problem using sequence data but have no rigorous guarantees. Determining the amount of data needed to correctly reconstruct population histories is a major challenge. Using a variety of tools from information theory, the theory of extremal polynomials, and approximation theory, we prove new sharp information-theoretic lower bounds on the problem of reconstructing population structure—the history of multiple subpopulations that merge, split and change sizes over time. Our lower bounds are exponential in the number of subpopulations, even when reconstructing recent histories. We demonstrate the sharpness of our lower bounds by providing algorithms for distinguishing and learning population histories with matching dependence on the number of subpopulations.

Keywords

Population size histories; Mixtures of exponentials; Sample complexity
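
The keywords "Mixtures of exponentials" and "Sample complexity" can be made concrete with a small illustration. Suppose, purely for illustration, that each candidate history has already been reduced to a distribution of the form f(t) = Σ_i w_i λ_i exp(−λ_i t), given as a list of (weight, rate) pairs. Two such candidates can be compared by their total variation distance δ. If single samples differ by δ, then n independent samples differ by at most nδ, so on the order of 1/δ samples are needed before any test can reliably tell the two apart. The Python sketch below is our own hedged illustration, not the paper's construction or algorithm. The helper names `mixture_density` and `tv_distance` are hypothetical, and δ is approximated with the trapezoid rule.

```python
import math
from typing import Sequence, Tuple

# Hypothetical representation: a mixture of exponentials as (weight, rate) pairs.
Mixture = Sequence[Tuple[float, float]]


def mixture_density(mix: Mixture, t: float) -> float:
    """Density sum_i w_i * lam_i * exp(-lam_i * t) at time t >= 0."""
    return sum(w * lam * math.exp(-lam * t) for w, lam in mix)


def tv_distance(p: Mixture, q: Mixture, steps: int = 200_000) -> float:
    """Approximate the total variation distance between two mixtures of
    exponentials by applying the trapezoid rule on [0, upper].

    upper is chosen so that the slowest component's tail mass beyond it
    is exp(-50), which is negligible for this illustration.
    """
    slowest = min(lam for _, lam in list(p) + list(q))
    upper = 50.0 / slowest
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        diff = abs(mixture_density(p, t) - mixture_density(q, t))
        # Trapezoid weights: the two endpoints count half.
        total += diff * (0.5 if k in (0, steps) else 1.0)
    # TV distance is half the L1 distance between the densities.
    return 0.5 * total * h
```

For example, tv_distance([(1.0, 1.0)], [(1.0, 2.0)]) should come out very close to 0.25, the exact total variation distance between Exp(1) and Exp(2). The two densities cross at t = ln 2, and integrating 2e^(−2t) − e^(−t) over [0, ln 2] gives 1/4.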

Acknowledgements

This work was funded in part by ONR N00014-16-1-2227, NSF CCF-1665252, NSF DMS-1737944, NSF Large CCF-1565235, NSF CAREER Award CCF-1453261, as well as Ankur Moitra’s David and Lucile Packard Fellowship, Alfred P. Sloan Fellowship, and ONR Young Investigator Award.

References

  1. Bhaskar, A., Song, Y.S.: Descartes’ rule of signs and the identifiability of population demographic models from genomic variation data. Ann. Stat. 42(6), 2469 (2014)
  2. Bhaskar, A., Wang, Y.R., Song, Y.S.: Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Res. 25(2), 268–279 (2015)
  3. Blythe, R.A., McKane, A.J.: Stochastic models of evolution in genetics, ecology and linguistics. J. Stat. Mech.: Theory Exp. 2007(07), P07018 (2007)
  4. Candès, E.J., Fernandez-Granda, C.: Super-resolution from noisy data. J. Fourier Anal. Appl. 19(6), 1229–1254 (2013)
  5. Drummond, A., Rambaut, A., Shapiro, B., Pybus, O.: Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. 22(5), 1185–1192 (2005)
  6. Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V.C., Foll, M.: Robust demographic inference from genomic and SNP data. PLoS Genet. 9(10), e1003905 (2013)
  7. Gautschi, W.: On inverses of Vandermonde and confluent Vandermonde matrices. Numer. Math. 4(1), 117–123 (1962)
  8. Heled, J., Drummond, A.: Bayesian inference of population size history from multiple loci. BMC Evol. Biol. 8(1), 289 (2008)
  9. Hua, Y., Sarkar, T.K.: Matrix pencil method for estimating parameters of exponentially damped/undamped sinusoids in noise. IEEE Trans. Acoust. Speech Signal Process. 38(5), 814–824 (1990)
  10. Joseph, T.A., Pe’er, I.: Inference of population structure from ancient DNA. In: Raphael, B.J. (ed.) RECOMB 2018. LNCS, vol. 10812, pp. 90–104. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89929-9_6
  11. Kim, J., Mossel, E., Rácz, M.Z., Ross, N.: Can one hear the shape of a population history? Theor. Popul. Biol. 100, 26–38 (2015)
  12. Kim, Y., Koehler, F., Moitra, A., Mossel, E., Ramnarayan, G.: How many subpopulations is too many? Exponential lower bounds for inferring population histories. arXiv preprint arXiv:1811.03177 (2018)
  13. Kimura, M., Crow, J.F.: The number of alleles that can be maintained in a finite population. Genetics 49(4), 725 (1964)
  14. Li, H., Durbin, R.: Inference of human population history from individual whole-genome sequences. Nature 475(7357), 493 (2011)
  15. McVean, G.A., Cardin, N.J.: Approximating the coalescent with recombination. Philos. Trans. Roy. Soc. London B: Biol. Sci. 360(1459), 1387–1393 (2005)
  16. Moitra, A.: Super-resolution, extremal functions and the condition number of Vandermonde matrices. In: Proceedings of the Forty-seventh Annual ACM Symposium on Theory of Computing, STOC 2015, pp. 821–830. ACM, New York (2015). https://doi.org/10.1145/2746539.2746561
  17. Myers, S., Fefferman, C., Patterson, N.: Can one learn history from the allelic spectrum? Theor. Popul. Biol. 73(3), 342–348 (2008)
  18. Nazarov, F.L.: Local estimates for exponential polynomials and their applications to inequalities of the uncertainty principle type. Algebra i Analiz 5(4), 3–66 (1993)
  19. Nielsen, R.: Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154(2), 931–942 (2000)
  20. Nordborg, M.: Coalescent theory. Handb. Stat. Genet. 2, 843–877 (2001)
  21. Schiffels, S., Durbin, R.: Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46(8), 919 (2014)
  22. Sheehan, S., Harris, K., Song, Y.S.: Estimating variable effective population sizes from multiple genomes: a sequentially Markov conditional sampling distribution approach. Genetics 194, 647–662 (2013)
  23. Terhorst, J., Kamm, J.A., Song, Y.S.: Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 49(2), 303 (2017)
  24. Terhorst, J., Song, Y.S.: Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum. Proc. Nat. Acad. Sci. 112(25), 7677–7682 (2015)
  25. Turán, P.: On a New Method of Analysis and Its Applications. Wiley, New York (1984)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Younhun Kim (1)
  • Frederic Koehler (1), corresponding author
  • Ankur Moitra (1)
  • Elchanan Mossel (1)
  • Govind Ramnarayan (1)
  1. Massachusetts Institute of Technology, Cambridge, USA