# How Many Subpopulations Is Too Many? Exponential Lower Bounds for Inferring Population Histories

## Abstract

Reconstruction of population histories is a central problem in population genetics. Existing coalescent-based methods, like the seminal work of Li and Durbin (*Nature*, 2011), attempt to solve this problem using sequence data but have no rigorous guarantees. Determining the amount of data needed to *correctly reconstruct* population histories is a major challenge. Using a variety of tools from information theory, the theory of extremal polynomials, and approximation theory, we prove new sharp information-theoretic lower bounds on the problem of reconstructing *population structure*—the history of multiple subpopulations that merge, split and change sizes over time. Our lower bounds are *exponential* in the number of subpopulations, even when reconstructing recent histories. We demonstrate the sharpness of our lower bounds by providing algorithms for distinguishing and learning population histories with matching dependence on the number of subpopulations.

## Keywords

Population size histories, Mixtures of exponentials, Sample complexity

## Notes

### Acknowledgements

This work was funded in part by ONR N00014-16-1-2227, NSF CCF1665252, NSF DMS-1737944, NSF Large CCF-1565235, NSF CAREER Award CCF-1453261, as well as Ankur Moitra’s David and Lucile Packard Fellowship, Alfred P. Sloan Fellowship, and ONR Young Investigator Award.

## References

- 1. Bhaskar, A., Song, Y.S.: Descartes’ rule of signs and the identifiability of population demographic models from genomic variation data. Ann. Stat. **42**(6), 2469 (2014)
- 2. Bhaskar, A., Wang, Y.R., Song, Y.S.: Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Res. **25**(2), 268–279 (2015). gr-178756
- 3. Blythe, R.A., McKane, A.J.: Stochastic models of evolution in genetics, ecology and linguistics. J. Stat. Mech.: Theory Exp. **2007**(07), P07018 (2007)
- 4. Candès, E.J., Fernandez-Granda, C.: Super-resolution from noisy data. J. Fourier Anal. Appl. **19**(6), 1229–1254 (2013)
- 5. Drummond, A., Rambaut, A., Shapiro, B., Pybus, O.: Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. **22**(5), 1185–1192 (2005)
- 6. Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V.C., Foll, M.: Robust demographic inference from genomic and SNP data. PLoS Genet. **9**(10), e1003905 (2013)
- 7. Gautschi, W.: On inverses of Vandermonde and confluent Vandermonde matrices. Numer. Math. **4**(1), 117–123 (1962)
- 8. Heled, J., Drummond, A.: Bayesian inference of population size history from multiple loci. BMC Evol. Biol. **8**(1), 289 (2008)
- 9. Hua, Y., Sarkar, T.K.: Matrix pencil method for estimating parameters of exponentially damped/undamped sinusoids in noise. IEEE Trans. Acoust. Speech Signal Process. **38**(5), 814–824 (1990)
- 10. Joseph, T.A., Pe’er, I.: Inference of population structure from ancient DNA. In: Raphael, B.J. (ed.) RECOMB 2018. LNCS, vol. 10812, pp. 90–104. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89929-9_6
- 11. Kim, J., Mossel, E., Rácz, M.Z., Ross, N.: Can one hear the shape of a population history? Theor. Popul. Biol. **100**, 26–38 (2015)
- 12. Kim, Y., Koehler, F., Moitra, A., Mossel, E., Ramnarayan, G.: How many subpopulations is too many? Exponential lower bounds for inferring population histories. arXiv preprint arXiv:1811.03177 (2018)
- 13. Kimura, M., Crow, J.F.: The number of alleles that can be maintained in a finite population. Genetics **49**(4), 725 (1964)
- 14. Li, H., Durbin, R.: Inference of human population history from individual whole-genome sequences. Nature **475**(7357), 493 (2011)
- 15. McVean, G.A., Cardin, N.J.: Approximating the coalescent with recombination. Philos. Trans. Roy. Soc. London B: Biol. Sci. **360**(1459), 1387–1393 (2005)
- 16. Moitra, A.: Super-resolution, extremal functions and the condition number of Vandermonde matrices. In: Proceedings of the Forty-seventh Annual ACM Symposium on Theory of Computing, STOC 2015, pp. 821–830. ACM, New York (2015). https://doi.org/10.1145/2746539.2746561
- 17. Myers, S., Fefferman, C., Patterson, N.: Can one learn history from the allelic spectrum? Theor. Popul. Biol. **73**(3), 342–348 (2008)
- 18. Nazarov, F.L.: Local estimates for exponential polynomials and their applications to inequalities of the uncertainty principle type. Algebra i analiz **5**(4), 3–66 (1993)
- 19. Nielsen, R.: Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics **154**(2), 931–942 (2000)
- 20. Nordborg, M.: Coalescent theory. Handb. Stat. Genet. **2**, 843–877 (2001)
- 21. Schiffels, S., Durbin, R.: Inferring human population size and separation history from multiple genome sequences. Nat. Genet. **46**(8), 919 (2014)
- 22. Sheehan, S., Harris, K., Song, Y.S.: Estimating variable effective population sizes from multiple genomes: a sequentially Markov conditional sampling distribution approach. Genetics **194**, 647–662 (2013)
- 23. Terhorst, J., Kamm, J.A., Song, Y.S.: Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. **49**(2), 303 (2017)
- 24. Terhorst, J., Song, Y.S.: Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum. Proc. Nat. Acad. Sci. **112**(25), 7677–7682 (2015)
- 25. Turán, P.: On a New Method of Analysis and Its Applications. Wiley, New York (1984)