Journal of Mathematical Biology, Volume 78, Issue 6, pp 1727–1769

A general framework for moment-based analysis of genetic data

  • Maria Simonsen Speed
  • David Joseph Balding
  • Asger Hobolth


In population genetics, the Dirichlet (also called the Balding–Nichols) model has for 20 years been considered the key model for approximating the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because positive correlations among alleles cannot be accommodated under the Dirichlet model. However, the validity of the Dirichlet distribution has never been systematically investigated in a general framework. This paper attempts to address this problem by providing a general overview of how allele fraction data should be modeled under the most common multi-allelic mutational structures. The Dirichlet and alternative models are investigated by simulating allele fractions from a diffusion approximation of the multi-allelic Wright–Fisher process with mutation and applying a moment-based analysis method. The study shows that the optimal modeling strategy for the distribution of allele fractions depends on the specific mutation process. The Dirichlet model is an exceptionally good approximation only for the pure drift, Jukes–Cantor and parent-independent mutation processes with small mutation rates. Alternative models are required and proposed for the other mutation processes, such as a Beta–Dirichlet model for the infinite alleles mutation process, and a Hierarchical Beta model for the Kimura, Hasegawa–Kishino–Yano and Tamura–Nei processes. Finally, a novel Hierarchical Beta approximation, the Pyramidal Hierarchical Beta model, is developed for the generalized time-reversible and single-step mutation processes.
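The remark about correlations can be illustrated with a minimal sketch. It is not the moment-based method of the paper, only the standard Dirichlet moment formulas together with a simple pooled method-of-moments fit. The function names `dirichlet_moments` and `fit_dirichlet_by_moments` are hypothetical, and the pooled estimator of the concentration is an assumption made for illustration.

```python
import numpy as np


def dirichlet_moments(alpha):
    """Return the mean vector and covariance matrix of Dirichlet(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()  # total concentration
    mean = alpha / a0
    # Cov(X_i, X_j) = (delta_ij * m_i - m_i * m_j) / (a0 + 1);
    # every off-diagonal entry is therefore strictly negative.
    cov = (np.diag(mean) - np.outer(mean, mean)) / (a0 + 1.0)
    return mean, cov


def fit_dirichlet_by_moments(samples):
    """Moment-match a Dirichlet to allele fractions (rows sum to 1).

    Assumption for illustration: the concentration is estimated by pooling
    the variances over alleles, using sum_i Var(X_i) = sum_i m_i(1 - m_i) / (a0 + 1).
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    var = samples.var(axis=0)
    a0 = (mean * (1.0 - mean)).sum() / var.sum() - 1.0
    return a0 * mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fractions = rng.dirichlet([2.0, 3.0, 5.0], size=20000)
    print("fitted alpha:", fit_dirichlet_by_moments(fractions))
    print("model covariance:\n", dirichlet_moments([2.0, 3.0, 5.0])[1])
```

Because the off-diagonal covariances implied by any Dirichlet parameter vector are negative, a fit of this kind cannot reproduce positive sample covariances between allele fractions, whatever the fitted parameters are.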


Keywords

Allele fraction · Beta–Dirichlet · Diffusion · Dirichlet · Distribution of allele fractions · Evolutionary history · Hierarchical Beta · Moments · Multi-allelic Wright–Fisher · Mutation processes · Pyramid

Mathematics Subject Classification

60J25 60J60 62E17 62M05 92D25 



We are grateful to the associate editor and two anonymous reviewers for helpful comments and suggestions. This work is funded through a grant from the Danish Research Council (DFF 4002-00382) awarded to Asger Hobolth.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
  2. Melbourne Integrative Genomics, School of BioSciences and School of Mathematics & Statistics, University of Melbourne, Melbourne, Australia
  3. Department of Affective Disorders, Aarhus University Hospital, Aarhus, Denmark
