# Enumeration of compact coalescent histories for matching gene trees and species trees

## Abstract

Compact coalescent histories are combinatorial structures that describe, for a given gene tree *G* and species tree *S*, the possible numbers of coalescences of *G* that take place on the various branches of *S*. They have been introduced as a data structure for evaluating probabilities of gene tree topologies conditional on species trees, reducing computation time compared to standard coalescent histories. When gene trees and species trees have a matching labeled topology \(G=S=t\), the compact coalescent histories of *t* are encoded by particular integer labelings of the branches of *t*, each integer specifying the number of coalescent events of *G* present in a branch of *S*. For matching gene trees and species trees, we investigate enumerative properties of compact coalescent histories. We report a recursion for the number of compact coalescent histories for matching gene trees and species trees, using it to study the numbers of compact coalescent histories for small trees. We show that the number of compact coalescent histories equals the number of coalescent histories if and only if the labeled topology is a caterpillar or a bicaterpillar. The number of compact coalescent histories is seen to increase with tree imbalance: we prove that as the number of taxa *n* increases, the exponential growth of the number of compact coalescent histories follows \(4^n\) in the case of caterpillar or bicaterpillar labeled topologies and approximately \(3.3302^n\) and \(2.8565^n\) for lodgepole and balanced topologies, respectively. We prove that the mean number of compact coalescent histories of a labeled topology of size *n* selected uniformly at random grows exponentially as \(3.3750^n\). Our results contribute to the analysis of the computational complexity of algorithms for computing gene tree probabilities, and to the combinatorial study of gene trees and species trees more generally.
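As a rough numerical illustration of the \(4^n\) growth in the caterpillar case, the sketch below relies on an assumption that is not stated in the abstract itself: the standard result from the coalescent-histories literature that a matching caterpillar topology with *n* taxa has a Catalan number \(C_{n-1}\) of coalescent histories. Since the abstract states that compact coalescent histories and coalescent histories are equinumerous for caterpillars, the same count would apply here. The function names are hypothetical, and the paper's general recursion is not reproduced.

```python
from math import comb


def catalan(k: int) -> int:
    """Return the k-th Catalan number C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def caterpillar_histories(n: int) -> int:
    """Count of (compact) coalescent histories for a matching caterpillar
    topology with n taxa, ASSUMING the known Catalan formula C_{n-1}."""
    if n < 2:
        raise ValueError("a caterpillar topology needs at least 2 taxa")
    return catalan(n - 1)


def growth_ratios(max_n: int) -> list[float]:
    """Ratios count(n + 1) / count(n) for n = 2, ..., max_n - 1.

    These ratios equal 2(2n - 1) / (n + 1), which increases toward 4,
    matching the 4^n exponential growth quoted in the abstract.
    """
    return [
        caterpillar_histories(n + 1) / caterpillar_histories(n)
        for n in range(2, max_n)
    ]


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, caterpillar_histories(n))
    print(growth_ratios(200)[-1])
```

The ratios approach 4 only slowly because Catalan numbers carry a polynomial factor of order \(n^{-3/2}\) on top of the exponential term, so \(4^n\) should be read as the exponential order rather than as the exact count.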

## Keywords

Compact coalescent histories, Gene trees, Generating functions, Phylogenetics, Species trees

## Mathematics Subject Classification

05A15, 05A16, 92B10, 92D15

## Notes

### Acknowledgements

Support was provided by National Institutes of Health grant R01 GM117590 and by a Rita Levi-Montalcini grant to FD from the Ministero dell’Istruzione, dell’Università e della Ricerca.
