# On the Number of Non-equivalent Ancestral Configurations for Matching Gene Trees and Species Trees

## Abstract

An *ancestral configuration* is one of the combinatorially distinct sets of gene lineages that, for a given gene tree, can reach a given node of a specified species tree. Ancestral configurations have appeared in recursive algebraic computations of the conditional probability that a gene tree topology is produced under the multispecies coalescent model for a given species tree. For matching gene trees and species trees, we study the number of ancestral configurations, considered up to an equivalence relation introduced by Wu (Evolution 66:763–775, 2012) to reduce the complexity of the recursive probability computation. We examine the largest number of non-equivalent ancestral configurations possible for a given tree size *n*. Whereas the smallest number of non-equivalent ancestral configurations increases polynomially with *n*, we show that the largest number increases with \(k^n\), where *k* is a constant that satisfies \(\sqrt[3]{3} \le k < 1.503\). Under a uniform distribution on the set of binary labeled trees with a given size *n*, the mean number of non-equivalent ancestral configurations grows exponentially with *n*. The results refine an earlier analysis of the number of ancestral configurations considered without applying the equivalence relation, showing that use of the equivalence relation does not alter the exponential nature of the increase with tree size.
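As a rough numerical illustration of the interval stated for *k* (this is not a computation taken from the paper), the short Python sketch below evaluates the two endpoints and the exponential envelope they imply for a few tree sizes. The names `K_LOWER`, `K_UPPER`, and `exponential_envelope` are hypothetical, introduced only for this example. The sketch reflects the exponential order \(k^n\) alone: it ignores any subexponential factors and does not compute actual numbers of non-equivalent ancestral configurations.

```python
# Endpoints of the interval for the growth constant k quoted in the abstract.
K_LOWER = 3 ** (1 / 3)  # cube root of 3, approximately 1.4422
K_UPPER = 1.503         # strict upper bound stated in the abstract


def exponential_envelope(n):
    """Return (K_LOWER**n, K_UPPER**n) for a tree size n.

    These are only the exponential terms implied by the bounds on k;
    they are not counts of ancestral configurations.
    """
    return K_LOWER ** n, K_UPPER ** n


if __name__ == "__main__":
    print(f"k lies in [{K_LOWER:.4f}, {K_UPPER})")
    for n in (10, 20, 40):
        low, high = exponential_envelope(n)
        print(f"n = {n}: {low:.3e} <= k^n < {high:.3e}")
```

Because the two endpoints differ by only about 0.06, the interval pins down the exponential growth rate fairly tightly, although the gap between the two envelopes still widens as *n* grows.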

## Keywords

Ancestral configurations · Combinatorics · Gene trees and species trees · Phylogenetics

## Notes

### Acknowledgements

We thank Elizabeth Allman, James Degnan, and John Rhodes for discussions, and two reviewers for comments. Support was provided by National Institutes of Health grant R01 GM117590 and by a 2014 Rita Levi Montalcini grant to FD from the Ministero dell’Istruzione, dell’Università e della Ricerca.

## References

- Aho AV, Sloane NJA (1973) Some doubly exponential sequences. Fibonacci Q 11:429–437
- Allman ES, Degnan JH, Rhodes JA (2011) Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent. J Math Biol 62:833–862
- Degnan JH, Salter LA (2005) Gene tree distributions under the coalescent process. Evolution 59:24–37
- Disanto F, Rosenberg NA (2015) Coalescent histories for lodgepole species trees. J Comput Biol 22:918–929
- Disanto F, Rosenberg NA (2016) Asymptotic properties of the number of matching coalescent histories for caterpillar-like families of species trees. IEEE/ACM Trans Comput Biol Bioinf 13:913–925
- Disanto F, Rosenberg NA (2017) Enumeration of ancestral configurations for matching gene trees and species trees. J Comput Biol 24:831–850
- Felsenstein J (1978) The number of evolutionary trees. Syst Zool 27:27–33
- Felsenstein J (2004) Inferring phylogenies. Sinauer, Sunderland, MA
- Flajolet P, Sedgewick R (2009) Analytic combinatorics. Cambridge University Press, Cambridge
- Harding EF (1971) The probabilities of rooted tree-shapes generated by random bifurcation. Adv Appl Prob 3:44–77
- Rosenberg NA (2006) The mean and variance of the numbers of \(r\)-pronged nodes and \(r\)-caterpillars in Yule-generated genealogical trees. Ann Comb 10:129–146
- Rosenberg NA (2007) Counting coalescent histories. J Comput Biol 14:360–377
- Rosenberg NA (2013) Coalescent histories for caterpillar-like families. IEEE/ACM Trans Comput Biol Bioinf 10:1253–1262
- Rosenberg NA, Degnan JH (2010) Coalescent histories for discordant gene trees and species trees. Theor Pop Biol 77:145–151
- Sedgewick R, Flajolet P (1996) An introduction to the analysis of algorithms. Addison-Wesley, Boston
- Than C, Ruths D, Innan H, Nakhleh L (2007) Confounding factors in HGT detection: statistical error, coalescent effects, and multiple solutions. J Comput Biol 14:517–535
- Wu Y (2012) Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood. Evolution 66:763–775