Cluster Matching Distance for Rooted Phylogenetic Trees
Phylogenetic trees are fundamental to biology and benefit several other research areas. Various methods have been developed for inferring such trees, and comparing the resulting trees is an important problem in computational phylogenetics. Addressing this problem requires tree measures, but all of them have shortcomings that can severely limit their applicability in practice. This also holds for one of the oldest and most widely used tree measures, the Robinson-Foulds distance. While this measure satisfies the properties of a metric and is efficiently computable, it has a negatively skewed distribution, a poor range of discrimination and diameter, and may not be robust when comparing erroneous trees. The cluster distance is a measure for comparing rooted trees that can be interpreted as a weighted version of the Robinson-Foulds distance. We show that, compared with the Robinson-Foulds distance, the cluster distance is much more robust to small errors in the compared trees and has a significantly improved distribution and range.
Keywords: Evolutionary trees; Bipartite perfect matching; Robinson-Foulds distance; Cluster matching distance
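To make the two measures named in the abstract concrete, here is a minimal Python sketch for small rooted trees. The nested-tuple tree encoding and the function names `clusters`, `rf_distance`, and `cluster_matching_distance` are hypothetical and chosen only for illustration. The Robinson-Foulds part counts the clusters present in exactly one of the two trees. The matching part is an assumption: the abstract does not state the definition, so the sketch supposes a minimum-cost perfect matching between the two cluster sets, padded with empty clusters to equal size, where matching two clusters costs the size of their symmetric difference. The brute-force search over permutations is only practical for tiny trees.

```python
from itertools import permutations


def clusters(tree):
    """Return the non-trivial clusters of a rooted tree as frozensets of leaf labels.

    Hypothetical encoding: an internal node is a tuple of children,
    and a leaf is any non-tuple label.
    """
    found = set()

    def leaves_below(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves_below(tree)
    found.discard(all_leaves)  # the root cluster is common to all trees on these leaves
    return {c for c in found if len(c) > 1}


def rf_distance(t1, t2):
    """Rooted Robinson-Foulds distance: clusters found in exactly one tree."""
    return len(clusters(t1) ^ clusters(t2))


def cluster_matching_distance(t1, t2):
    """Assumed definition: minimum-cost perfect matching between cluster sets.

    The smaller set is padded with empty clusters; matching clusters x and y
    costs the size of their symmetric difference. Brute force, tiny trees only.
    """
    a, b = list(clusters(t1)), list(clusters(t2))
    size = max(len(a), len(b))
    a += [frozenset()] * (size - len(a))
    b += [frozenset()] * (size - len(b))
    return min(
        sum(len(x ^ y) for x, y in zip(a, perm))
        for perm in permutations(b)
    )


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
star = ("a", "b", "c", "d", "e")  # unresolved tree with no non-trivial clusters

print(rf_distance(t1, t2), cluster_matching_distance(t1, t2))      # 2 2
print(rf_distance(t1, star), cluster_matching_distance(t1, star))  # 3 7
```

Under this assumed definition, the comparison with the star tree shows the weighting: Robinson-Foulds counts each unmatched cluster once, whereas the matching cost grows with the size of the clusters involved. For larger trees, the permutation search would need to be replaced by a polynomial-time assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment`.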
This material is based upon work supported by the National Science Foundation under Grant No. 1617626.