Abstract
The Robinson-Foulds (RF) metric is the measure most widely used for comparing phylogenetic trees; it can be computed in linear time using Day’s algorithm. When large numbers of large trees must be compared, however, even linear time per comparison becomes prohibitive. We present a randomized approximation scheme that provides, with high probability, a (1+ε)-approximation of the true RF metric for all pairs of trees in a given collection. Our approach uses a sublinear-space embedding of the trees, combined with an application of the Johnson-Lindenstrauss lemma to approximate vector norms very rapidly. We discuss the consequences of various parameter choices (in the embedding and in the approximation requirements). We have also implemented our algorithm as a Java class that can easily be combined with popular packages such as Mesquite, and we present experimental results that illustrate the tradeoffs between precision and running time and demonstrate the speed of our approach.
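The general idea (embed each tree as a vector, then shrink it with a Johnson-Lindenstrauss projection so that each pairwise comparison is cheap) can be illustrated with a minimal Python sketch. This is not the authors' Java implementation, nor their sublinear-space embedding. It assumes that trees are given as nested tuples of leaf-label strings over a common leaf set. Each tree is represented by the 0/1 indicator vector of its nontrivial bipartitions, so the RF distance equals the squared Euclidean distance between two such vectors. Here, RF is counted as the size of the symmetric difference of the two bipartition sets, without halving or normalization. The sketch then applies Achlioptas-style random ±1 projections, generating each bipartition's projection entries from a hash instead of storing a matrix. The helper names (`splits`, `rf_exact`, `sketch`, `rf_approx`) and the `k`/`seed` parameters are hypothetical.

```python
import hashlib
import math
import random

# Hypothetical tree format: a leaf is a str, an internal node is a tuple of children.


def leaves(tree):
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the nontrivial bipartitions of the tree, treated as unrooted.

    Each bipartition is stored as the side that does not contain the
    smallest leaf label, so both orientations of a split compare equal.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = below if ref not in below else all_leaves - below
        if 2 <= len(side) <= len(all_leaves) - 2:  # skip trivial splits
            found.add(side)
        return below

    walk(tree)
    return found


def rf_exact(t1, t2):
    """Exact RF distance: size of the symmetric difference of the split sets."""
    return len(splits(t1) ^ splits(t2))


def _signs(split, k, seed):
    """Reproducible pseudo-random +/-1 entries for one split (hashed, never stored)."""
    key = f"{seed}:{sorted(split)!r}".encode()
    digest = hashlib.blake2b(key, digest_size=16).digest()
    bits = random.Random(int.from_bytes(digest, "big")).getrandbits(k)
    return [1 if (bits >> j) & 1 else -1 for j in range(k)]


def sketch(tree, k, seed=0):
    """Project the split-indicator vector onto k random +/-1 directions, scaled by 1/sqrt(k)."""
    y = [0.0] * k
    for s in splits(tree):
        for j, r in enumerate(_signs(s, k, seed)):
            y[j] += r
    scale = 1.0 / math.sqrt(k)
    return [v * scale for v in y]


def rf_approx(y1, y2):
    """Squared Euclidean distance between two sketches, an estimate of the RF distance."""
    return sum((a - b) ** 2 for a, b in zip(y1, y2))


if __name__ == "__main__":
    t1 = ((("a", "b"), "c"), ("d", "e"))
    t2 = ((("a", "c"), "b"), ("d", "e"))
    t3 = ((("a", "d"), "b"), ("c", "e"))
    trees = [t1, t2, t3]
    sketches = [sketch(t, k=2048) for t in trees]
    for i in range(len(trees)):
        for j in range(i + 1, len(trees)):
            exact = rf_exact(trees[i], trees[j])
            approx = rf_approx(sketches[i], sketches[j])
            print(i, j, exact, round(approx, 2))
```

Sketching a tree costs time proportional to k times its number of bipartitions. After that, each of the m(m-1)/2 pairwise estimates takes O(k) time regardless of tree size. Bipartitions shared by two trees cancel exactly in the difference of their sketches. By the Johnson-Lindenstrauss lemma, k = O(ε⁻² log m) dimensions suffice, with high probability, for every pairwise estimate to lie within a factor (1 ± ε) of the true value.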
References
Bryant, D.: A classification of consensus methods for phylogenetics. In: Bioconsensus. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 61, pp. 163–184. American Math. Soc (2002)
Bininda-Emonds, O. (ed.): Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life. Kluwer Publ., Dordrecht (2004)
DasGupta, B., He, X., Jiang, T., Li, M., Tromp, J., Zhang, L.: On computing the nearest neighbor interchange distance. In: Proc. DIMACS Workshop on Discrete Problems with Medical Applications. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 55, pp. 125–143. American Math. Soc (2000)
Allen, B., Steel, M.: Subtree transfer operations and their induced metrics on evolutionary trees. Annals of Combinatorics 5, 1–15 (2001)
Robinson, D., Foulds, L.: Comparison of phylogenetic trees. Math. Biosci. 53, 131–147 (1981)
Day, W.: Optimal algorithms for comparing trees with labeled leaves. J. of Classification 2, 7–28 (1985)
Johnson, W., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Cont. Math. 26, 189–206 (1984)
Maddison, W., Maddison, D.: Mesquite: A modular system for evolutionary analysis (2005), Version 1.06: http://mesquiteproject.org
Bryant, D.: The splits in the neighborhood of a tree. Annals of Combinatorics 8, 1–11 (2004)
Indyk, P.: Algorithmic applications of low-distortion geometric embeddings. In: Proc. 42nd IEEE Symp. on Foundations of Computer Science FOCS 2001, pp. 10–33. IEEE Computer Society, Los Alamitos (2001)
Linial, N., London, E., Rabinovich, Y.: The geometry of graphs and some of its algorithmic applications. Combinatorica 15, 215–245 (1995)
Indyk, P., Motwani, R.: Approximate nearest neighbors: Towards removing the curse of dimensionality. In: Proc. 30th ACM Symp. on Theory of Computing STOC 1998, pp. 604–613 (1998)
Achlioptas, D.: Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci. 66, 671–687 (2003)
Hillis, D., Heath, T., St John, K.: Analysis and visualization of tree space. Syst. Biol. 54, 471–482 (2005)
Amenta, N., Klingner, J.: Case study: Visualizing sets of evolutionary trees. In: Proc. IEEE Symp. on Information Visualization INFOVIS 2002, pp. 71–73. IEEE Computer Society, Los Alamitos (2002)
Maddison, D.: The discovery and importance of multiple islands of most-parsimonious trees. Syst. Zool. 40, 315–328 (1991)
Rand, W.: Objective criteria for the evaluation of clustering methods. J. American Stat. Assoc. 66, 846–850 (1971)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pattengale, N.D., Moret, B.M.E. (2006). A Sublinear-Time Randomized Approximation Scheme for the Robinson-Foulds Metric. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science, vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_19
DOI: https://doi.org/10.1007/11732990_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33295-4
Online ISBN: 978-3-540-33296-1
eBook Packages: Computer Science, Computer Science (R0)