A Sublinear-Time Randomized Approximation Scheme for the Robinson-Foulds Metric

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3909)

Abstract

The Robinson-Foulds (RF) metric is the measure most widely used in comparing phylogenetic trees; it can be computed in linear time using Day’s algorithm. When faced with the need to compare large numbers of large trees, however, even linear time becomes prohibitive. We present a randomized approximation scheme that provides, with high probability, a (1+ε) approximation of the true RF metric for all pairs of trees in a given collection. Our approach is to use a sublinear-space embedding of the trees, combined with an application of the Johnson-Lindenstrauss lemma to approximate vector norms very rapidly. We discuss the consequences of various parameter choices (in the embedding and in the approximation requirements). We also implemented our algorithm as a Java class that can easily be combined with popular packages such as Mesquite; in consequence, we present experimental results illustrating the precision and running-time tradeoffs as well as demonstrating the speed of our approach.
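
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' Java implementation, and every name in it (`rf_distance`, `project`, `approx_rf`, the two toy trees) is hypothetical. It assumes each tree is given as the set of its nontrivial bipartitions, with each bipartition stored as the frozenset of taxa on one fixed side. Under that encoding the RF distance is the size of the symmetric difference of the two sets, which equals the squared Euclidean distance between the trees' 0/1 indicator vectors. A random projection with ±1 entries (one common way to realize the Johnson-Lindenstrauss lemma, assumed here) then maps each tree to a short vector whose squared distances approximate RF.

```python
import math
import random


def rf_distance(splits_a, splits_b):
    """Exact RF distance: bipartitions present in exactly one of the trees."""
    return len(splits_a ^ splits_b)


def project(splits, k, seed=0):
    """Map a set of bipartitions to a k-dimensional vector.

    Each bipartition owns one column of random +1/-1 entries. The column is
    regenerated on demand from (seed, bipartition), so the full projection
    matrix is never stored. This is an illustrative assumption, not the
    paper's data structure.
    """
    vec = [0.0] * k
    for split in splits:
        key = ",".join(sorted(split))
        rng = random.Random(f"{seed}:{key}")
        for i in range(k):
            vec[i] += 1.0 if rng.random() < 0.5 else -1.0
    scale = 1.0 / math.sqrt(k)
    return [x * scale for x in vec]


def approx_rf(vec_a, vec_b):
    """Squared Euclidean distance between two projected trees."""
    return sum((a - b) ** 2 for a, b in zip(vec_a, vec_b))


# Two toy unrooted trees on taxa A..F, each side chosen to exclude taxon F.
tree_1 = {frozenset("AB"), frozenset("CD"), frozenset("ABCD")}
tree_2 = {frozenset("AC"), frozenset("BD"), frozenset("ABCD")}

k = 512  # target dimension; larger k gives a tighter approximation
vec_1 = project(tree_1, k)
vec_2 = project(tree_2, k)

print("exact RF:", rf_distance(tree_1, tree_2))
print("approximate RF:", approx_rf(vec_1, vec_2))
```

Once every tree in a collection has been projected, each pairwise comparison costs time proportional to `k` rather than to the number of taxa, which is the source of the speedup for all-pairs comparisons. The choice of `k` trades precision against running time.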

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pattengale, N.D., Moret, B.M.E. (2006). A Sublinear-Time Randomized Approximation Scheme for the Robinson-Foulds Metric. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science, vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_19

  • DOI: https://doi.org/10.1007/11732990_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33295-4

  • Online ISBN: 978-3-540-33296-1

  • eBook Packages: Computer Science, Computer Science (R0)