Efficiently Computing Arbitrarily-Sized Robinson-Foulds Distance Matrices

Sul, Seung-Jin; Brammer, Grant; Williams, Tiffani L.

doi:10.1007/978-3-540-87361-7_11

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5251)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

We introduce the HashRF(p,q) algorithm for computing Robinson-Foulds (RF) distance matrices over large collections of binary evolutionary trees. The novelty of the algorithm is that it can compute arbitrarily-sized (p × q) RF matrices without running into physical memory limitations. We explore the performance of HashRF(p,q) on two collections of biological trees collected from Bayesian analysis: 20,000 trees on 150 taxa and 33,306 trees on 567 taxa. When computing the all-to-all RF matrix, HashRF(p,q) is up to 200 times faster than PAUP* and around 40% faster than HashRF, one of the fastest all-to-all RF algorithms. As an application, we cluster large RF matrices to improve the resolution rate of consensus trees, a popular approach biologists use to summarize the results of a phylogenetic analysis. HashRF(p,q) thus provides scientists with a fast and efficient alternative for understanding the evolutionary relationships among a set of trees.
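
To make the idea of a p × q RF matrix concrete, the following is a minimal sketch, not the authors' HashRF(p,q) implementation. It assumes trees are given as nested Python tuples of leaf labels over a common taxon set, and it reports the RF distance as the size of the symmetric difference of the two trees' non-trivial bipartitions (some papers halve this value). The function names `leaf_set`, `bipartitions`, and `rf_matrix` are hypothetical, chosen for illustration only. The sketch uses a hash table keyed by bipartition to count shared splits between p row trees and q column trees.

```python
from collections import defaultdict
from itertools import product


def leaf_set(tree):
    """Return the set of leaf labels in a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial bipartitions (splits) induced by a tree's edges."""
    taxa = leaf_set(tree)
    anchor = min(taxa)  # each split is stored as the side NOT containing this taxon
    splits = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Keep a split only if both sides contain at least two taxa.
        if 1 < len(below) < len(taxa) - 1:
            splits.add(below if anchor not in below else taxa - below)
        return below

    visit(tree)
    return splits


def rf_matrix(rows, cols):
    """Return the p x q matrix of RF distances between two tree collections."""
    row_splits = [bipartitions(t) for t in rows]
    col_splits = [bipartitions(t) for t in cols]

    # Hash table: bipartition -> (row tree indices, column tree indices).
    table = defaultdict(lambda: ([], []))
    for i, splits in enumerate(row_splits):
        for s in splits:
            table[s][0].append(i)
    for j, splits in enumerate(col_splits):
        for s in splits:
            table[s][1].append(j)

    # Count the bipartitions shared by each (row tree, column tree) pair.
    shared = [[0] * len(cols) for _ in rows]
    for row_ids, col_ids in table.values():
        for i, j in product(row_ids, col_ids):
            shared[i][j] += 1

    # |B_i symmetric-difference B_j| = |B_i| + |B_j| - 2 * |shared|.
    return [
        [len(row_splits[i]) + len(col_splits[j]) - 2 * shared[i][j]
         for j in range(len(cols))]
        for i in range(len(rows))
    ]


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = (("D", "E"), ("C", ("A", "B")))  # same topology as t1, drawn differently
matrix = rf_matrix([t1, t2], [t1, t2, t3])
print(matrix)
```

Because `rows` and `cols` are independent arguments, a caller could fill a large all-to-all matrix one p × q block at a time and write each block to disk, so only a single block needs to be held in memory. This is one plausible reading of how arbitrarily-sized matrices avoid memory limits; the paper itself should be consulted for the actual scheme.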

Author information

Authors and Affiliations

Department of Computer Science, Texas A&M University, College Station, TX 77843-3112, USA
Seung-Jin Sul, Grant Brammer & Tiffani L. Williams

Editor information

Keith A. Crandall, Jens Lagergren

Cite this paper

Sul, SJ., Brammer, G., Williams, T.L. (2008). Efficiently Computing Arbitrarily-Sized Robinson-Foulds Distance Matrices. In: Crandall, K.A., Lagergren, J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87361-7_11

DOI: https://doi.org/10.1007/978-3-540-87361-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87360-0
Online ISBN: 978-3-540-87361-7
eBook Packages: Computer Science, Computer Science (R0)
