A Compressed Format for Collections of Phylogenetic Trees and Improved Consensus Performance

  • Robert S. Boyer
  • Warren A. Hunt Jr.
  • Serita M. Nelesen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3692)


Phylogenetic tree searching algorithms often produce thousands of trees which biologists save in Newick format in order to perform further analysis. Unfortunately, Newick is neither space efficient, nor conducive to post-tree analysis such as consensus. We propose a new format for storing phylogenetic trees that significantly reduces storage requirements while continuing to allow the trees to be used as input to post-tree analysis. We implemented mechanisms to read and write such data from and to files, and also implemented a consensus algorithm that is faster by an order of magnitude than standard phylogenetic analysis tools. We demonstrate our results on a collection of data files produced from both maximum parsimony tree searches and Bayesian methods.
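To make the terms in the abstract concrete, the following is a minimal sketch of a naive majority consensus computed directly from Newick strings. It is not the authors' compressed format or their faster algorithm, neither of which is described on this page. It assumes rooted trees, leaf names without spaces or quoting, and no branch lengths or support values; the function names `parse_newick`, `clusters`, and `majority_clusters` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def parse_newick(text):
    """Parse a simple Newick string (no branch lengths) into nested tuples.

    Assumes well-formed input terminated by ';' (an illustrative restriction).
    """
    text = "".join(text.split())  # drop all whitespace
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
            return tuple(children)
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]  # a leaf name

    return node()


def clusters(tree, found):
    """Return the leaf set under `tree`; record every internal node's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clusters(child, found) for child in tree))
    found.add(leaves)
    return leaves


def majority_clusters(newick_strings):
    """Return the clusters that occur in more than half of the input trees."""
    counts = Counter()
    for s in newick_strings:
        found = set()
        clusters(parse_newick(s), found)
        counts.update(found)  # each cluster counted at most once per tree
    threshold = len(newick_strings) / 2
    return {c for c, n in counts.items() if n > threshold}


trees = ["((A,B),(C,D));", "((A,B),(C,D));", "((A,C),(B,D));"]
result = majority_clusters(trees)
```

With the three example trees, the clusters {A, B} and {C, D} each occur in two of three trees and so belong to the majority consensus, while {A, C} and {B, D} occur only once and are dropped. Note that every tree is re-parsed from text before any counting can begin, which illustrates the kind of overhead a format designed for post-tree analysis would aim to avoid.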


Keywords (machine-generated, not supplied by the authors): Consensus Tree; Storage Requirement; Input Tree; Consensus Algorithm; Majority Consensus



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Robert S. Boyer (1)
  • Warren A. Hunt Jr. (1)
  • Serita M. Nelesen (1)
  1. Department of Computer Sciences, The University of Texas, Austin, USA
