Reconstructing Optimal Phylogenetic Trees: A Challenge in Experimental Algorithmics

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2547)

Abstract

The benefits of experimental algorithmics and algorithm engineering need to be extended to applications in the computational sciences. In this paper, we focus on one such application: the reconstruction of evolutionary histories (phylogenies) from molecular data such as DNA sequences. Our presentation is not a survey of past and current work in the area, but rather a discussion of what we see as some of the important challenges in experimental algorithmics that arise from computational phylogenetics. As motivating examples and illustrations of possible approaches, we briefly discuss two specific uses of algorithm engineering and experimental algorithmics from our recent research. The first focused on speed: we reimplemented Sankoff and Blanchette’s breakpoint analysis and obtained a 200,000-fold speedup for serial code and a 10^8-fold speedup on a 512-processor supercluster. We report here on the techniques used in obtaining such a speedup. The second focused on experimentation: we conducted an extensive study of quartet-based reconstruction algorithms within a parameter-rich simulation space, using several hundred CPU-years of computation. We report here on the challenges involved in designing, conducting, and assessing such a study.

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Moret, B.M.E., Warnow, T. (2002). Reconstructing Optimal Phylogenetic Trees: A Challenge in Experimental Algorithmics. In: Fleischer, R., Moret, B., Schmidt, E.M. (eds) Experimental Algorithmics. Lecture Notes in Computer Science, vol 2547. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36383-1_8

  • DOI: https://doi.org/10.1007/3-540-36383-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00346-5

  • Online ISBN: 978-3-540-36383-5

  • eBook Packages: Springer Book Archive
