Divide-and-Conquer Tree Estimation: Opportunities and Challenges

Bioinformatics and Phylogenetics

Part of the book series: Computational Biology (COBO, volume 29)

Abstract

Large-scale phylogeny estimation is challenging for many reasons, including heterogeneity across the Tree of Life and the difficulty of finding good solutions to NP-hard optimization problems. One promising way to enable large-scale phylogeny estimation is divide-and-conquer: a dataset is divided into overlapping subsets, trees are estimated on the subsets, and the subset trees are then merged into a tree on the full set of taxa. This last step is performed by a supertree method; such methods are popular in systematics for combining species trees from the scientific literature. Because most supertree methods are heuristics for NP-hard optimization problems, supertree estimation on large datasets is challenging in terms of both scalability and accuracy. In this chapter, we describe the current state of the art in supertree construction and the use of supertree methods in divide-and-conquer strategies, and we identify directions where future research could lead to improved supertree methods. Finally, we present a new type of divide-and-conquer strategy in which the dataset is divided into disjoint subsets, which bypasses the need for supertree estimation. Overall, this chapter aims to present research directions that could lead to new methods for scaling phylogeny estimation to large datasets.
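The three-step pipeline named in the abstract (divide into overlapping subsets, estimate a tree per subset, merge the subset trees) can be sketched as a small skeleton. This is only an illustration under stated assumptions: the function names `overlapping_subsets` and `divide_and_conquer` are hypothetical, the sliding-window decomposition is a naive stand-in and not the decomposition used by any published method, and the tree estimator and the supertree (merge) method are left as caller-supplied functions.

```python
from typing import Callable, List, Sequence, TypeVar

Tree = TypeVar("Tree")  # whatever tree representation the caller uses


def overlapping_subsets(taxa: Sequence[str], size: int, overlap: int) -> List[List[str]]:
    """Split an ordered list of taxa into windows of `size` taxa, where
    consecutive windows share `overlap` taxa (naive stand-in decomposition)."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    subsets: List[List[str]] = []
    start = 0
    while True:
        subsets.append(list(taxa[start:start + size]))
        if start + size >= len(taxa):  # this window reached the last taxon
            break
        start += step
    return subsets


def divide_and_conquer(
    taxa: Sequence[str],
    estimate_tree: Callable[[List[str]], Tree],  # assumed: any base tree estimator
    merge_trees: Callable[[List[Tree]], Tree],   # assumed: any supertree method
    size: int,
    overlap: int,
) -> Tree:
    """Divide, estimate a tree on each subset, then merge the subset trees."""
    subsets = overlapping_subsets(taxa, size, overlap)
    subset_trees = [estimate_tree(subset) for subset in subsets]
    return merge_trees(subset_trees)
```

For example, `overlapping_subsets(list("ABCDEFGH"), size=4, overlap=2)` yields the three subsets ABCD, CDEF, and EFGH. Keeping the estimator and the merger as parameters mirrors the point made in the abstract: the merge step is where a supertree method is plugged in, and it is the step whose scalability and accuracy are at issue.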

Notes

  1. This approach has been used to construct species trees from gene trees under the multispecies coalescent; an example of such a method is the population tree in BUCKy [76], but see also [147] for others.

References

  1. Agarwala, R., Bafna, V., Farach, M., Paterson, M., Thorup, M.: On the approximability of numerical taxonomy (fitting distances by tree metrics). SIAM J. Comput. 28(3), 1073–1085 (1998)

  2. Ailon, N., Charikar, M.: Fitting tree metrics: hierarchical clustering and phylogeny. SIAM J. Comput. 40(5), 1275–1291 (2011)

  3. Akanni, W., Creevey, C., Wilkinson, M., Pisani, D.: L.U.-St: a tool for approximated maximum likelihood supertree reconstruction. BMC Bioinform. 15, 183 (2014)

  4. Akanni, W., Wilkinson, M., Creevey, C., Foster, P., Pisani, D.: Implementing and testing Bayesian and maximum-likelihood supertree methods in phylogenetics. R. Soc. Open Sci. 2, 140436 (2015)

  5. Allman, E.S., Degnan, J.H., Rhodes, J.A.: Species tree inference from gene splits by unrooted star methods. IEEE/ACM Trans. Comput. Biol. Bioinform. 15(1), 337–342 (2018)

  6. Alon, N., Snir, S., Yuster, R.: On the compatibility of quartet trees. In: Proceedings of the Twenty-fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’14, pp. 535–545. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA (2014). http://dl.acm.org/citation.cfm?id=2634074.2634114

  7. Altenhoff, A., Boeckmann, B., Capella-Gutierrez, S., Dalquen, D., et al.: Standardized benchmarking in the quest for orthologs. Nat. Methods 13, 425–430 (2016)

  8. Avni, E., Cohen, R., Snir, S.: Weighted quartets phylogenetics. Syst. Biol. 64(2), 233–242 (2014)

  9. Avni, E., Yona, Z., Cohen, R., Snir, S.: The performance of two supertree schemes compared using synthetic and real data quartet input. J. Mol. Evol. 86(2), 150–165 (2018). https://doi.org/10.1007/s00239-018-9833-0

  10. Baker, W.J., Savolainen, V., Asmussen-Lange, C.B., Chase, M.W., Dransfield, J., Forest, F., Harley, M.M., Uhl, N.W., Wilkinson, M.: Complete generic-level phylogenetic analyses of palms (Arecaceae) with comparisons of supertree and supermatrix approaches. Syst. Biol. 58(2), 240–256 (2009)

  11. Bansal, M., Burleigh, J., Eulenstein, O., Fernández-Baca, D.: Robinson-Foulds supertrees. Algorithms Mol. Biol. 5, 18 (2010)

  12. Baum, B.: Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees. Taxon 41, 3–10 (1992)

  13. Baum, B., Ragan, M.A.: The MRP method. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees: Combining Information to Reveal The Tree Of Life, pp. 17–34. Kluwer Academic, Dordrecht, The Netherlands (2004)

  14. Bayzid, M., Mirarab, S., Warnow, T.: Inferring optimal species trees under gene duplication and loss. Pac. Symp. Biocomput. 18, 250–261 (2013)

  15. Bayzid, M.S., Hunt, T., Warnow, T.: Disk covering methods improve phylogenomic analyses. BMC Genomics 15(Suppl 6), S7 (2014)

  16. Ben-Dor, A., Chor, B., Graur, D., Ophir, R., Pelleg, D.: Constructing phylogenies from quartets: elucidation of eutherian superordinal relationships. J. Comput. Biol. 5(3), 377–390 (1998)

  17. Berry, V., Bryant, D., Jiang, T., Kearney, P.E., Li, M., Wareham, T., Zhang, H.: A practical algorithm for recovering the best supported edges of an evolutionary tree. In: Proceedings of the SIAM-ACM Symposium on Discrete Algorithms (SODA), pp. 287–296 (2000)

  18. Berry, V., Gascuel, O.: Inferring evolutionary trees with strong combinatorial evidence. Theoret. Comput. Sci. 240(2), 271–298 (2000). https://doi.org/10.1016/S0304-3975(99)00235-2. http://www.sciencedirect.com/science/article/pii/S0304397599002352

  19. Bininda-Emonds, O. (ed.): Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life. Kluwer Academic Publishers, Dordrecht (2004)

  20. Bininda-Emonds, O.R.P.: MRP supertree construction in the consensus setting. In: Bioconsensus. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 61, pp. 231–242. American Mathematical Society-DIMACS, Providence, Rhode Island (2003)

  21. Bininda-Emonds, O.R.P., Gittleman, J.L., Purvis, A.: Building large trees by combining phylogenetic information: a complete phylogeny of the extant Carnivora (Mammalia). Biol. Rev. Camb. Philos. Soc. 74, 143–175 (1999)

  22. Bininda-Emonds, O.R.P., Gittleman, J.L., Steel, M.A.: The (super)tree of life: procedures, problems, and prospects. Annu. Rev. Ecol. Syst. 33, 265–289 (2002)

  23. Böcker, S., Bryant, D., Dress, A.W., Steel, M.A.: Algorithmic aspects of tree amalgamation. J. Algorithms 37(2), 522–537 (2000)

  24. Bordewich, M., Mihaescu, R.: Accuracy guarantees for phylogeny reconstruction algorithms based on balanced minimum evolution. In: Moulton, V., Singh, M. (eds.) Proceedings of the 2010 Workshop on Algorithms for Bioinformatics, pp. 250–261. Springer, Berlin, Heidelberg (2010)

  25. Brinkmeyer, M., Griebel, T., Böcker, S.: Polynomial supertree methods revisited. Adv. Bioinform. 2011 (2011)

  26. Bryant, D., Steel, M.: Constructing optimal trees from quartets. J. Algorithms 38(1), 237–259 (2001)

  27. Bryant, D., Steel, M.: Computing the distribution of a tree metric. IEEE/ACM Trans. Comput. Biol. Bioinform. 6(3), 420–426 (2009)

  28. Buneman, P.: The recovery of trees from measures of dissimilarity. In: Hodson, F., Kendall, D., Tautu, P. (eds.) Mathematics in the Archaeological and Historical Sciences, pp. 387–395. Edinburgh University Press, Edinburgh, Scotland (1971)

  29. Chaudhary, R.: MulRF: a software package for phylogenetic analysis using multi-copy gene trees. Bioinformatics 31, 432–433 (2015)

  30. Chaudhary, R., Bansal, M.S., Wehe, A., Fernández-Baca, D., Eulenstein, O.: iGTP: a software package for large-scale gene tree parsimony analysis. BMC Bioinform. 11, 574 (2010)

  31. Chaudhary, R., Burleigh, J.G., Fernández-Baca, D.: Fast local search for unrooted Robinson-Foulds supertrees. IEEE/ACM Trans. Comput. Biol. Bioinform. 9, 1004–1013 (2012)

  32. Chen, D., Diao, L., Eulenstein, O., Fernández-Baca, D., Sanderson, M.: Flipping: a supertree construction method. In: Bioconsensus. DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, vol. 61, pp. 135–160. American Mathematical Society-DIMACS, Providence, Rhode Island (2003)

  33. Chen, D., Eulenstein, O., Fernández-Baca, D., Burleigh, J.: Improved heuristics for minimum-flip supertree construction. Evol. Bioinform. 2, 401–410 (2006)

  34. Chen, D., Eulenstein, O., Fernández-Baca, D., Sanderson, M.: Minimum-flip supertrees: complexity and algorithms. IEEE/ACM Trans. Comput. Biol. Bioinform. 3, 165–173 (2006)

  35. Chernomor, O., von Haeseler, A., Minh, B.Q.: Terrace aware data structure for phylogenomic inference from supermatrices. Syst. Biol. 65(6), 997–1008 (2016)

  36. Christensen, S., Molloy, E.K., Vachaspati, P., Warnow, T.: OCTAL: Optimal Completion of gene trees in polynomial time. Algorithms Mol. Biol. 13(1), 6 (2018). https://doi.org/10.1186/s13015-018-0124-5

  37. Cotton, J., Wilkinson, M.: Majority rule supertrees. Syst. Biol. 56(3), 445–452 (2007)

  38. Cotton, J., Wilkinson, M.: Supertrees join the mainstream of phylogenetics. Trends Ecol. Evol. 24, 1–3 (2009)

  39. Creevey, C., McInerney, J.: Trees from trees: construction of phylogenetic supertrees using CLANN. In: Bioinformatics for DNA Sequence Analysis, vol. 537, pp. 139–161. Springer, Clifton, NJ (2009)

  40. Criscuolo, A., Berry, V., Douzery, E., Gascuel, O.: SDM: a fast distance-based approach for (super) tree building in phylogenomics. Syst. Biol. 55, 740–755 (2006)

  41. Criscuolo, A., Gascuel, O.: Fast NJ-like algorithms to deal with incomplete distance matrices. BMC Bioinform. 9, 166 (2008)

  42. Davies, T., Barraclough, T., Chase, M., Soltis, P., Soltis, D., Savolainen, V.: Darwin’s abominable mystery: insights from a supertree of the angiosperms. Proc. Natl. Acad. Sci. 101, 1904–1909 (2004)

  43. Desper, R., Gascuel, O.: Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting. Mol. Biol. Evol. 21(3), 587–598 (2004). http://dx.doi.org/10.1093/molbev/msh049

  44. Dobzhansky, T.: Nothing in biology makes sense except in the light of evolution. Am. Biol. Teacher 35, 125–129 (1973)

  45. Edwards, S.: Is a new and general theory of molecular systematics emerging? Evolution 63(1), 1–19 (2009)

  46. Erdös, P., Steel, M., Székely, L., Warnow, T.: A few logs suffice to build (almost) all trees (I). Random Struct. Algorithms 14, 153–184 (1999)

  47. Erdös, P., Steel, M., Székely, L., Warnow, T.: A few logs suffice to build (almost) all trees (II). Theoret. Comput. Sci. 221, 77–118 (1999)

  48. Fakcharoenphol, J., Rao, S., Talwar, K.: A tight bound on approximating arbitrary metrics by tree metrics. J. Comput. Syst. Sci. 69(3), 485–497 (2004)

  49. Fernández, M.H., Vrba, E.S.: A complete estimate of the phylogenetic relationships in Ruminantia: a dated species-level supertree of the extant ruminants. Biol. Rev. 80(2), 269–302 (2005)

  50. Fleischauer, M., Böcker, S.: Collecting reliable clades using the greedy strict consensus merger. PeerJ 4, e2172 (2016)

  51. Fleischauer, M., Böcker, S.: Bad clade deletion supertrees: a fast and accurate supertree algorithm. Mol. Biol. Evol. 34(9), 2408–2421 (2017)

  52. Foulds, L.R., Graham, R.L.: The Steiner problem in phylogeny is NP-complete. Adv. Appl. Math. 3, 43–49 (1982)

  53. Gascuel, O.: BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol. Biol. Evol. 14, 685–695 (1997)

  54. Goloboff, P., Farris, J., Nixon, K.: TNT, a free program for phylogenetic analysis. Cladistics 24, 1–13 (2008)

  55. Gramm, J., Niedermeier, R.: A fixed-parameter algorithm for minimum quartet inconsistency. J. Comput. Syst. Sci. 67(4), 723–741 (2003)

  56. Grappa (genome rearrangements analysis under parsimony and other phylogenetic algorithms). https://www.cs.unm.edu/~moret/GRAPPA/

  57. Grotkopp, E., Rejmánek, M., Sanderson, M.J., Rost, T.L.: Evolution of genome size in pines (Pinus) and its life-history correlates: supertree analyses. Evolution 58(8), 1705–1729 (2004)

  58. Guindon, S., Gascuel, O.: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704 (2003)

  59. Hallett, M., Lagergren, J.: New algorithms for the duplication-loss model. In: Proceedings of the ACM Symposium on Computational Biology RECOMB2000, pp. 138–146. ACM Press, New York (2000)

  60. Hillis, D.M., Huelsenbeck, J.P., Cunningham, C.W.: Application and accuracy of molecular phylogenies. Science 264, 671–677 (1994)

  61. Holland, B., Conner, G., Huber, K., Moulton, V.: Imputing supertrees and supernetworks from quartets. Syst. Biol. 56(1), 57–67 (2007). http://dx.doi.org/10.1080/10635150601167013

  62. Huson, D., Nettles, S., Warnow, T.: Disk-covering, a fast converging method for phylogenetic tree reconstruction. J. Comput. Biol. 6(3), 369–386 (1999)

  63. Huson, D., Vawter, L., Warnow, T.: Solving large scale phylogenetic problems using DCM2. In: Proceedings of 7th International Conference on Intelligent Systems for Molecular Biology (ISMB’99), pp. 118–129. AAAI Press (1999)

  64. Huson, D.H., Vawter, L., Warnow, T.: Solving large scale phylogenetic problems using DCM2. In: Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, pp. 118–129. AAAI Press (1999)

  65. Janowitz, M., Lapointe, F.J., McMorris, F., Mirkin, B., Roberts, F. (eds.): Bioconsensus: DIMACS Working Group Meetings on Bioconsensus, 25–26 Oct 2000 and 2–5 Oct 2001, DIMACS Center 61. American Mathematical Society (2003)

  66. Jarvis, E., Mirarab, S., Aberer, A.J., Li, B., Houde, P., Li, C., Ho, S., Faircloth, B.C., Nabholz, B., Howard, J.T., Suh, A., Weber, C.C., da Fonseca, R.R., Li, J., Zhang, F., Li, H., Zhou, L., Narula, N., Liu, L., Ganapathy, G., Boussau, B., Bayzid, M.S., Zavidovych, V., Subramanian, S., Gabaldón, T., Capella-Gutiérrez, S., Huerta-Cepas, J., Rekepalli, B., Munch, K., Schierup, M., Lindow, B., Warren, W.C., Ray, D., Green, R.E., Bruford, M.W., Zhan, X., Dixon, A., Li, S., Li, N., Huang, Y., Derryberry, E.P., Bertelsen, M.F., Sheldon, F.H., Brumfield, R.T., Mello, C.V., Lovell, P.V., Wirthlin, M., Schneider, M.P.C., Prosdocimi, F., Samaniego, J.A., Velazquez, A.M.V., Alfaro-Núnez, A., Campos, P.F., Petersen, B., Sicheritz-Ponten, T., Pas, A., Bailey, T., Scofield, P., Bunce, M., Lambert, D.M., Zhou, Q., Perelman, P., Driskell, A.C., Shapiro, B., Xiong, Z., Zeng, Y., Liu, S., Li, Z., Liu, B., Wu, K., Xiao, J., Yinqi, X., Zheng, Q., Zhang, Y., Yang, H., Wang, J., Smeds, L., Rheindt, F.E., Braun, M., Fjeldsa, J., Orlando, L., Barker, F.K., Jonsson, K.A., Johnson, W., Koepfli, K.P., O’Brien, S., Haussler, D., Ryder, O.A., Rahbek, C., Willerslev, E., Graves, G.R., Glenn, T.C., McCormack, J., Burt, D., Ellegren, H., Alstrom, P., Edwards, S.V., Stamatakis, A., Mindell, D.P., Cracraft, J., Braun, E.L., Warnow, T., Jun, W., Gilbert, M.T.P., Zhang, G.: Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346(6215), 1320–1331 (2014)

  67. Jewett, E., Rosenberg, N.A.: iGLASS: an improvement to the GLASS method for estimating species trees from gene trees. J. Comput. Biol. 19(3), 293–315 (2012)

  68. Jiang, T., Kearney, P., Li, M.: A polynomial-time approximation scheme for inferring evolutionary trees from quartet topologies and its applications. SIAM J. Comput. 30(6), 1924–1961 (2001)

  69. Jones, K.E., Purvis, A., MacLarnon, A., Bininda-Emonds, O.R.P., Simmons, N.B.: A phylogenetic supertree of the bats (Mammalia: Chiroptera). Biol. Rev. Camb. Philos. Soc. 77, 223–259 (2002)

  70. Jonsson, K.A., Fjeldsa, J.: A phylogenetic supertree of oscine passerine birds (Aves: Passeri). Zoologica Scripta 35, 149–186 (2006)

  71. Kettleborough, G., Dicks, J., Roberts, I.N., Huber, K.T.: Reconstructing (super) trees from data sets with missing distances: not all is lost. Mol. Biol. Evol. 32(6), 1628–1642 (2015)

  72. Kupczok, A.: Split-based computation of majority rule supertrees. BMC Evol. Biol. 11 (2011)

  73. Lacey, M., Chang, J.: A signal-to-noise analysis of phylogeny estimation by neighbor-joining: insufficiency of polynomial length sequences. Math. Biosci. 199(2), 188–215 (2006)

  74. Lafond, M., Scornavacca, C.: On the Weighted Quartet Consensus Problem (2016). arXiv:1610.00505

  75. Lapointe, F.J., Cucumel, G.: The average consensus procedure: combination of weighted trees containing identical or overlapping sets of taxa. Syst. Biol. 46(2), 306–312 (1997)

  76. Larget, B., Kotha, S., Dewey, C., Ané, C.: BUCKy: gene tree/species tree reconciliation with the Bayesian concordance analysis. Bioinformatics 26(22), 2910–2911 (2010)

  77. Lechner, M., Hernandez-Rosales, M., Doerr, D., Wieseke, N., Thévenin, A., Stoye, J., Hartmann, R., Prohaska, S., Stadler, P.: Orthology detection combining clustering and synteny for very large datasets. PLoS ONE 9(8), e105015 (2014). https://doi.org/10.1371/journal.pone.0105015

  78. Lefort, V., Desper, R., Gascuel, O.: FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol. Biol. Evol. 32(10), 2798–2800 (2015). https://doi.org/10.1093/molbev/msv150

  79. Liu, K., Linder, C., Warnow, T.: RAxML and FastTree: comparing two methods for large-scale maximum likelihood phylogeny estimation. PLoS ONE 6(11), e27731 (2012)

  80. Liu, K., Raghavan, S., Nelesen, S., Linder, C.R., Warnow, T.: Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science 324(5934), 1561–1564 (2009)

  81. Liu, L., Yu, L.: Estimating species trees from unrooted gene trees. Syst. Biol. 60(5), 661–667 (2011)

  82. Liu, L., Yu, L., Edwards, S.: A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol. Biol. 10, 302 (2010)

  83. Lopez, P., Casane, D., Philippe, H.: Heterotachy, an important process of protein evolution. Mol. Biol. Evol. 19, 1–7 (2002)

  84. Maddison, W.P.: Gene trees in species trees. Syst. Biol. 46, 523–536 (1997)

  85. Martins, L., Mallo, D., Posada, D.: A Bayesian supertree model for genome-wide species tree reconstruction. Syst. Biol. 65, 397–416 (2016)

  86. McMorris, F.: Axioms for consensus functions on undirected phylogenetic trees. Math. Biosci. 74, 17–21 (1985)

  87. Mihaescu, R., Levy, D., Pachter, L.: Why neighbor-joining works. Algorithmica 54(1), 1–24 (2009)

  88. Mirarab, S., Reaz, R., Bayzid, M.S., Zimmermann, T., Swenson, M., Warnow, T.: ASTRAL: Accurate Species TRee ALgorithm. Bioinformatics 30(17), i541–i548 (2014)

  89. Mirarab, S., Warnow, T.: ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31(12), i44–i52 (2015)

  90. Molloy, E.K., Warnow, T.: NJMerge: a generic technique for scaling phylogeny estimation methods and its application to species trees. In: Blanchette, M., Ouangraoua, A. (eds.) Comparative Genomics, pp. 260–276. Springer International Publishing, Cham (2018)

  91. Moret, B.M.E., Wang, L.S., Warnow, T.: New software for computational phylogenetics. IEEE Comput.: Spec. Issue Bioinform. 35(7), 55–64 (2002)

  92. Mossel, E., Roch, S.: Incomplete lineage sorting: consistent phylogeny estimation from multiple loci. IEEE Trans. Comput. Biol. Bioinform. 7(1), 166–171 (2011)

  93. Nelesen, S., Liu, K., Wang, L.S., Linder, C.R., Warnow, T.: DACTAL: divide-and-conquer trees (almost) without alignments. Bioinformatics 28, i274–i282 (2012)

  94. Neves, D., Sobral, J.: Parallel SuperFine—a tool for fast and accurate supertree estimation: features and limitations. Future Gener. Comput. Syst. 67, 441–454 (2017)

  95. Neves, D., Warnow, T., Sobral, J., Pingali, K.: Parallelizing SuperFine. In: 27th Symposium on Applied Computing (ACM-SAC), Bioinformatics, pp. 1361–1367. ACM (2012). https://doi.org/10.1145/2231936.2231992

  96. Neves, D.T., Sobral, J.L.: Parallel SuperFine—a tool for fast and accurate supertree estimation: Features and limitations. Future Gener. Comput. Syst. 67, 441–454 (2017). https://doi.org/10.1016/j.future.2016.04.004. http://www.sciencedirect.com/science/article/pii/S0167739X16300814

  97. Nguyen, L.T., Schmidt, H., von Haeseler, A., Minh, B.: IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32(1), 268–274 (2015). https://doi.org/10.1093/molbev/msu300

  98. Nguyen, N., Mirarab, S., Kumar, K., Warnow, T.: Ultra-large alignments using phylogeny aware profiles. Genome Biol. 16, 124 (2015). https://doi.org/10.1186/s13059-015-0688-z. A preliminary version appeared in the Proceedings of RECOMB 2015

  99. Nguyen, N., Mirarab, S., Warnow, T.: MRL and SuperFine+MRL: new supertree methods. Algorithms Mol. Biol. 7, 3 (2012)

  100. Nute, M., Warnow, T.: Scaling statistical multiple sequence alignment to large datasets. BMC Genomics 17(10), 764 (2016). https://doi.org/10.1186/s12864-016-3101-8

  101. de Oliveira Martins, L., Posada, D.: Species tree estimation from genome-wide data with Guenomu. In: Bioinformatics, pp. 461–478. Springer (2017)

  102. Pardi, F., Guillemot, S., Gascuel, O.: Combinatorics of distance-based tree inference. Proc. Natl. Acad. Sci. (USA) 109(41), 16443–16448 (2012)

  103. Piaggio-Talice, R., Burleigh, J.G., Eulenstein, O.: Quartet supertrees. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees: Combining Information to Reveal The Tree of Life, pp. 173–191. Kluwer Academic, Dordrecht, The Netherlands (2004)

  104. Pisani, D.: A genus-level supertree of the Dinosauria. Proc. R. Soc. Lond. B: Biol. Sci. 269, 915–921 (2002)

  105. Pisani, D., Cotton, J.A., McInerney, J.O.: Supertrees disentangle the chimeric origin of eukaryotic genomes. Mol. Biol. Evol. (2007)

  106. Popescu, A.A., Huber, K.T., Paradis, E.: ape 3.0: New tools for distance-based phylogenetics and evolutionary analysis in R. Bioinformatics 28(11), 1536–1537 (2012)

  107. Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5(3), e9490 (2010)

  108. Ragan, M.A.: Phylogenetic inference based on matrix representation of trees. Mol. Phylogenet. Evol. 1, 53–58 (1992)

  109. Ranwez, V., Berry, V., Criscuolo, A., Fabre, P.H., Guillemot, S., Scornavacca, C., Douzery, E.J.: PhySIC: a veto supertree method with desirable properties. Syst. Biol. 56(5), 798–817 (2007)

  110. Ranwez, V., Criscuolo, A., Douzery, E.J.: SuperTriplets: a triplet-based supertree approach to phylogenomics. Bioinformatics 26(12), i115–i123 (2010)

  111. Ranwez, V., Gascuel, O.: Quartet-based phylogenetic inference: improvements and limits. Mol. Biol. Evol. 18(6), 1103–1116 (2001)

  112. Reaz, R., Bayzid, M., Rahman, M.: Accurate phylogenetic tree reconstruction from quartets: a heuristic approach. PLoS ONE (2014). https://doi.org/10.1371/journal.pone.0104008

  113. Robinson, D., Foulds, L.: Comparison of phylogenetic trees. Math. Biosci. 53, 131–147 (1981)

  114. Roch, S., Steel, M.: Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent. Theoret. Popul. Biol. 100, 56–62 (2015)

  115. Rodrigo, A.G.: A comment on Baum’s method for combining phylogenetic trees. Taxon 42(3), 631–636 (1993)

  116. Roshan, U., Moret, B.M., Williams, T.L., Warnow, T.: REC-I-DCM3: a fast algorithmic technique for reconstructing large phylogenetic trees. In: Proceedings of 3rd IEEE Computational Systems Bioinformatics Conference CSB ’04, LCBB-CONF-2004-002, pp. 98–109. IEEE Press (2004)

  117. Roshan, U., Moret, B.M.E., Williams, T.L., Warnow, T.: Performance of supertree methods on various dataset decompositions. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees: Combining Information to Reveal The Tree Of Life, pp. 301–328. Kluwer Academic, Dordrecht, The Netherlands (2004)

  118. Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987)

  119. Salamin, N., Davies, J.T.: Using supertrees to investigate species richness in grasses and flowering plants. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees: Combining Information to Reveal The Tree Of Life, pp. 461–487. Kluwer Academic, Dordrecht, The Netherlands (2004)

  120. Sanderson, M., McMahon, M., Steel, M.: Phylogenomics with incomplete taxon coverage: the limits to inference. BMC Evol. Biol. 10, 155 (2010)

  121. Sanderson, M.J., McMahon, M.M., Stamatakis, A., Zwickl, D.J., Steel, M.: Impacts of terraces on phylogenetic inference. Syst. Biol. 64(5), 709–726 (2015)

  122. Sanderson, M.J., McMahon, M.M., Steel, M.: Terraces in phylogenetic tree space. Science 333(6041), 448–450 (2011)

  123. Semple, C., Steel, M.: A supertree method for rooted trees. Discrete Appl. Math. 105(1–3), 147–158 (2000). https://doi.org/10.1016/S0166-218X(00)00202-X. http://www.sciencedirect.com/science/article/pii/S0166218X0000202X

  124. Sevillya, G., Frenkel, Z., Snir, S.: Triplet MaxCut: a new toolkit for rooted supertree. Methods Ecol. Evol. 7, 1359–1365 (2016). https://doi.org/10.1111/2041-210X.12606

  125. Shigezumi, T.: Robustness of greedy type minimum evolution algorithms. In: Proceedings of International Conference on Computational Science, pp. 815–821. Springer (2006)

  126. Sjölander, K., Datta, R., Shen, Y., Shoffner, G.: Ortholog identification in the presence of domain architecture rearrangement. Brief. Bioinform. 12(5), 413–422 (2011). https://doi.org/10.1093/bib/bbr036. http://bib.oxfordjournals.org/content/12/5/413.abstract

  127. Snir, S., Rao, S.: Using max cut to enhance rooted trees consistency. IEEE/ACM Trans. Comput. Biol. Bioinform. 323–333 (2006)

  128. Snir, S., Rao, S.: Quartets MaxCut: a divide and conquer quartets algorithm. IEEE/ACM Trans. Comput. Biol. Bioinform. 7(4), 704–718 (2010)

  129. Stamatakis, A.: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 (2006)

  130. Steel, M.: The complexity of reconstructing trees from qualitative characters and subtrees. J. Classif. 9, 91–116 (1992)

  131. Steel, M., Gascuel, O.: Neighbor-joining revealed. Mol. Biol. Evol. 23(11), 1997–2000 (2006)

  132. Steel, M., Rodrigo, A.: Maximum likelihood supertrees. Syst. Biol. 57(2), 243–250 (2008)

  133. Strimmer, K., von Haeseler, A.: Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13(7), 964–969 (1996)

  134. Swenson, M., Suri, R., Linder, C., Warnow, T.: An experimental study of Quartets MaxCut and other supertree methods. Algorithms Mol. Biol. 6, 7 (2011). PMID: 21504600

  135. Swenson, M., Suri, R., Linder, C., Warnow, T.: SuperFine: fast and accurate supertree estimation. Syst. Biol. 61(2), 214–227 (2012)

  136. Swofford, D.L.: PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods) Ver. 4. Sinauer Associates, Sunderland, Massachusetts (2002)

  137. Szöllősi, G., Rosikiewicz, W., Boussau, B., Tannier, E., Daubin, V.: Efficient exploration of the space of reconciled gene trees. Syst. Biol. (2013). https://doi.org/10.1093/sysbio/syt054. http://sysbio.oxfordjournals.org/content/early/2013/08/06/sysbio.syt054.abstract

  138. Szöllősi, G.J., Boussau, B., Abby, S.S., Tannier, E., Daubin, V.: Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations. Proc. Natl. Acad. Sci. 109(43), 17513–17518 (2012). https://doi.org/10.1073/pnas.1202997109

  139. Tang, J., Moret, B.: Scaling up accurate phylogenetic reconstruction from gene-order data. Bioinformatics 19 (Suppl. 1), i305–i312 (2003). Proceedings of 11th International Conference on Intelligent Systems for Molecular Biology ISMB’03

  140. Than, C., Nakhleh, L.: Species tree inference by minimizing deep coalescences. PLoS Comput. Biol. 5, e1000501 (2009)

  141. Thorley, J., Wilkinson, M.: A view of supertree methods. DIMACS Ser. Discrete Math. Theoret. Comput. Sci. 61, 185–194 (2003)

  142. Vachaspati, P., Warnow, T.: ASTRID: accurate species TRees from internode distances. BMC Genomics 16(Suppl 10), S3 (2015)

  143. Vachaspati, P., Warnow, T.: FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization. Bioinformatics (2016). https://doi.org/10.1093/bioinformatics/btw600

  144. Vachaspati, P., Warnow, T.: SIESTA: Enhancing searches for optimal supertrees and species trees. BMC Genomics (2018) (to appear)

  145. Vachaspati, P., Warnow, T.: SVDquest: Improving SVDquartets species tree estimation using exact optimization within a constrained search space. Mol. Phylogenet. Evol. 124, 122–136 (2018). https://doi.org/10.1016/j.ympev.2018.03.006. http://www.sciencedirect.com/science/article/pii/S105579031730338X

  146. Wang, L.S., Leebens-Mack, J., Wall, P.K., Beckmann, K., DePamphilis, C.W., Warnow, T.: The impact of multiple protein sequence alignment on phylogenetic estimation. IEEE/ACM Trans. Comput. Biol. Bioinform. 8, 1108–1119 (2011)

  147. Warnow, T.: Computational Phylogenetics: An Introduction to Designing Methods for Phylogeny Estimation. Cambridge University Press, Cambridge UK (2018)

  148. Warnow, T., Moret, B.M.E., St. John, K.: Absolute convergence: true trees from short sequences. In: Proceedings of ACM-SIAM Symposium on Discrete Algorithms (SODA 01), pp. 186–195. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2001)

  149. Waterman, M., Smith, T., Beyer, W.: Some biological sequence metrics. Adv. Math. 20, 367–387 (1976)

  150. Waterman, M., Smith, T., Singh, M., Beyer, W.: Additive evolutionary trees. J. Theoret. Biol. 64, 199–213 (1977)

  151. Wehe, A., Bansal, M., Burleigh, J., Eulenstein, O.: DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony. Bioinformatics 24(13), 1540–1541 (2008). https://doi.org/10.1093/bioinformatics/btn230. http://bioinformatics.oxfordjournals.org/content/24/13/1540.abstract

  152. Wheeler, T.: Large-scale neighbor-joining with NINJA. In: Proceedings of Workshop Algorithms in Bioinformatics (WABI), vol. 5724, pp. 375–389 (2009)

  153. Wickett, N., Mirarab, S., Nguyen, N., Warnow, T., Carpenter, E., Matasci, N., Ayyampalayam, S., Barker, M., Burleigh, J., Gitzendanner, M., Ruhfel, B.R., Wafula, E., Der, J.P., Graham, S.W., Mathews, S., Melkonian, M., Soltis, D.E., Soltis, P.S., Miles, N.W., Rothfels, C.J., Pokorny, L., Shaw, A.J., DeGironimo, L., Stevenson, D.W., Surek, B., Villarreal, J.C., Roure, B., Philippe, H., dePamphilis, C.W., Chen, T., Deyholos, M.K., Baucom, R.S., Kutchan, T.M., Augustin, M.M., Wang, J., Zhang, Y., Tian, Z., Yan, Z., Wu, X., Sun, X., Wong, G.K.S., Leebens-Mack, J.: Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc. Natl. Acad. Sci. 111(45), E4859–E4868 (2014)

  154. Wilkinson, M., Cotton, J.A., Lapointe, F.J., Pisani, D.: Properties of supertree methods in the consensus setting. Syst. Biol. 56(2), 330–337 (2007). https://doi.org/10.1080/10635150701245370

  155. Willson, S.: Constructing rooted supertrees using distances. Bull. Math. Biol. 66(6), 1755–1783 (2004)

  156. Xin, L., Ma, B., Zhang, K.: A new quartet approach for reconstructing phylogenetic trees: quartet joining method. In: Proceedings. Computing and Combinatorics (COCOON) 2007, Lecture Notes in Computer Science, vol. 4598, pp. 40–50. Springer, Berlin, Heidelberg (2007)

  157. Yu, Y., Warnow, T., Nakhleh, L.: Algorithms for MDC-based multi-locus phylogeny inference: beyond rooted binary gene trees on single alleles. J. Comput. Biol. 18, 1543–1559 (2011). https://doi.org/10.1089/cmb.2011.0174

  158. Zhang, C., Sayyari, E., Mirarab, S.: ASTRAL-III: increased scalability and impacts of contracting low support branches. In: Meidanis, J., Nakhleh, L. (eds.) Comparative Genomics, pp. 53–75. Springer International Publishing, Cham (2017)

  159. Zhang, Q., Rao, S., Warnow, T.: New absolute fast converging phylogeny estimation methods with improved scalability and accuracy. In: Parida, L., Ukkonen, E. (eds.) 18th International Workshop on Algorithms in Bioinformatics (WABI 2018), pp. 8:1–8:12. LIPIcs, Dagstuhl (2018)

Acknowledgements

The author wishes to thank Pranjal Vachaspati for careful and thoughtful comments on the manuscript. The author also thanks the anonymous reviewers, whose comments helped improve the manuscript. This chapter was supported in part by NSF grant CCF-1535977, but much of the work described here was done while the author was part of the CIPRES (www.phylo.org) project, an NSF-funded multi-institutional grant that was initially led by Bernard Moret and subsequently by the author. The first divide-and-conquer methods (DCM-NJ, DACTAL, etc.) were developed with CIPRES support, as were the supertree methods SuperFine and the Strict Consensus Merger, which enabled those divide-and-conquer methods to perform well.

Author information

Corresponding author

Correspondence to Tandy Warnow.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Warnow, T. (2019). Divide-and-Conquer Tree Estimation: Opportunities and Challenges. In: Warnow, T. (eds) Bioinformatics and Phylogenetics. Computational Biology, vol 29. Springer, Cham. https://doi.org/10.1007/978-3-030-10837-3_6

  • DOI: https://doi.org/10.1007/978-3-030-10837-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-10836-6

  • Online ISBN: 978-3-030-10837-3

  • eBook Packages: Computer Science (R0)
