Divide-and-Conquer Tree Estimation: Opportunities and Challenges

  • Tandy Warnow
Chapter
Part of the Computational Biology book series (COBO, volume 29)

Abstract

Large-scale phylogeny estimation is challenging for many reasons, including heterogeneity across the Tree of Life and the difficulty of finding good solutions to NP-hard optimization problems. One promising way to enable large-scale phylogeny estimation is divide-and-conquer: a dataset is divided into overlapping subsets, trees are estimated on the subsets, and the subset trees are then merged into a tree on the full set of taxa. This last step is performed by a supertree method; such methods are popular in systematics for combining species trees from the scientific literature. Because most supertree methods are heuristics for NP-hard optimization problems, supertree estimation on large datasets is challenging in terms of both scalability and accuracy. In this chapter, we describe the current state of the art in supertree construction and the use of supertree methods in divide-and-conquer strategies, and we identify directions where future research could lead to improved supertree methods. Finally, we present a new type of divide-and-conquer strategy in which the dataset is divided into disjoint subsets, which bypasses the need for supertree estimation. Overall, this chapter aims to present research directions that could lead to new methods for scaling phylogeny estimation to large datasets.
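The three steps named in the abstract (divide into overlapping subsets, estimate subset trees, merge) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the chapter: rooted trees are nested tuples, the hypothetical `divide` function uses sliding windows over a fixed taxon order, tree estimation is replaced by restricting a known tree to each subset, and the merge step uses the classical BUILD algorithm of Aho et al. on rooted triplets, which only works when the subset trees are compatible. It is not SuperFine, the Strict Consensus Merger, or any other method discussed in the chapter.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of leaf labels of a nested-tuple rooted tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def restrict(tree, taxa):
    """Induced subtree on `taxa`, suppressing nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in taxa else None
    kept = [r for r in (restrict(child, taxa) for child in tree) if r is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)

def clusters(tree):
    """Leaf sets below each internal node; used to compare topologies."""
    if isinstance(tree, str):
        return set()
    result = {frozenset(leaves(tree))}
    for child in tree:
        result |= clusters(child)
    return result

def triplets(tree):
    """Rooted triplets ab|c displayed by the tree, as (frozenset({a, b}), c)."""
    all_taxa = leaves(tree)
    found = set()
    for cluster in clusters(tree):
        for a, b in combinations(sorted(cluster), 2):
            for c in all_taxa - cluster:
                found.add((frozenset((a, b)), c))
    return found

def divide(ordered_taxa, size, overlap):
    """Hypothetical decomposition: sliding windows sharing `overlap` taxa."""
    subsets, start = [], 0
    while True:
        subsets.append(set(ordered_taxa[start:start + size]))
        if start + size >= len(ordered_taxa):
            return subsets
        start += size - overlap

def build(taxa, trips):
    """BUILD (Aho et al.): merge compatible rooted triplets into one tree."""
    taxa = sorted(taxa)
    if len(taxa) == 1:
        return taxa[0]
    component = {t: {t} for t in taxa}
    for pair, c in trips:
        a, b = pair
        # Only triplets entirely inside the current taxon set add an edge a-b.
        if a in component and b in component and c in component:
            if component[a] is not component[b]:
                merged_set = component[a] | component[b]
                for t in merged_set:
                    component[t] = merged_set
    parts = {frozenset(s) for s in component.values()}
    if len(parts) == 1:
        raise ValueError("subset trees are incompatible; BUILD cannot merge them")
    return tuple(build(p, trips) for p in sorted(parts, key=sorted))

# Toy run: a known tree stands in for per-subset tree estimation.
true_tree = ((("A", "B"), "C"), (("D", "E"), "F"))
subsets = divide(sorted(leaves(true_tree)), size=4, overlap=2)
subset_trees = [restrict(true_tree, s) for s in subsets]
all_triplets = set().union(*(triplets(t) for t in subset_trees))
merged = build(leaves(true_tree), all_triplets)
print(clusters(merged) == clusters(true_tree))
```

In this small case the two overlapping subset trees carry enough triplets for the merged tree to match the starting tree. With estimated subset trees, conflicts are common and BUILD would raise an error, which is one reason practical pipelines rely on heuristic supertree methods instead.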

Keywords

Supertrees, Phylogenetics, Species trees, Divide-and-conquer, Incomplete lineage sorting, Tree of Life

Notes

Acknowledgements

The author wishes to thank Pranjal Vachaspati for careful and thoughtful comments on the manuscript, and the anonymous reviewers, whose comments helped improve it. This work was supported in part by NSF grant CCF-1535977, but much of the work described in this book chapter was done while the author was part of the CIPRES (www.phylo.org) project, an NSF-funded multi-institutional project that was initially led by Bernard Moret and subsequently by the author. The first divide-and-conquer methods (DCM-NJ, DACTAL, etc.) were developed with CIPRES support, as were the supertree methods SuperFine and the Strict Consensus Merger, which enabled those divide-and-conquer methods to perform well.

References

  1. 1.
    Agarwala, R., Bafna, V., Farach, M., Paterson, M., Thorup, M.: On the approximability of numerical taxonomy (fitting distances by tree metrics). SIAM J. Comput. 28(3), 1073–1085 (1998)CrossRefMathSciNetzbMATHGoogle Scholar
  2. 2.
    Ailon, N., Charikar, M.: Fitting tree metrics: hierarchical clustering and phylogeny. SIAM J. Comput. 40(5), 1275–1291 (2011)CrossRefMathSciNetzbMATHGoogle Scholar
  3. 3.
    Akanni, W., Creevey, C., Wilkinson, M., Pisani, D.: L.U.-St: a tool for approximated maximum likelihood supertree reconstruction. BMC Bioinform. 15, 183 (2014)Google Scholar
  4. 4.
    Akanni, W., Wilkinson, M., Creevey, C., Foster, P., Pisani, D.: Implementing and testing Bayesian and maximum-likelihood supertree methods in phylogenetics. R. Soc. Open Sci. 2, 140,436 (2015)Google Scholar
  5. 5.
    Allman, E.S., Degnan, J.H., Rhodes, J.A.: Species tree inference from gene splits by unrooted star methods. IEEE/ACM Trans. Computat. Biol. Bioinform. (TCBB) 15(1), 337–342 (2018)CrossRefGoogle Scholar
  6. 6.
    Alon, N., Snir, S., Yuster, R.: On the compatibility of quartet trees. In: Proceedings of the Twenty-fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’14, pp. 535–545. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA (2014). http://dl.acm.org/citation.cfm?id=2634074.2634114
  7. 7.
    Altenhoff, A., Boeckmann, B., Capella-Gutierrez, S., Dalquen, D., et al.: Standardized benchmarking in the quest for orthologs. Nat. Methods 13, 425–430 (2016)CrossRefGoogle Scholar
  8. 8.
    Avni, E., Cohen, R., Snir, S.: Weighted quartets phylogenetics. Syst. Biol. 64(2), 233–242 (2014)CrossRefGoogle Scholar
  9. 9.
    Avni, E., Yona, Z., Cohen, R., Snir, S.: The performance of two supertree schemes compared using synthetic and real data quartet input. J. Mol. Evol. 86(2), 150–165 (2018).  https://doi.org/10.1007/s00239-018-9833-0
  10. 10.
    Baker, W.J., Savolainen, V., Asmussen-Lange, C.B., Chase, M.W., Dransfield, J., Forest, F., Harley, M.M., Uhl, N.W., Wilkinson, M.: Complete generic-level phylogenetic analyses of palms (arecaceae) with comparisons of supertree and supermatrix approaches. Syst. Biol. 58(2), 240–256 (2009)CrossRefGoogle Scholar
  11. 11.
    Bansal, M., Burleigh, J., Eulenstein, O., Fernández-Baca, D.: Robinson-Foulds supertrees. Algorithms Mol. Biol. 5, 18 (2010)CrossRefGoogle Scholar
  12. 12.
    Baum, B.: Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees. Taxon 41, 3–10 (1992)CrossRefGoogle Scholar
  13. 13.
    Baum, B., Ragan, M.A.: The MRP method. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees: Combining Information to Reveal The Tree Of Life, pp. 17–34. Kluwer Academic, Dordrecht, The Netherlands (2004)CrossRefGoogle Scholar
  14. 14.
    Bayzid, M., Mirarab, S., Warnow, T.: Inferring optimal species trees under gene duplication and loss. Pac. Symp. Biocomput. 18, 250–261 (2013)Google Scholar
  15. 15.
    Bayzid, M.S., Hunt, T., Warnow, T.: Disk covering methods improve phylogenomic analyses. BMC Genomics 15(Suppl 6), S7 (2014)CrossRefGoogle Scholar
  16. 16.
    Ben-Dor, A., Chor, B., Graur, D., Ophir, R., Pelleg, D.: Constructing phylogenies from quartets: elucidation of eutherian superordinal relationships. J. Comput. Biol. 5(3), 377–390 (1998)CrossRefGoogle Scholar
  17. 17.
    Berry, V., Bryant, D., Jiang, T., Kearney, P.E., Li, M., Wareham, T., Zhang, H.: A practical algorithm for recovering the best supported edges of an evolutionary tree. In: Proceedings of the SIAM-ACM Symposium on Discrete Algorithms (SODA), pp. 287–296 (2000)Google Scholar
  18. 18.
    Berry, V., Gascuel, O.: Inferring evolutionary trees with strong combinatorial evidence. Theoret. Comput. Sci. 240(2), 271–298 (2000).  https://doi.org/10.1016/S0304-3975(99)00235-2. http://www.sciencedirect.com/science/article/pii/S0304397599002352
  19. 19.
    Bininda-Emonds, O. (ed.): Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life. Kluwer Academic Publishers, Dordrecht (2004)zbMATHGoogle Scholar
  20. 20.
    Bininda-Emonds, O.R.P.: MRP supertree construction in the consensus setting. In: Bioconsensus. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 61, pp. 231–242. American Mathematical Society-DIMACS, Providence, Rhode Island (2003)Google Scholar
  21. 21.
    Bininda-Emonds, O.R.P., Gittleman, J.L., Purvis, A.: Building large trees by combining phylogenetic information: a complete phylogeny of the extant Carnivora (Mammalia). Biol. Rev. Camb. Philos. Soc. 74, 143–175 (1999)CrossRefGoogle Scholar
  22. 22.
    Bininda-Emonds, O.R.P., Gittleman, J.L., Steel, M.A.: The (super)tree of life: procedures, problems, and prospects. Annu. Rev. Ecol. Syst. 33, 265–289 (2002)CrossRefGoogle Scholar
  23. 23.
    Böcker, S., Bryant, D., Dress, A.W., Steel, M.A.: Algorithmic aspects of tree amalgamation. J. Algorithms 37(2), 522–537 (2000)CrossRefMathSciNetzbMATHGoogle Scholar
  24. 24.
    Bordewich, M., Mihaescu, R.: Accuracy guarantees for phylogeny reconstruction algorithms based on balanced minimum evolution. In: Moulton, V.. Singh, M. (eds.) Proceedings of the 2010 Workshop on Algorithms for Bioinformatics, pp. 250–261. Springer, Berlin, Heidelberg (2010)Google Scholar
  25. 25.
    Brinkmeyer, M., Griebel, T., Böcker, S.: Polynomial supertree methods revisited. Adv. Bioinform. 2011 (2011)Google Scholar
  26. 26.
    Bryant, D., Steel, M.: Constructing optimal trees from quartets. J. Algorithms 38(1), 237–259 (2001)CrossRefMathSciNetzbMATHGoogle Scholar
  27. 27.
    Bryant, D., Steel, M.: Computing the distribution of a tree metric. IEEE/ACM Trans. Comput. Biol. Bioinform. 6(3), 420–426 (2009)CrossRefGoogle Scholar
  28. 28.
    Buneman, P.: The recovery of trees from measures of dissimilarity. In: Hodson, F., Kendall, D., Tautu, P. (eds.) Mathematics in the Archaeological and Historical Sciences, pp. 387–395. Edinburgh University Press, Edinburgh, Scotland (1971)Google Scholar
  29. 29.
    Chaudhary, R.: MulRF: a software package for phylogenetic analysis using multi-copy gene trees. Bioinformatics 31, 432–433 (2015)CrossRefGoogle Scholar
  30. 30.
    Chaudhary, R., Bansal, M.S., Wehe, A., Fernández-Baca, D., Eulenstein, O.: iGTP: a software package for large-scale gene tree parsimony analysis. BMC Bioinform. 11, 574 (2010)CrossRefGoogle Scholar
  31. 31.
    Chaudhary, R., Burleigh, J.G., Fernández-Baca, D.: Fast local search for unrooted Robinson-Foulds supertrees. IEEE/ACM Trans. Comput. Biol. Bioinform. 9, 1004–1013 (2012)CrossRefGoogle Scholar
  32. 32.
    Chen, D., Diao, L., Eulenstein, O., Fernández-Baca, D., Sanderson, M.: Flipping: a supertree construction method. In: Bioconsensus. DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, vol. 61, pp. 135–160. American Mathematical Society-DIMACS, Providence, Rhode Island (2003)Google Scholar
  33. 33.
    Chen, D., Eulenstein, O., Fernández-Baca, D., Burleigh, J.: Improved heuristics for minimum-flip supertree construction. Evol. Bioinform. 2, 401–410 (2006)CrossRefGoogle Scholar
  34. 34.
    Chen, D., Eulenstein, O., Fernández-Baca, D., Sanderson, M.: Minimum-flip supertrees: complexity and algorithms. IEEE/ACM Trans. Comput. Biol. Bioinform. 3, 165–173 (2006)CrossRefGoogle Scholar
  35. 35.
    Chernomor, O., von Haeseler, A., Minh, B.Q.: Terrace aware data structure for phylogenomic inference from supermatrices. Syst. Biol. 65(6), 997–1008 (2016)CrossRefGoogle Scholar
  36. 36.
    Christensen, S., Molloy, E.K., Vachaspati, P., Warnow, T.: OCTAL: Optimal Completion of gene trees in polynomial time. Algorithms Mol. Biol. 13(1), 6 (2018).  https://doi.org/10.1186/s13015-018-0124-5
  37. 37.
    Cotton, J., Wilkinson, M.: Majority rule supertrees. Syst. Biol. 56(3), 445–452 (2007)CrossRefGoogle Scholar
  38. 38.
    Cotton, J., Wilkinson, M.: Supertrees join the mainstream of phylogenetics. Trends Ecol. Evol. 24, 1–3 (2009)CrossRefGoogle Scholar
  39. 39.
    Creevey, C., McInerney, J.: Trees from trees: construction of phylogenetic supertrees using CLANN. In: Bioinformatics for DNA Sequence Analysis, vol. 537, pp. 139–61. Springer, Clifton, NJ (2009)Google Scholar
  40. 40.
    Criscuolo, A., Berry, V., Douzery, E., Gascuel, O.: SDM: a fast distance-based approach for (super) tree building in phylogenomics. Syst. Biol. 55, 740–755 (2006)CrossRefGoogle Scholar
  41. 41.
    Criscuolo, A., Gascuel, O.: Fast NJ-like algorithms to deal with incomplete distance matrices. BMC Bioinform. 9(166) (2008)Google Scholar
  42. 42.
    Davies, T., Barraclough, T., Chase, M., Soltis, P., Soltis, D., Savolainen, V.: Darwin’s abominable mystery: insights from a supertree of the angiosperms. Proc. Natl. Acad. Sci. 101, 1904–1909 (2004)CrossRefGoogle Scholar
  43. 43.
    Desper, R., Gascuel, O.: Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting. Mol. Biol. Evol. 21(3), 587–598 (2004). http://dx.doi.org/10.1093/molbev/msh049
  44. 44.
    Dobzhansky, T.: Nothing in biology makes sense except in the light of evolution. Am. Biol. Teacher 35, 125–129 (1973)CrossRefGoogle Scholar
  45. 45.
    Edwards, S.: Is a new and general theory of molecular systematics emerging? Evolution 63(1), 1–19 (2009)CrossRefGoogle Scholar
  46. 46.
    Erdös, P., Steel, M., Székely, L., Warnow, T.: A few logs suffice to build (almost) all trees (I). Random Struct. Algorithms 14, 153–184 (1999)CrossRefMathSciNetzbMATHGoogle Scholar
  47. 47.
    Erdös, P., Steel, M., Székely, L., Warnow, T.: A few logs suffice to build (almost) all trees (II). Theoret. Comput. Sci. 221, 77–118 (1999)CrossRefMathSciNetzbMATHGoogle Scholar
  48. 48.
    Fakcharoenphol, J., Rao, S., Talwar, K.: A tight bound on approximating arbitrary metrics by tree metrics. J. Comput. Syst. Sci. 69(3), 485–497 (2004)CrossRefMathSciNetzbMATHGoogle Scholar
  49. 49.
    Fernández, M.H., Vrba, E.S.: A complete estimate of the phylogenetic relationships in ruminantia: a dated species-level supertree of the extant ruminants. Biol. Rev. 80(2), 269–302 (2005)CrossRefGoogle Scholar
  50. 50.
    Fleischauer, M., Böcker, S.: Collecting reliable clades using the greedy strict consensus merger. PeerJ 4, e2172 (2016)CrossRefGoogle Scholar
  51. 51.
    Fleischauer, M., Böcker, S.: Bad clade deletion supertrees: a fast and accurate supertree algorithm. Mol. Biol. Evol. 34(9), 2408–2421 (2017)CrossRefGoogle Scholar
  52. 52.
    Foulds, L.R., Graham, R.L.: The Steiner problem in phylogeny is NP-complete. Adv. Appl. Math. 3(43–49), 299 (1982)MathSciNetzbMATHGoogle Scholar
  53. 53.
    Gascuel, O.: BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol. Biol. Evol. 14, 685–695 (1997)CrossRefGoogle Scholar
  54. 54.
    Goloboff, P., Farris, J., Nixon, K.: TNT, a free program for phylogenetic analysis. Cladistics 24, 1–13 (2008)CrossRefGoogle Scholar
  55. 55.
    Gramm, J., Niedermeier, R.: A fixed-parameter algorithm for minimum quartet inconsistency. J. Comput. Syst. Sci. 67(4), 723–741 (2003)CrossRefMathSciNetzbMATHGoogle Scholar
  56. 56.
    Grappa (genome rearrangements analysis under parsimony and other phylogenetic algorithms). https://www.cs.unm.edu/~moret/GRAPPA/
  57. 57.
    Grotkopp, E., Rejmánek, M., Sanderson, M.J., Rost, T.L.: Evolution of genome size in pines (pinus) and its life-history correlates: supertree analyses. Evolution 58(8), 1705–1729 (2004)CrossRefGoogle Scholar
  58. 58.
    Guindon, S., Gascuel, O.: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704 (2003) (1063-5157 (Print))Google Scholar
  59. 59.
    Hallett, M., Lagergren, J.: New algorithms for the duplication-loss model. In: Proceedings of the ACM Symposium on Computational Biology RECOMB2000, pp. 138–146. ACM Press, New York (2000)Google Scholar
  60. 60.
    Hillis, D.M., Huelsenbeck, J.P., Cunningham, C.W.: Application and accuracy of molecular phylogenies. Science 264, 671–677 (1994)CrossRefGoogle Scholar
  61. 61.
    Holland, B., Conner, G., Huber, K., Moulton, V.: Imputing supertrees and supernetworks from quartets. Syst. Biol. 56(1), 57–67 (2007). http://dx.doi.org/10.1080/10635150601167013
  62. 62.
    Huson, D., Nettles, S., Warnow, T.: Disk-covering, a fast converging method for phylogenetic tree reconstruction. J. Comput. Biol. 6(3), 369–386 (1999)CrossRefGoogle Scholar
  63. 63.
    Huson, D., Vawter, L., Warnow, T.: Solving large scale phylogenetic problems using DCM2. In: Proceedings of 7th International Conference on Intelligent Systems for Molecular Biology (ISMB’99), pp. 118–129. AAAI Press (1999)Google Scholar
  64. 64.
    Huson, D.H., Vawter, L., Warnow, T.: Solving large scale phylogenetic problems using DCM2. In: Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology table of contents, pp. 118–129. AAAI Press (1999)Google Scholar
  65. 65.
    Janowitz, M., Lapointe, F.J., McMorris, F., Mirkin, B., Roberts, F. (eds.): Bioconsensus: DIMACS Working Group Meetings on Bioconsensus, 25–26 Oct 2000 and 2–5 Oct 2001, DIMACS Center 61. American Mathematical Society (2003)Google Scholar
  66. 66.
    Jarvis, E., Mirarab, S., Aberer, A.J., Li, B., Houde, P., Li, C., Ho, S., Faircloth, B.C., Nabholz, B., Howard, J.T., Suh, A., Weber, C.C., da Fonseca, R.R., Li, J., Zhang, F., Li, H., Zhou, L., Narula, N., Liu, L., Ganapathy, G., Boussau, B., Bayzid, M.S., Zavidovych, V., Subramanian, S., Gabaldón, T., Capella-Gutiérrez, S., Huerta-Cepas, J., Rekepalli, B., Munch, K., Schierup, M., Lindow, B., Warren, W.C., Ray, D., Green, R.E., Bruford, M.W., Zhan, X., Dixon, A., Li, S., Li, N., Huang, Y., Derryberry, E.P., Bertelsen, M.F., Sheldon, F.H., Brumfield, R.T., Mello, C.V., Lovell, P.V., Wirthlin, M., Schneider, M.P.C., Prosdocimi, F., Samaniego, J.A., Velazquez, A.M.V., Alfaro-Núnez, A., Campos, P.F., Petersen, B., Sicheritz-Ponten, T., Pas, A., Bailey, T., Scofield, P., Bunce, M., Lambert, D.M., Zhou, Q., Perelman, P., Driskell, A.C., Shapiro, B., Xiong, Z., Zeng, Y., Liu, S., Li, Z., Liu, B., Wu, K., Xiao, J., Yinqi, X., Zheng, Q., Zhang, Y., Yang, H., Wang, J., Smeds, L., Rheindt, F.E., Braun, M., Fjeldsa, J., Orlando, L., Barker, F.K., Jonsson, K.A., Johnson, W., Koepfli, K.P., O’Brien, S., Haussler, D., Ryder, O.A., Rahbek, C., Willerslev, E., Graves, G.R., Glenn, T.C., McCormack, J., Burt, D., Ellegren, H., Alstrom, P., Edwards, S.V., Stamatakis, A., Mindell, D.P., Cracraft, J., Braun, E.L., Warnow, T., Jun, W., Gilbert, M.T.P., Zhang, G.: Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346(6215), 1320–1331 (2014)CrossRefGoogle Scholar
  67. 67.
    Jewett, E., Rosenberg, N.A.: iGLASS: an improvement to the GLASS method for estimating species trees from gene trees. J. Comput. Biol. 19(3), 293–315 (2012)CrossRefMathSciNetGoogle Scholar
  68. 68.
    Jiang, T., Kearney, P., Li, M.: A polynomial-time approximation scheme for inferring evolutionary trees from quartet topologies and its applications. SIAM J. Comput. 30(6), 1924–1961 (2001)CrossRefMathSciNetzbMATHGoogle Scholar
  69. 69.
    Jones, K.E., Purvis, A., MacLarnon, A., Bininda-Emonds, O.R.P., Simmons, N.B.: A phylogenetic supertree of the bats (Mammalia: Chiroptera). Biol. Rev. Camb. Philos. Soc. 77, 223–259 (2002)CrossRefGoogle Scholar
  70. 70.
    Jonsson, K.A., Fjeldsa, J.: A phylogenetic supertree of oscine passerine birds (Aves: Passeri). Zoologica Scripta 35, 149–186 (2006)CrossRefGoogle Scholar
  71. 71.
    Kettleborough, G., Dicks, J., Roberts, I.N., Huber, K.T.: Reconstructing (super) trees from data sets with missing distances: not all is lost. Mol. Biol. Evol. 32(6), 1628–1642 (2015)CrossRefGoogle Scholar
  72. 72.
    Kupczok, A.: Split-based computation of majority rule supertrees. BMC Evol. Biol. 11, (2011)Google Scholar
  73. 73.
    Lacey, M., Chang, J.: A signal-to-noise analysis of phylogeny estimation by neighbor-joining: insufficiency of polynomial length sequences. Math. Biosci. 199(2), 188–215 (2006)CrossRefMathSciNetzbMATHGoogle Scholar
  74. 74.
    Lafond, M., Scornavacca, C.: On the Weighted Quartet Consensus Problem (2016). arXiv:1610.00505
  75. 75.
    Lapointe, F.J., Cucumel, G.: The average consensus procedure: combination of weighted trees containing identical or overlapping sets of taxa. Syst. Biol. 46(2), 306–312 (1997)CrossRefGoogle Scholar
  76. 76.
    Larget, B., Kotha, S., Dewey, C., Ané, C.: BUCKy: gene tree/species tree reconciliation with the Bayesian concordance analysis. Bioinformatics 26(22), 2910–2911 (2010)CrossRefGoogle Scholar
  77. 77.
    Lechner, M., Hernandez-Rosales, M., Doerr, D., Wieseke, N., Thévenin, A., Stoye, J., Hartmann, R., Prohaska, S., Stadler, P.: Orthology detection combining clustering and synteny for very large datasets. PLoS ONE 9(8), e105,015 (2014).  https://doi.org/10.1371/journal.pone.0105015
  78. 78.
    Lefort, V., Desper, R., Gascuel, O.: FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol. Biol. Evol. 32(10), 2798–2800 (2015).  https://doi.org/10.1093/molbev/msv150
  79. 79.
    Liu, K., Linder, C., Warnow, T.: RAxML and FastTree: comparing two methods for large-scale maximum likelihood phylogeny estimation. PLoS ONE 6(11), e27,731 (2012)Google Scholar
  80. 80.
    Liu, K., Raghavan, S., Nelesen, S., Linder, C.R., Warnow, T.: Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science 324(5934), 1561–1564 (2009)CrossRefGoogle Scholar
  81. 81.
    Liu, L., Yu, L.: Estimating species trees from unrooted gene trees. Syst. Biol. 60(5), 661–667 (2011)CrossRefGoogle Scholar
  82. 82.
    Liu, L., Yu, L., Edwards, S.: A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol. Biol. 10, 302 (2010)CrossRefGoogle Scholar
  83. 83.
    Lopez, P., Casane, D., Philippe, H.: Heterotachy, an important process of protein evolution. Mol. Biol. Evol. 19, 1–7 (2002)CrossRefGoogle Scholar
  84. 84.
    Maddison, W.P.: Gene trees in species trees. Syst. Biol. 46, 523–536 (1997)CrossRefGoogle Scholar
  85. 85.
    Martins, L., Mallo, D., Posada, D.: A Bayesian supertree model for genome-wide species tree reconstruction. Syst. Biol. 65, 397–416 (2016)CrossRefGoogle Scholar
  86. 86.
    McMorris, F.: Axioms for consensus functions on undirected phylogenetic trees. Math. Biosci. 74, 17–21 (1985)CrossRefMathSciNetzbMATHGoogle Scholar
  87. 87.
    Mihaescu, R., Levy, D., Pachter, L.: Why neighbor-joining works. Algorithmica 54(1), 1–24 (2009)CrossRefMathSciNetzbMATHGoogle Scholar
  88. 88.
    Mirarab, S., Reaz, R., Bayzid, M.S., Zimmermann, T., Swenson, M., Warnow, T.: ASTRAL: Accurate Species TRee ALgorithm. Bioinformatics 30(17), i541–i548 (2014)CrossRefGoogle Scholar
  89. 89.
    Mirarab, S., Warnow, T.: ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31(12), i44–i52 (2015)CrossRefGoogle Scholar
  90. 90.
    Molloy, E.K., Warnow, T.: NJMerge: a generic technique for scaling phylogeny estimation methods and its application to species trees. In: Blanchette, M., Ouangraoua, A. (eds.) Comparative Genomics, pp. 260–276. Springer International Publishing, Cham (2018)Google Scholar
  91. 91.
    Moret, B.M.E., Wang, L.S., Warnow, T.: New software for computational phylogenetics. IEEE Comput.: Spec. Issue Bioinform. 35(7), 55–64 (2002)Google Scholar
  92. 92.
    Mossel, E., Roch, S.: Incomplete lineage sorting: consistent phylogeny estimation from multiple loci. IEEE Trans. Comput. Biol. Bioinform. 7(1), 166–171 (2011)CrossRefGoogle Scholar
  93. 93.
    Nelesen, S., Liu, K., Wang, L.S., Linder, C.R., Warnow, T.: DACTAL: divide-and-conquer trees (almost) without alignments. Bioinformatics 28, i274–i282 (2012)CrossRefGoogle Scholar
  94. 94.
    Neves, D., Sobral, J.: Parallel SuperFine—a tool for fast and accurate supertree estimation: features and limitations. Future Gener. Comput. Syst. 67, 441–454 (2017)CrossRefGoogle Scholar
  95. 95.
    Neves, D., Warnow, T., Sobral, J., Pingali, K.: Parallelizing SuperFine. In: 27th Symposium on Applied Computing (ACM-SAC), Bioinformatics, pp. 1361–1367. ACM (2012).  https://doi.org/10.1145/2231936.2231992
  96. 96.
    Neves, D.T., Sobral, J.L.: Parallel SuperFine—a tool for fast and accurate supertree estimation: Features and limitations. Future Gener. Comput. Syst. 67, 441–454 (2017).  https://doi.org/10.1016/j.future.2016.04.004. http://www.sciencedirect.com/science/article/pii/S0167739X16300814
  97. 97.
    Nguyen, L.T., Schmidt, H., von Haeseler, A., Minh, B.: IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32(1), 268–274 (2015).  https://doi.org/10.1093/molbev/msu300CrossRefGoogle Scholar
  98. 98.
    Nguyen, N., Mirarab, S., Kumar, K., Warnow, T.: Ultra-large alignments using phylogeny aware profiles. Genome Biol. 16(124) (2015).  https://doi.org/10.1186/s13059-015-0688-z. A preliminary version appeared in the Proceedings RECOMB 2015
  99. 99.
    Nguyen, N., Mirarab, S., Warnow, T.: MRL and SuperFine+MRL: new supertree methods. J. Algorithms Mol. Biol. 7(3) (2012)Google Scholar
  100. 100.
    Nute, M., Warnow, T.: Scaling statistical multiple sequence alignment to large datasets. BMC Genomics 17(10), 764 (2016).  https://doi.org/10.1186/s12864-016-3101-8
  101. 101.
    de Oliveira Martins, L., Posada, D.: Species tree estimation from genome-wide data with Guenomu. In: Bioinformatics, pp. 461–478. Springer (2017)Google Scholar
  102. 102.
    Pardi, F., Guillemot, S., Gascuel, O.: Combinatorics of distance-based tree inference. Proc. Natl. Acad. Sci. (USA) 109(41), 16443–16448 (2012)CrossRefGoogle Scholar
  103. 103.
    Piaggio-Talice, R., Burleigh, J.G., Eulenstein, O.: Quartet supertrees. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees: Combining Information to Reveal The Tree of Life, pp. 173–191. Kluwer Academic, Dordrecht, The Netherlands (2004)CrossRefGoogle Scholar
  104. 104.
    Pisani, D.: A genus-level supertree of the Dinosauria. Proc. R. Soc. Lond. B: Biol. Sci. 269, 915–921 (2002)CrossRefGoogle Scholar
  105. 105.
    Pisani, D., Cotton, J.A., McInerney, J.O.: Supertrees disentangle the chimeric origin of eukaryotic genomes. Mol. Biol. Evol. (2007)Google Scholar
  106. 106.
    Popescu, A.A., Huber, K.T., Paradis, E.: ape 3.0: New tools for distance-based phylogenetics and evolutionary analysis in R. Bioinformatics 28(11), 1536–1537 (2012)Google Scholar
  107. 107.
    Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5(3), e9490 (2010)CrossRefGoogle Scholar
  108. 108.
    Ragan, M.A.: Phylogenetic inference based on matrix representation of trees. Mol. Phylogenet. Evol. 1, 53–58 (1992)CrossRefGoogle Scholar
  109. 109.
    Ranwez, V., Berry, V., Criscuolo, A., Fabre, P.H., Guillemot, S., Scornavacca, C., Douzery, E.J.: PhySIC: a veto supertree method with desirable properties. Syst. Biol. 56(5), 798–817 (2007)CrossRefGoogle Scholar
  110. 110.
    Ranwez, V., Criscuolo, A., Douzery, E.J.: SuperTriplets: a triplet-based supertree approach to phylogenomics. Bioinformatics 26(12), i115–i123 (2010)CrossRefGoogle Scholar
  111. 111.
    Ranwez, V., Gascuel, O.: Quartet-based phylogenetic inference: improvements and limits. Mol. Biol. Evol. 18(6), 1103–1116 (2001)CrossRefzbMATHGoogle Scholar
  112. 112.
    Reaz, R., Bayzid, M., Rahman, M.: Accurate phylogenetic tree reconstruction from quartets: a heuristic approach. PLoS ONE (2014).  https://doi.org/10.1371/journal.pone.0104008CrossRefGoogle Scholar
  113. 113.
    Robinson, D., Foulds, L.: Comparison of phylogenetic trees. Math. Biosci. 53, 131–147 (1981)CrossRefMathSciNetzbMATHGoogle Scholar
  114. 114.
    Roch, S., Steel, M.: Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent. Theoret. Popul. Biol. 100, 56–62 (2015)CrossRefzbMATHGoogle Scholar
  115. 115.
    Rodrigo, A.G.: A comment on Baum’s method for combining phylogenetic trees. Taxon 42(3), 631–636 (1993)CrossRefGoogle Scholar
  116. 116.
    Roshan, U., Moret, B.M., Williams, T.L., Warnow, T.: REC-I-DCM3: a fast algorithmic technique for reconstructing large phylogenetic trees. In: Proceedings of 3rd IEEE Computational Systems Bioinformatics Conference CSB ’04, LCBB-CONF-2004-002, pp. 98–109. IEEE Press (2004)Google Scholar
  117. 117.
    Roshan, U., Moret, B.M.E., Williams, T.L., Warnow, T.: Performance of supertree methods on various dataset decompositions. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees: Combining Information to Reveal The Tree Of Life, pp. 301–328. Kluwer Academic, Dordrecht, The Netherlands (2004)CrossRefGoogle Scholar
  118. 118.
    Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987)Google Scholar
  119. 119.
    Salamin, N., Davies, J.T.: Using supertrees to investigate species richness in grasses and flowering plants. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees: Combining Information to Reveal The Tree Of Life, pp. 461–487. Kluwer Academic, Dordrecht, The Netherlands (2004)CrossRefGoogle Scholar
  120. 120.
    Sanderson, M., McMahon, M., Steel, M.: Phylogenomics with incomplete taxon coverage: the limits to inference. BMC Evol. Biol. 10, 155 (2010)CrossRefGoogle Scholar
  121. 121.
    Sanderson, M.J., McMahon, M.M., Stamatakis, A., Zwickl, D.J., Steel, M.: Impacts of terraces on phylogenetic inference. Syst. Biol. 64(5), 709–726 (2015)CrossRefGoogle Scholar
  122. 122.
    Sanderson, M.J., McMahon, M.M., Steel, M.: Terraces in phylogenetic tree space. Science 333(6041), 448–450 (2011)CrossRefGoogle Scholar
  123. 123.
    Semple, C., Steel, M.: A supertree method for rooted trees. Discrete Appl. Math. 105(1–3), 147–158 (2000).  https://doi.org/10.1016/S0166-218X(00)00202-X. http://www.sciencedirect.com/science/article/pii/S0166218X0000202X
  124. 124.
    Sevillya, G., Frenkel, Z., Snir, S.: Triplet MaxCut: a new toolkit for rooted supertree. Methods Ecol. Evol. 7, 1359–1365 (2016).  https://doi.org/10.1111/2041-210X.12606CrossRefGoogle Scholar
  125. 125.
    Shigezumi, T.: Robustness of greedy type minimum evolution algorithms. In: Proceedings of International Conference on Computational Science, pp. 815–821. Springer (2006)Google Scholar
  126. 126.
    Sjölander, K., Datta, R., Shen, Y., Shoffner, G.: Ortholog identification in the presence of domain architecture rearrangement. Brief. Bioinform. 12(5), 413–422 (2011).  https://doi.org/10.1093/bib/bbr036. http://bib.oxfordjournals.org/content/12/5/413.abstract
  127. 127.
    Snir, S., Rao, S.: Using max cut to enhance rooted trees consistency. IEEE/ACM Trans. Comput. Biol. Bioinform. 323–333 (2006)Google Scholar
  128. 128.
    Snir, S., Rao, S.: Quartets MaxCut: a divide and conquer quartets algorithm. IEEE/ACM Trans. Comput. Biol. Bioinform. 7(4), 704–718 (2010)CrossRefGoogle Scholar
  129. 129.
    Stamatakis, A.: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 (2006)CrossRefGoogle Scholar
  130. 130.
    Steel, M.: The complexity of reconstructing trees from qualitative characters and subtrees. J. Classif. 9, 91–116 (1992)CrossRefMathSciNetzbMATHGoogle Scholar
  131. 131.
    Steel, M., Gascuel, O.: Neighbor-joining revealed. Mol. Biol. Evol. 23(11), 1997–2000 (2006)CrossRefGoogle Scholar
  132. 132.
    Steel, M., Rodrigo, A.: Maximum likelihood supertrees. Syst. Biol. 57(2), 243–250 (2008)CrossRefGoogle Scholar
  133. 133.
    Strimmer, K., von Haeseler, A.: Quartet puzzling: a quartet maximim-likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13(7), 964–969 (1996)CrossRefGoogle Scholar
  134. 134.
    Swenson, M., Suri, R., Linder, C., Warnow, T.: An experimental study of Quartets MaxCut and other supertree methods. Algorithms Mol. Biol. 6, 7 (2011). PMID: 21504600CrossRefGoogle Scholar
  135. 135.
    Swenson, M., Suri, R., Linder, C., Warnow, T.: SuperFine: fast and accurate supertree estimation. Syst. Biol. 61(2), 214–227 (2012)CrossRefGoogle Scholar
  136. 136.
    Swofford, dD.: PAUP*: Phylogenetic Analysis Using Parsimony (*d and Other Methods) Ver. 4. Sinauer Associated, Sunderland, Massachusetts (2002)Google Scholar
  137. 137.
    Szöllősi, G., Rosikiewicz, W., Boussau, B., Tannier, E., Daubin, V.: Efficient exploration of the space of reconciled gene trees. Syst. Biol. (2013).  https://doi.org/10.1093/sysbio/syt054. http://sysbio.oxfordjournals.org/content/early/2013/08/06/sysbio.syt054.abstract
  138. 138.
    Szöllősi, G.J., Boussau, B., Abby, S.S., Tannier, E., Daubin, V.: Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations. Proc. Natl. Acad. Sci. 109(43), 17513–17518 (2012).  https://doi.org/10.1073/pnas.1202997109CrossRefGoogle Scholar
  139. 139.
    Tang, J., Moret, B.: Scaling up accurate phylogenetic reconstruction from gene-order data. Bioinformatics 19 (Suppl. 1), i305–i312 (2003). Proceedings of 11th International Conference on Intelligent Systems for Molecular Biology ISMB’03Google Scholar
  140.
    Than, C., Nakhleh, L.: Species tree inference by minimizing deep coalescences. PLoS Comput. Biol. 5, e1000501 (2009)
  141.
    Thorley, J., Wilkinson, M.: A view of supertree methods. DIMACS Ser. Discrete Math. Theoret. Comput. Sci. 61, 185–194 (2003)
  142.
    Vachaspati, P., Warnow, T.: ASTRID: accurate species TRees from internode distances. BMC Genomics 16(Suppl 10), S3 (2015)
  143.
    Vachaspati, P., Warnow, T.: FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization. Bioinformatics (2016). https://doi.org/10.1093/bioinformatics/btw600
  144.
    Vachaspati, P., Warnow, T.: SIESTA: Enhancing searches for optimal supertrees and species trees. BMC Genomics (2018) (to appear)
  145.
    Vachaspati, P., Warnow, T.: SVDquest: Improving SVDquartets species tree estimation using exact optimization within a constrained search space. Mol. Phylogenet. Evol. 124, 122–136 (2018). https://doi.org/10.1016/j.ympev.2018.03.006. http://www.sciencedirect.com/science/article/pii/S105579031730338X
  146.
    Wang, L.S., Leebens-Mack, J., Wall, P.K., Beckmann, K., DePamphilis, C.W., Warnow, T.: The impact of multiple protein sequence alignment on phylogenetic estimation. IEEE/ACM Trans. Comput. Biol. Bioinform. 8, 1108–1119 (2011)
  147.
    Warnow, T.: Computational Phylogenetics: An Introduction to Designing Methods for Phylogeny Estimation. Cambridge University Press, Cambridge, UK (2018)
  148.
    Warnow, T., Moret, B.M.E., St. John, K.: Absolute convergence: true trees from short sequences. In: Proceedings of ACM-SIAM Symposium on Discrete Algorithms (SODA 01), pp. 186–195. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2001)
  149.
    Waterman, M., Smith, T., Beyer, W.: Some biological sequence metrics. Adv. Math. 20, 367–387 (1976)
  150.
    Waterman, M., Smith, T., Singh, M., Beyer, W.: Additive evolutionary trees. J. Theoret. Biol. 64, 199–213 (1977)
  151.
    Wehe, A., Bansal, M., Burleigh, J., Eulenstein, O.: DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony. Bioinformatics 24(13), 1540–1541 (2008). https://doi.org/10.1093/bioinformatics/btn230. http://bioinformatics.oxfordjournals.org/content/24/13/1540.abstract
  152.
    Wheeler, T.: Large-scale neighbor-joining with NINJA. In: Proceedings of Workshop Algorithms in Bioinformatics (WABI), vol. 5724, pp. 375–389 (2009)
  153.
    Wickett, N., Mirarab, S., Nguyen, N., Warnow, T., Carpenter, E., Matasci, N., Ayyampalayam, S., Barker, M., Burleigh, J., Gitzendanner, M., Ruhfel, B.R., Wafula, E., Der, J.P., Graham, S.W., Mathews, S., Melkonian, M., Soltis, D.E., Soltis, P.S., Miles, N.W., Rothfels, C.J., Pokorny, L., Shaw, A.J., DeGironimo, L., Stevenson, D.W., Surek, B., Villarreal, J.C., Roure, B., Philippe, H., dePamphilis, C.W., Chen, T., Deyholos, M.K., Baucom, R.S., Kutchan, T.M., Augustin, M.M., Wang, J., Zhang, Y., Tian, Z., Yan, Z., Wu, X., Sun, X., Wong, G.K.S., Leebens-Mack, J.: Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc. Natl. Acad. Sci. 111(45), E4859–E4868 (2014)
  154.
    Wilkinson, M., Cotton, J.A., Lapointe, F.J., Pisani, D.: Properties of supertree methods in the consensus setting. Syst. Biol. 56(2), 330–337 (2007). https://doi.org/10.1080/10635150701245370
  155.
    Willson, S.: Constructing rooted supertrees using distances. Bull. Math. Biol. 66(6), 1755–1783 (2004)
  156.
    Xin, L., Ma, B., Zhang, K.: A new quartet approach for reconstructing phylogenetic trees: quartet joining method. In: Proceedings of Computing and Combinatorics (COCOON) 2007, Lecture Notes in Computer Science, vol. 4598, pp. 40–50. Springer, Berlin, Heidelberg (2007)
  157.
    Yu, Y., Warnow, T., Nakhleh, L.: Algorithms for MDC-based multi-locus phylogeny inference: beyond rooted binary gene trees on single alleles. J. Comput. Biol. 18, 1543–1559 (2011). https://doi.org/10.1089/cmb.2011.0174
  158.
    Zhang, C., Sayyari, E., Mirarab, S.: ASTRAL-III: increased scalability and impacts of contracting low support branches. In: Meidanis, J., Nakhleh, L. (eds.) Comparative Genomics, pp. 53–75. Springer International Publishing, Cham (2017)
  159.
    Zhang, Q., Rao, S., Warnow, T.: New absolute fast converging phylogeny estimation methods with improved scalability and accuracy. In: Parida, L., Ukkonen, E. (eds.) 18th International Workshop on Algorithms in Bioinformatics (WABI 2018), pp. 8:1–8:12. LIPIcs, Dagstuhl (2018)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Illinois at Urbana-Champaign, Champaign, USA
