Multi-SpaM: A Maximum-Likelihood Approach to Phylogeny Reconstruction Using Multiple Spaced-Word Matches and Quartet Trees

  • Conference paper
Comparative Genomics (RECOMB-CG 2018)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11183)

Abstract

Word-based or ‘alignment-free’ methods for phylogeny reconstruction are much faster than traditional, alignment-based approaches, but they are generally less accurate. Most alignment-free methods calculate pairwise distances for a set of input sequences, for example from word frequencies, from so-called spaced-word matches, or from the average length of common substrings. In this paper, we propose the first word-based phylogeny approach that is based on multiple sequence comparison and Maximum Likelihood. Our algorithm first samples small, gap-free alignments involving four taxa each. For each of these alignments, it then calculates a quartet tree. Finally, the program Quartet MaxCut is used to infer a supertree for the full set of input taxa from the calculated quartet trees. Experimental results show that trees calculated with our approach are of high quality.
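The first two steps of this pipeline can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the function names, the binary pattern (`1` = match position, `0` = don't-care position) and the toy sequences are hypothetical; candidate blocks are enumerated exhaustively rather than sampled; and the quartet topology is chosen by counting parsimony-informative columns, a simple stand-in for the Maximum-Likelihood step, which is not reproduced here. The supertree step with Quartet MaxCut is omitted.

```python
from collections import defaultdict
from itertools import combinations


def spaced_word(seq, pos, pattern):
    """Characters of seq at the match positions ('1') of pattern, starting at pos."""
    return "".join(seq[pos + i] for i, c in enumerate(pattern) if c == "1")


def four_taxon_blocks(seqs, pattern):
    """Yield gap-free blocks {name: segment} in which four different
    sequences share the same spaced word (hypothetical helper)."""
    span = len(pattern)
    index = defaultdict(list)  # spaced word -> list of (sequence name, position)
    for name, seq in seqs.items():
        for pos in range(len(seq) - span + 1):
            index[spaced_word(seq, pos, pattern)].append((name, pos))
    for occurrences in index.values():
        # Exhaustive enumeration; assumed acceptable only for toy input.
        for quad in combinations(occurrences, 4):
            if len({name for name, _ in quad}) == 4:
                yield {name: seqs[name][p:p + span] for name, p in quad}


def quartet_topology(block):
    """Return the split supported by most informative columns, or None.
    Parsimony-style vote, NOT Maximum Likelihood; ties go to the first split."""
    a, b, c, d = sorted(block)
    votes = {((a, b), (c, d)): 0, ((a, c), (b, d)): 0, ((a, d), (b, c)): 0}
    for w, x, y, z in zip(block[a], block[b], block[c], block[d]):
        if w == x and y == z and w != y:
            votes[((a, b), (c, d))] += 1
        elif w == y and x == z and w != x:
            votes[((a, c), (b, d))] += 1
        elif w == z and x == y and w != x:
            votes[((a, d), (b, c))] += 1
    best = max(votes, key=votes.get)
    return best if votes[best] > 0 else None


# Toy input: all four sequences match at pattern positions 0, 1 and 3,
# and differ only at the don't-care position 2.
seqs = {"A": "ACGT", "B": "ACGT", "C": "ACTT", "D": "ACTT"}
blocks = list(four_taxon_blocks(seqs, "1101"))
topology = quartet_topology(blocks[0])
```

In this toy input the only informative column is the don't-care position, where A and B share one nucleotide and C and D share another, so the vote groups A with B and C with D. The sketch also shows why the don't-care positions matter: the match positions are identical by construction and carry no phylogenetic signal within a block.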


Funding

The project was funded by VW Foundation, project VWZN3157. We acknowledge support by the Open Access Publication Funds of Göttingen University.

Corresponding author

Correspondence to Burkhard Morgenstern.

Copyright information

© 2018 Springer Nature Switzerland AG

Cite this paper

Dencker, T., Leimeister, C.A., Gerth, M., Bleidorn, C., Snir, S., Morgenstern, B. (2018). Multi-SpaM: A Maximum-Likelihood Approach to Phylogeny Reconstruction Using Multiple Spaced-Word Matches and Quartet Trees. In: Blanchette, M., Ouangraoua, A. (eds) Comparative Genomics. RECOMB-CG 2018. Lecture Notes in Computer Science, vol 11183. Springer, Cham. https://doi.org/10.1007/978-3-030-00834-5_13

  • DOI: https://doi.org/10.1007/978-3-030-00834-5_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00833-8

  • Online ISBN: 978-3-030-00834-5

  • eBook Packages: Computer Science, Computer Science (R0)
