Alignment-Free Phylogenetic Reconstruction

  • Constantinos Daskalakis
  • Sebastien Roch
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6044)

Abstract

We introduce the first polynomial-time phylogenetic reconstruction algorithm under a model of sequence evolution allowing insertions and deletions (or indels). Given appropriate assumptions, our algorithm requires sequence lengths growing polynomially in the number of leaf taxa. Our techniques are distance-based and largely bypass the problem of multiple alignment.

Keywords

Phylogenetic reconstruction, indels, alignment

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Constantinos Daskalakis (1)
  • Sebastien Roch (2)

  1. CSAIL, MIT
  2. Department of Mathematics and Bioinformatics Program, UCLA
