A Large Version of the Small Parsimony Problem

  • Jakob Fredslund
  • Jotun Hein
  • Tejs Scharling
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2812)


Given a multiple alignment of k sequences, an evolutionary tree relating the sequences, and a subadditive gap penalty function (e.g. an affine function), we reconstruct the internal nodes of the tree optimally: we find the optimal explanation of the observed gaps in terms of indels, and we find the most parsimonious assignment of nucleotides. The gaps of the alignment are represented in a so-called gap graph, and theoretically sound preprocessing reduces this graph, paving the way for a running time that is far better than the exponential worst case in all but the most pathological examples. For example, for a tree with nine leaves and a random alignment of length 10,000 with 60% gaps, the running time is around 45 seconds on average. For a real alignment of length 9868 of nine HIV-1 sequences, the running time is less than one second.
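
To make "most parsimonious assignment of nucleotides" concrete, the following is a minimal sketch of the classical Fitch bottom-up pass on a single gap-free alignment column. It is not the gap-graph method described above, which also has to explain indels; it only illustrates the nucleotide part of the small parsimony problem. The names `fitch_sets`, `example_tree` and `example_column`, and the encoding of a binary tree as nested tuples, are assumptions made for this illustration.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is its sequence name, an internal node is a
# pair (left subtree, right subtree). Only binary trees are handled here.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_sets(tree: Tree, leaf_state: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch bottom-up pass for one alignment column.

    Returns the set of candidate nucleotides for the root of `tree` and the
    minimum number of substitutions needed in the subtree below it.
    """
    if isinstance(tree, str):
        # A leaf is fixed to its observed nucleotide and costs nothing.
        return {leaf_state[tree]}, 0

    left, right = tree
    left_set, left_cost = fitch_sets(left, leaf_state)
    right_set, right_cost = fitch_sets(right, leaf_state)

    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra substitution here.
        return common, left_cost + right_cost
    # The children disagree: keep all candidates and pay one substitution.
    return left_set | right_set, left_cost + right_cost + 1


# Illustrative four-leaf tree and one column of a made-up alignment.
example_tree: Tree = (("s1", "s2"), ("s3", "s4"))
example_column = {"s1": "A", "s2": "A", "s3": "C", "s4": "G"}

root_set, cost = fitch_sets(example_tree, example_column)
print(sorted(root_set), cost)
```

Running the pass once per column and summing the costs gives the substitution part of the parsimony score under unit substitution cost. A second, top-down pass (not shown) would pick one nucleotide from each candidate set to obtain an explicit assignment for the internal nodes.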


Keywords (added by machine, not by the authors): Internal Node; Evolutionary Tree; Tree Covering; Directed Edge; Alignment Length





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Jakob Fredslund (1)
  • Jotun Hein (2)
  • Tejs Scharling (1)
  1. Bioinformatics Research Center, Department of Computer Science, University of Aarhus, Denmark
  2. Department of Statistics, University of Oxford, United Kingdom
