Abstract
Given a multiple alignment of k sequences, an evolutionary tree relating the sequences, and a subadditive gap penalty function (e.g. an affine function), we reconstruct the internal nodes of the tree optimally: we find the optimal explanation of the observed gaps in terms of indels, and we find the most parsimonious assignment of nucleotides. The gaps of the alignment are represented in a so-called gap graph. Theoretically sound preprocessing reduces this graph, which yields a running time that, in all but the most pathological examples, is far better than the exponential worst case. For example, for a tree with nine leaves and a random alignment of length 10,000 with 60% gaps, the running time is around 45 seconds on average. For a real alignment of length 9868 of nine HIV-1 sequences, the running time is less than one second.
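To make two of the abstract's ingredients concrete, the following is a minimal sketch and not the paper's gap-graph method. It shows an affine gap penalty with a brute-force subadditivity check, and the classic Fitch bottom-up pass for the most parsimonious nucleotide assignment at a single alignment column on a binary tree. The function names, the nested-tuple tree encoding, and the cost values are assumptions chosen for illustration.

```python
from typing import Callable, Dict, Set, Tuple, Union

# Hypothetical tree encoding: a leaf is its name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def affine_gap(length: int, open_cost: float = 3.0, extend_cost: float = 1.0) -> float:
    """Affine penalty: open_cost + extend_cost * length (0 for an empty gap).

    The default costs are illustrative values, not taken from the paper.
    """
    if length <= 0:
        return 0.0
    return open_cost + extend_cost * length


def is_subadditive(g: Callable[[int], float], max_len: int = 50) -> bool:
    """Check g(a + b) <= g(a) + g(b) for all 1 <= a, b <= max_len."""
    return all(
        g(a + b) <= g(a) + g(b)
        for a in range(1, max_len + 1)
        for b in range(1, max_len + 1)
    )


def fitch(tree: Tree, leaf_states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch bottom-up pass for one alignment column.

    Returns the candidate state set at the root of `tree` and the minimum
    number of substitutions needed to explain the leaf states.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra substitution.
        return common, left_cost + right_cost
    # The children disagree: keep the union and pay one substitution.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    tree = (("a", "b"), ("c", "d"))
    column = {"a": "A", "b": "A", "c": "C", "d": "A"}
    print(fitch(tree, column))
    print(is_subadditive(affine_gap))
```

With a subadditive penalty, explaining a run of gaps as one long indel never costs more than splitting it into two shorter ones. That property makes it natural to reason about contiguous gap blocks as single events.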
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fredslund, J., Hein, J., Scharling, T. (2003). A Large Version of the Small Parsimony Problem. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_30
DOI: https://doi.org/10.1007/978-3-540-39763-2_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20076-5
Online ISBN: 978-3-540-39763-2
eBook Packages: Springer Book Archive