A Large Version of the Small Parsimony Problem
Given a multiple alignment of k sequences, an evolutionary tree relating the sequences, and a subadditive gap penalty function (e.g., an affine function), we reconstruct the internal nodes of the tree optimally: we find the optimal explanation of the observed gaps in terms of indels, and the most parsimonious assignment of nucleotides. The gaps of the alignment are represented in a so-called gap graph, which is reduced through theoretically sound preprocessing. In all but the most pathological examples, this yields a running time far better than the exponential worst case. For example, for a tree with nine leaves and a random alignment of length 10,000 with 60% gaps, the running time averages around 45 seconds. For a real alignment of length 9868 of nine HIV-1 sequences, the running time is less than one second.
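To make two of the terms above concrete, here is a minimal Python sketch. It is not the gap-graph method of the paper. It shows (a) an affine gap penalty with a brute-force subadditivity check, g(a+b) <= g(a) + g(b), and (b) the classical Fitch algorithm for the small parsimony problem on a single gap-free alignment column. All names, the tree encoding, and the penalty parameters are assumptions chosen for illustration.

```python
def affine_gap(length, gap_open=3, gap_extend=1):
    """Affine penalty: a fixed opening cost plus a per-position cost (illustrative values)."""
    return gap_open + gap_extend * length if length > 0 else 0


def is_subadditive(g, max_len=50):
    """Brute-force check that g(a + b) <= g(a) + g(b) for all 1 <= a, b <= max_len."""
    return all(
        g(a + b) <= g(a) + g(b)
        for a in range(1, max_len + 1)
        for b in range(1, max_len + 1)
    )


def fitch(tree, leaves, root):
    """Fitch small parsimony for one column of a rooted binary tree.

    tree:   dict mapping each internal node to its (left, right) children
    leaves: dict mapping each leaf to its observed nucleotide
    Returns (number of substitutions, dict node -> assigned nucleotide).
    """
    sets = {}
    cost = 0

    def up(node):
        # Bottom-up pass: intersect the child sets if possible, otherwise
        # take their union and count one substitution.
        nonlocal cost
        if node in leaves:
            sets[node] = {leaves[node]}
            return sets[node]
        left, right = tree[node]
        a, b = up(left), up(right)
        common = a & b
        if common:
            sets[node] = common
        else:
            sets[node] = a | b
            cost += 1
        return sets[node]

    up(root)
    assignment = {}

    def down(node, parent_state):
        # Top-down pass: keep the parent's state when it is allowed,
        # otherwise pick any allowed state (here: the smallest letter).
        state = parent_state if parent_state in sets[node] else min(sets[node])
        assignment[node] = state
        if node not in leaves:
            for child in tree[node]:
                down(child, state)

    down(root, None)
    return cost, assignment


# Hypothetical four-leaf example: one column with nucleotides A, C, A, A.
tree = {"root": ("x", "y"), "x": ("s1", "s2"), "y": ("s3", "s4")}
leaves = {"s1": "A", "s2": "C", "s3": "A", "s4": "A"}
cost, assignment = fitch(tree, leaves, "root")
```

In this sketch the column is explained by a single substitution, with all internal nodes assigned A. Handling the gaps themselves is the harder part addressed by the paper and is not covered here.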
Keywords: Internal Node, Evolutionary Tree, Tree Covering, Directed Edge, Alignment Length