New Algorithms for Multiple DNA Sequence Alignment
We present a mathematical framework for anchoring in global multiple alignment. Our framework uses anchors that are hits to spaced seeds and identifies them progressively, guided by a phylogenetic tree. We compute anchors in the tree both from the root down to the leaves and from the leaves up. In both cases, we compute thresholds for anchors to minimize errors. One innovative aspect of our approach is the approximate inference of ancestral sequences with accommodation for ambiguity. This, combined with proper scoring techniques and seeding, lets us pick many anchors in homologous positions as we align up a phylogenetic tree, minimizing total work. In simulations, our algorithm is reasonably successful: it is comparable to existing software in accuracy and substantially more efficient.
Keywords: Ancestral Sequence, Full Alignment, Homologous Position, Good Anchor, Eulerian Path
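To make the notion of a "hit to a spaced seed" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a seed is written as a string of `1` (position must match) and `0` (position is ignored), and the names `spaced_key` and `seed_hits` are hypothetical. It does not cover the thresholds, scoring, or ancestral inference described in the abstract.

```python
from collections import defaultdict


def spaced_key(seq, start, seed):
    """Collect the characters of seq that fall under the '1' positions
    of the seed when the seed is placed at offset `start`."""
    return "".join(seq[start + i] for i, c in enumerate(seed) if c == "1")


def seed_hits(a, b, seed):
    """Return sorted (i, j) pairs such that a[i:] and b[j:] agree at
    every '1' position of the seed (assumed seed notation, see lead-in)."""
    span = len(seed)
    # Index every seed-length window of `a` by its masked key.
    index = defaultdict(list)
    for i in range(len(a) - span + 1):
        index[spaced_key(a, i, seed)].append(i)
    # Look up every window of `b` in that index.
    hits = []
    for j in range(len(b) - span + 1):
        for i in index.get(spaced_key(b, j, seed), []):
            hits.append((i, j))
    return sorted(hits)


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "ACCTACGT"  # differs from `a` at position 2
    print(seed_hits(a, b, "1101"))  # spaced seed ignores its third position
    print(seed_hits(a, b, "1111"))  # contiguous seed of the same span
```

In this toy input the spaced seed `1101` reports a hit at `(0, 0)` even though the sequences differ at position 2, while the contiguous seed `1111` misses it. That tolerance for a mismatch inside the seed window is the general reason spaced seeds are used to find candidate anchors.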