New Algorithms for Multiple DNA Sequence Alignment

Brown, Daniel G.; Hudek, Alexander K.

doi:10.1007/978-3-540-30219-3_27

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3240)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

We present a mathematical framework for anchoring in global multiple alignment. Our framework uses anchors that are hits to spaced seeds and identifies anchors progressively, using a phylogenetic tree. We compute anchors in the tree both from the root down to the leaves and from the leaves up. In both cases, we compute thresholds for anchors to minimize errors. One innovative aspect of our approach is the approximate inference of ancestral sequences with accommodation for ambiguity. This, combined with proper scoring techniques and seeding, lets us pick many anchors in homologous positions as we align up a phylogenetic tree, minimizing total work. Our algorithm is reasonably successful in simulations, is comparable to existing software in accuracy, and is substantially more efficient.
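
The abstract names two building blocks: hits to spaced seeds, and ancestral sequences that keep ambiguity. The following is a minimal Python sketch of both ideas under stated assumptions, not the authors' implementation. The function names and the toy seed `1101` are hypothetical, and the ancestral step uses the standard Fitch set rule (intersection if non-empty, otherwise union) as a stand-in for the paper's approximate inference, whose details are not given on this page. Anchor thresholds and scoring are omitted.

```python
from collections import defaultdict


def seed_key(seq, start, seed):
    """Characters of seq under the '1' positions of seed, starting at start.

    Returns None if the seed window would run past the end of seq.
    """
    if start + len(seed) > len(seq):
        return None
    return "".join(seq[start + i] for i, c in enumerate(seed) if c == "1")


def spaced_seed_hits(a, b, seed):
    """Return all (i, j) where a[i:] and b[j:] agree at every '1' of seed.

    Positions marked '0' in the seed are "don't care" and may mismatch.
    """
    # Index every seed window of b by its key.
    index = defaultdict(list)
    for j in range(len(b) - len(seed) + 1):
        index[seed_key(b, j, seed)].append(j)
    # Look up every seed window of a in that index.
    hits = []
    for i in range(len(a) - len(seed) + 1):
        for j in index.get(seed_key(a, i, seed), []):
            hits.append((i, j))
    return hits


def fitch_ancestor(left, right):
    """Combine two aligned child sequences into an ambiguous ancestor.

    Each sequence is a list of sets of nucleotides, one set per column.
    Fitch rule: keep the intersection if non-empty, else the union.
    """
    ancestor = []
    for x, y in zip(left, right):
        common = x & y
        ancestor.append(common if common else x | y)
    return ancestor
```

For example, `spaced_seed_hits("ACGTACGT", "ACCTACGA", "1101")` returns `[(0, 0), (3, 3), (4, 0)]`: the windows `ACGT` and `ACCT` count as a hit because the seed ignores their third position. Likewise, `fitch_ancestor([{"A"}, {"C"}], [{"A"}, {"G"}])` returns `[{"A"}, {"C", "G"}]`, so the second ancestral column stays ambiguous rather than being forced to a single base.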

Author information

Authors and Affiliations

School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
Daniel G. Brown & Alexander K. Hudek

Editor information

Editors and Affiliations

Department of Informatics and Computational Biology Unit, HIB, University of Bergen, 5020, Bergen, Norway
Inge Jonassen
Department of Biology, Penn Center for Bioinformatics, Penn Genomics Institute, 415 S. University Ave., Philadelphia, PA 19104, USA
Junhyong Kim

About this paper

Cite this paper

Brown, D.G., Hudek, A.K. (2004). New Algorithms for Multiple DNA Sequence Alignment. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science, vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_27

DOI: https://doi.org/10.1007/978-3-540-30219-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23018-2
Online ISBN: 978-3-540-30219-3
eBook Packages: Springer Book Archive
