
New Algorithms for Multiple DNA Sequence Alignment

  • Daniel G. Brown
  • Alexander K. Hudek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3240)

Abstract

We present a mathematical framework for anchoring in global multiple alignment. Our framework uses anchors that are hits to spaced seeds and identifies anchors progressively, using a phylogenetic tree. We compute anchors in the tree in two directions: from the root down to the leaves, and from the leaves up. In both cases, we compute thresholds for anchors to minimize errors. One innovative aspect of our approach is the approximate inference of ancestral sequences with accommodation for ambiguity. This, combined with proper scoring techniques and seeding, lets us pick many anchors in homologous positions as we align up a phylogenetic tree, minimizing total work. In simulations, our algorithm is reasonably successful: it is comparable to existing software in accuracy and substantially more efficient.
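To make the combination of spaced seeds and ambiguous ancestral sequences concrete, here is a minimal Python sketch. It is not the authors' implementation, and every function name is hypothetical. It assumes that the two children are already aligned to equal length without gaps, and it infers each ancestral column in the style of Fitch's parsimony rule: take the intersection of the children's base sets if it is non-empty, otherwise their union. A union column stays ambiguous. A spaced seed such as `1101` then counts as a hit wherever, at every `1` position, the leaf base is one of the bases allowed at that ancestral column. Positions marked `0` are "don't care."

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: one set of possible bases per aligned column.
Profile = List[Set[str]]


def to_profile(seq: str) -> Profile:
    """Turn a plain DNA string into an unambiguous profile."""
    return [{c} for c in seq]


def fitch_ancestor(left: Profile, right: Profile) -> Profile:
    """Fitch-style ancestor of two equal-length, gap-free profiles (an assumption):
    keep the intersection of base sets if non-empty, otherwise the (ambiguous) union."""
    if len(left) != len(right):
        raise ValueError("children must be aligned to equal length")
    return [(a & b) or (a | b) for a, b in zip(left, right)]


def seed_hits(anc: Profile, seq: str, seed: str) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where the spaced seed hits: for every '1' at offset k,
    seq[j + k] is one of the bases allowed in anc[i + k]."""
    care = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)

    # Index every window of the leaf sequence by the bases at the seed's '1' offsets.
    index: Dict[str, List[int]] = {}
    for j in range(len(seq) - span + 1):
        key = "".join(seq[j + k] for k in care)
        index.setdefault(key, []).append(j)

    hits: List[Tuple[int, int]] = []
    for i in range(len(anc) - span + 1):
        # Expand the ambiguous ancestral window into every compatible key.
        # This grows exponentially with the number of ambiguous columns,
        # so a practical tool would need to bound ambiguity.
        keys = [""]
        for k in care:
            keys = [p + b for p in keys for b in sorted(anc[i + k])]
        for key in keys:
            for j in index.get(key, []):
                hits.append((i, j))
    return sorted(hits)


if __name__ == "__main__":
    anc = fitch_ancestor(to_profile("ACGTACGT"), to_profile("ACGAACGT"))
    print(anc[3])                           # column 3 is ambiguous: A or T
    print(seed_hits(anc, "ACGT", "1101"))   # [(0, 0), (4, 0)]
```

The ancestor's ambiguous column 3 (`{A, T}`) lets the seed hit at ancestral position 0 even though only one child has a `T` there. That is the intuition behind accommodating ambiguity. The sketch does not reproduce the paper's scoring, thresholds for accepting anchors, or chaining of hits into a global alignment.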

Keywords

Ancestral Sequence, Full Alignment, Homologous Position, Good Anchor, Eulerian Path

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Daniel G. Brown (1)
  • Alexander K. Hudek (1)
  1. School of Computer Science, University of Waterloo, Waterloo, Canada