Multiple Vector Seeds for Protein Alignment

  • Daniel G. Brown
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3240)


We present a framework for improving local protein alignment algorithms. Specifically, we discuss how to extend local protein aligners to use a collection of vector seeds [3] to reduce noise hits. We model picking a set of vector seeds as an integer programming problem, and give algorithms to choose such a set of seeds. A good set of vector seeds we have chosen allows four times fewer false positive hits while preserving essentially the same sensitivity as BLASTP.
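As a rough illustration of the idea, the sketch below follows the definition of a vector seed in [3]: a weight vector paired with a threshold, which hits an ungapped alignment at a position when the weighted sum of the per-position scores in that window reaches the threshold. A collection of seeds hits if any one of its members does. The function names, the example scores, and the seed values are assumptions chosen for illustration; they are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# A vector seed is (weight vector, threshold); this representation is an assumption.
VectorSeed = Tuple[Tuple[int, ...], int]


def seed_hits(scores: Sequence[int], seed: VectorSeed) -> List[int]:
    """Return the start positions where one vector seed hits.

    `scores` holds the per-position substitution scores of an ungapped
    alignment (for example, BLOSUM-style scores of aligned residue pairs).
    """
    weights, threshold = seed
    k = len(weights)
    hits = []
    for i in range(len(scores) - k + 1):
        # Weighted sum of the window of scores starting at position i.
        total = sum(w * s for w, s in zip(weights, scores[i:i + k]))
        if total >= threshold:
            hits.append(i)
    return hits


def any_seed_hits(scores: Sequence[int], seeds: Sequence[VectorSeed]) -> bool:
    """A collection of seeds hits if at least one of its seeds hits."""
    return any(seed_hits(scores, seed) for seed in seeds)


# Illustrative values only: a contiguous seed and a spaced seed.
example_scores = [4, 5, 6, -1, 2]
contiguous = ((1, 1, 1), 13)
spaced = ((1, 0, 1), 10)
print(seed_hits(example_scores, contiguous))
print(any_seed_hits(example_scores, [contiguous, spaced]))
```

Raising a seed's threshold lowers its false positive rate but also its sensitivity, which is why combining several seeds with individually high thresholds can, in principle, trade off the two better than a single seed.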


Keywords: False Positive Rate, Protein Alignment, True Alignment, Nucleotide Alignment, Unrelated Sequence




References

  1. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. Journal of Molecular Biology 215(3), 403–410 (1990)
  2. Bairoch, A., Apweiler, R.: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research 28(1), 45–48 (2000)
  3. Brejova, B., Brown, D., Vinar, T.: Vector seeds: an extension to spaced seeds allows substantial improvements in sensitivity and specificity. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 39–54. Springer, Heidelberg (2003)
  4. Brejova, B., Brown, D., Vinar, T.: Optimal spaced seeds for homologous coding regions. J. Bioinf. and Comp. Biol. 1, 595–610 (2004)
  5. Buhler, J., Keich, U., Sun, Y.: Designing seeds for similarity search in genomic DNA. In: Proceedings of the 7th Annual International Conference on Computational Biology (RECOMB), pp. 67–75 (2003)
  6. Choi, K.P., Zhang, L.: Sensitivity analysis and efficient method for identifying optimal spaced seeds. J. Comp and Sys. Sci. 68, 22–40 (2004)
  7. Hochbaum, D.: Approximating covering and packing problems. In: Hochbaum, D. (ed.) Approximation algorithms for NP-hard problems, pp. 94–143. PWS (1997)
  8. Keich, U., Li, M., Ma, B., Tromp, J.: On spaced seeds for similarity search. Discrete Appl. Math. 138, 253–263 (2004)
  9. Li, M., Ma, B., Kisman, D., Tromp, J.: Patternhunter II: Highly sensitive and fast homology search. Journal of Bioinformatics and Computational Biology (2004)
  10. Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics 18(3), 440–445 (2002)
  11. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)
  12. Sun, Y., Buhler, J.: Designing multiple simultaneous seeds for DNA similarity search. In: Proceedings of the 8th Annual International Conference on Computational Biology (RECOMB), pp. 76–84 (2004)
  13. Xu, J., Brown, D., Li, M., Ma, B.: Optimizing multiple spaced seeds for homology search. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 47–58. Springer, Heidelberg (2004)

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Daniel G. Brown (1)
  1. School of Computer Science, University of Waterloo, Waterloo, Canada
