Vector Seeds: An Extension to Spaced Seeds Allows Substantial Improvements in Sensitivity and Specificity

  • Broňa Brejová
  • Daniel G. Brown
  • Tomáš Vinař
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2812)


We present improved techniques for finding homologous regions in DNA and protein sequences. Our approach focuses on the core region of a local pairwise alignment; we suggest new ways to characterize these regions that allow marked improvements in both specificity and sensitivity over existing techniques for sequence alignment. For any such characterization, which we call a vector seed, we give an efficient algorithm that estimates the specificity and sensitivity of that seed under reasonable probabilistic models of sequence. We also characterize the probability of a match when an alignment is required to have multiple hits before it is detected. Our extensions fit well with existing approaches to sequence alignment, while still offering substantial improvement in runtime and sensitivity, particularly for the important problem of identifying matches between homologous coding DNA sequences.
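The Python sketch below makes "vector seed", "sensitivity", and "specificity" concrete. It rests on a few assumptions.

- **Seed definition.** A vector seed is taken to be a pair (v, T). An alignment is reduced to a 0/1 string of match/mismatch columns. A window of len(v) columns hits when its dot product with the integer vector v is at least T. Under that reading, a spaced seed is the special case with 0/1 weights and T equal to the number of 1s.
- **Probability model.** The simplest model is used: independent columns that each match with probability p.
- **Algorithm.** The dynamic program is a brute-force one over the last len(v) - 1 columns, so its cost is exponential in seed length. It is not the efficient algorithm the abstract refers to.
- **Names.** All function names and example seeds are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def seed_hits(window: Sequence[int], v: Sequence[int], T: int) -> bool:
    """True if the dot product of seed vector v with a window of alignment
    columns (1 = match, 0 = mismatch) reaches the threshold T."""
    return sum(vi * xi for vi, xi in zip(v, window)) >= T


def _window_bits(mask: int, length: int) -> list[int]:
    """Unpack `mask` into `length` columns, oldest column in the high bit."""
    return [(mask >> (length - 1 - j)) & 1 for j in range(length)]


def _bernoulli_weight(mask: int, length: int, p: float) -> float:
    """Probability of one specific 0/1 column pattern when P(match) = p."""
    ones = bin(mask).count("1")
    return p ** ones * (1 - p) ** (length - ones)


def sensitivity(v: Sequence[int], T: int, n: int, p: float) -> float:
    """P(at least one hit of (v, T)) in an alignment of n independent
    columns, each a match with probability p. Brute-force DP over the
    newest len(v) - 1 columns, so it is exponential in the seed length."""
    L = len(v)
    if n < L:
        return 0.0
    hits = [seed_hits(_window_bits(w, L), v, T) for w in range(1 << L)]
    keep = (1 << (L - 1)) - 1  # keeps the newest L - 1 columns of a window
    # miss[m] = P(no hit so far and the newest L - 1 columns spell out m)
    miss = [_bernoulli_weight(m, L - 1, p) for m in range(1 << (L - 1))]
    for _ in range(n - (L - 1)):
        nxt = [0.0] * (1 << (L - 1))
        for m, prob in enumerate(miss):
            for bit, p_bit in ((1, p), (0, 1 - p)):
                w = (m << 1) | bit  # full window ending at the new column
                if not hits[w]:  # mass that hits is absorbed and dropped
                    nxt[w & keep] += prob * p_bit
        miss = nxt
    return 1.0 - sum(miss)


def hit_probability_per_position(v: Sequence[int], T: int, q: float) -> float:
    """P(one window of unrelated sequence hits (v, T)) when columns match
    with probability q (0.25 for uniform random DNA). This equals the
    expected number of chance hits per window start; lower is more specific."""
    L = len(v)
    return sum(_bernoulli_weight(w, L, q) for w in range(1 << L)
               if seed_hits(_window_bits(w, L), v, T))


if __name__ == "__main__":
    # Hypothetical example seeds, chosen only for illustration.
    seeds = {
        "contiguous 8/8": ([1] * 8, 8),    # BLAST-style run of 8 matches
        "vector 8 of 9": ([1] * 9, 8),     # at least 8 matches in 9 columns
        "spaced 110110111": ([int(c) for c in "110110111"], 7),
    }
    for name, (v, T) in seeds.items():
        sens = sensitivity(v, T, n=64, p=0.7)
        fp = hit_probability_per_position(v, T, q=0.25)
        print(f"{name:18s} sensitivity={sens:.3f}  hits/position={fp:.2e}")
```

Running the module prints, for each example seed:

- its sensitivity on 64-column alignments at 70% identity;
- its chance hit rate at 25% identity.

The informative comparisons are between seeds with similar chance hit rates but different sensitivities. The multiple-hit criterion mentioned in the abstract is not modeled here.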


Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Broňa Brejová (1)
  • Daniel G. Brown (1)
  • Tomáš Vinař (1)
  1. School of Computer Science, University of Waterloo, Waterloo, Canada
