Quality of Algorithms for Sequence Comparison

  • Mikhail Roytberg
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6744)


Pairwise sequence alignment is the basic method for comparative analysis of proteins and nucleic acids. When studying alignment results, one has to consider two questions: (1) did the program find all the interesting similarities (“sensitivity”), and (2) are all the similarities it found interesting (“selectivity”)? Of course, one has to specify which alignments are considered interesting. Analogous questions can be asked of each individual alignment obtained: (3) what fraction of the aligned positions is aligned correctly (“confidence”), and (4) does the alignment contain all pairs of corresponding positions of the compared sequences (“accuracy”)? Naturally, the answers to these questions depend on the definition of a correct alignment. The presentation addresses these two pairs of questions, which are extremely important in interpreting the results of sequence comparison.
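Questions (3) and (4) can be illustrated with a minimal sketch. It assumes that an alignment is represented as a set of matched position pairs and that a reference ("correct") alignment is available for comparison. The function names `aligned_pairs` and `confidence_and_accuracy`, and the reading of confidence and accuracy as simple set ratios, are assumptions made for illustration and are not taken from the paper.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) index pairs matched by a gapped pairwise alignment.

    row_a and row_b are the two alignment rows (equal length, gaps as `gap`);
    i and j are 0-based positions in the ungapped sequences.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            pairs.add((i, j))  # both rows hold a residue: a matched pair
        if x != gap:
            i += 1
        if y != gap:
            j += 1
    return pairs


def confidence_and_accuracy(found, reference):
    """Assumed definitions (illustrative only):

    confidence = |found & reference| / |found|      (question 3)
    accuracy   = |found & reference| / |reference|  (question 4)
    """
    common = len(found & reference)
    confidence = common / len(found) if found else 0.0
    accuracy = common / len(reference) if reference else 0.0
    return confidence, accuracy


# Toy data: a reference alignment and an ungapped alignment to evaluate.
reference = aligned_pairs("ACGT-A", "AC-TTA")
found = aligned_pairs("ACGTA", "ACTTA")
confidence, accuracy = confidence_and_accuracy(found, reference)
print(confidence, accuracy)
```

Under the same assumption, the two ratios carry over to questions (1) and (2) when the sets hold whole similarities rather than position pairs: the share of interesting similarities that were found plays the role of sensitivity, and the share of found similarities that are interesting plays the role of selectivity.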


Keywords: alignment, seed, sequence comparison, sensitivity, selectivity, accuracy, confidence



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Mikhail Roytberg (1, 2)
  1. Institute of Mathematical Problems in Biology, RAS, Moscow Region, Russia
  2. National Research University Higher School of Economics, Moscow, Russia
