Improved Alignment of Protein Sequences Based on Common Parts

  • David Hoksza
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4983)


In the last twenty years, protein databases have been growing exponentially. To speed up the search, heuristic approaches have been proposed and their accuracy has steadily improved, but exact search is still needed in some cases. The only exact search algorithm remains SSEARCH (or its clones), which sequentially scans a database of protein sequences and performs a full alignment against each sequence.

Because exact search is still needed, we focus on improving the sequential search algorithm. We decrease the cost of computing the alignment of a pair of protein sequences when searching large databases. This is achieved by reusing alignment calculations for common parts of the sequences, without loss of accuracy.
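The idea of reusing alignment calculations can be illustrated with a minimal sketch. It is not the method of the paper: it assumes that the "common parts" are shared prefixes of database sequences, uses a linear gap penalty and a toy match/mismatch score instead of a substitution matrix, and all names (extend_rows, sw_score, scan_with_prefix_reuse) are hypothetical. It relies on the fact that each row of the Smith-Waterman matrix depends only on the previous row, so two database sequences with the same prefix produce identical rows for that prefix.

```python
from os.path import commonprefix

# Illustrative scoring only; a real search would use a substitution
# matrix such as BLOSUM62 and affine gap penalties.
MATCH, MISMATCH, GAP = 2, -1, -2


def extend_rows(query, segment, prev_row, best):
    """Add one Smith-Waterman row per character of `segment`.

    Rows correspond to database characters, columns to query characters.
    Returns the last row computed and the best local score seen so far.
    """
    for ch in segment:
        row = [0] * (len(query) + 1)
        for j in range(1, len(query) + 1):
            diag = prev_row[j - 1] + (MATCH if ch == query[j - 1] else MISMATCH)
            row[j] = max(0, diag, prev_row[j] + GAP, row[j - 1] + GAP)
        best = max(best, max(row))
        prev_row = row
    return prev_row, best


def sw_score(query, target):
    """Plain Smith-Waterman score, computed from scratch (the baseline)."""
    _, best = extend_rows(query, target, [0] * (len(query) + 1), 0)
    return best


def scan_with_prefix_reuse(query, database):
    """Score every database sequence, reusing the rows of any prefix
    shared with the previously processed sequence (in sorted order)."""
    scores = {}
    # states[k] holds (row, best) after k characters of the previous sequence.
    states = [([0] * (len(query) + 1), 0)]
    prev = ""
    cells_saved = 0
    for seq in sorted(database):
        k = len(commonprefix([prev, seq]))
        states = states[: k + 1]          # keep only the shared-prefix rows
        cells_saved += k * len(query)     # matrix cells not recomputed
        row, best = states[-1]
        for ch in seq[k:]:
            row, best = extend_rows(query, ch, row, best)
            states.append((row, best))
        scores[seq] = best
        prev = seq
    return scores, cells_saved


if __name__ == "__main__":
    db = ["MKVLAAG", "MKVLT", "GGG"]
    scores, saved = scan_with_prefix_reuse("MKVLA", db)
    print(scores, saved)
```

Because the reused rows are exactly the rows a from-scratch computation would produce, the scores are identical to those of sw_score; only the amount of work changes. Sorting the database is just one simple way to bring sequences with a shared prefix next to each other.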

With this method, we reduced the computational costs by up to 20%, depending on the database size and subset used. We also implemented an approximate search, which further reduced computational costs at the expense of some accuracy.


Keywords: protein databases, Smith-Waterman algorithm





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • David Hoksza, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University in Prague, Prague 1, Czech Republic
