Improved Alignment of Protein Sequences Based on Common Parts
Over the last twenty years, protein databases have grown exponentially. To speed up searching, heuristic approaches have been proposed and their accuracy has steadily improved, but exact search is still needed in some cases. SSEARCH (or one of its clones) remains the only exact search algorithm: it sequentially scans a database of protein sequences and performs a full alignment against each of them.
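The baseline sequential scan can be sketched as follows. This is a minimal Python sketch, not SSEARCH itself: it assumes a simple match/mismatch score and a linear gap penalty (SSEARCH uses substitution matrices and affine gaps), and the names `smith_waterman` and `sequential_scan` are hypothetical.

```python
from typing import List, Tuple


def smith_waterman(query: str, target: str,
                   match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score, filling the DP matrix one target residue (column) at a time."""
    prev = [0] * (len(query) + 1)  # column for the previous target residue
    best = 0
    for t in target:
        curr = [0] * (len(query) + 1)
        for i, q in enumerate(query, start=1):
            diag = prev[i - 1] + (match if q == t else mismatch)
            gap_in_target = curr[i - 1] + gap  # query residue aligned to a gap
            gap_in_query = prev[i] + gap       # target residue aligned to a gap
            curr[i] = max(0, diag, gap_in_target, gap_in_query)
            best = max(best, curr[i])
        prev = curr
    return best


def sequential_scan(query: str, database: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Align the query against every database sequence and rank by score (exact search)."""
    scores = [(name, smith_waterman(query, seq)) for name, seq in database]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores
```

Every database sequence costs a full `len(query) * len(target)` matrix, which is the cost the method below aims to reduce.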
Because exact search is still needed, we focus on improving the sequential search algorithm. We reduce the cost of computing the alignment of a pair of protein sequences when searching large databases. This is achieved by reusing the alignment calculations for parts the sequences have in common, without any loss of accuracy.
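One way such reuse can stay exact is sketched below, assuming the "common parts" are shared prefixes of database sequences (the method itself may be more general). If the dynamic-programming matrix is filled one database residue (column) at a time, each column depends only on the previous column and the query, so two sequences with the same prefix produce identical columns for that prefix. Sorting the database lexicographically places sequences with long common prefixes next to each other, and the columns already computed for the previous sequence can be kept up to their longest common prefix. This is an illustrative sketch, not necessarily the authors' method; `scan_with_prefix_reuse` and `column_step` are hypothetical names, and scoring uses a simplified match/mismatch score with a linear gap penalty.

```python
from typing import Dict, List, Tuple


def column_step(query: str, prev: List[int], t: str,
                match: int, mismatch: int, gap: int) -> Tuple[List[int], int]:
    """Compute the Smith-Waterman column for target residue t from the previous column."""
    curr = [0] * (len(query) + 1)
    col_best = 0
    for i, q in enumerate(query, start=1):
        diag = prev[i - 1] + (match if q == t else mismatch)
        curr[i] = max(0, diag, curr[i - 1] + gap, prev[i] + gap)
        col_best = max(col_best, curr[i])
    return curr, col_best


def scan_with_prefix_reuse(query: str, database: List[Tuple[str, str]],
                           match: int = 2, mismatch: int = -1,
                           gap: int = -2) -> Tuple[Dict[str, int], int]:
    """Exact local alignment scores for all sequences, reusing columns of shared prefixes.

    Assumption: "common parts" are prefixes shared by lexicographically adjacent sequences.
    Returns (scores by name, number of columns reused instead of recomputed).
    """
    scores: Dict[str, int] = {}
    cols: List[List[int]] = [[0] * (len(query) + 1)]  # cols[k]: column after k residues
    best: List[int] = [0]                               # best[k]: max score within first k residues
    prev_seq = ""
    reused = 0
    for name, seq in sorted(database, key=lambda item: item[1]):
        # Longest common prefix with the previously processed sequence.
        lcp = 0
        limit = min(len(seq), len(prev_seq))
        while lcp < limit and seq[lcp] == prev_seq[lcp]:
            lcp += 1
        # Keep the columns for the shared prefix; discard the rest.
        del cols[lcp + 1:]
        del best[lcp + 1:]
        reused += lcp
        # Compute only the columns for the non-shared suffix.
        for t in seq[lcp:]:
            col, col_best = column_step(query, cols[-1], t, match, mismatch, gap)
            cols.append(col)
            best.append(max(best[-1], col_best))
        scores[name] = best[len(seq)]
        prev_seq = seq
    return scores, reused
```

For example, `scan_with_prefix_reuse("ACGT", [("a", "ACGA"), ("b", "ACGT"), ("c", "TTTT")])` reuses the three columns for `ACG` when moving from `ACGA` to `ACGT`, yet returns the same scores as aligning each sequence separately. Keeping every column of the previous sequence costs memory proportional to its length times the query length, a trade-off a real implementation would need to manage.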
With this method, we reduced computational costs by up to 20 %, depending on the database size and the subset used. We also implemented an approximate search, which further reduces computational costs at the expense of some accuracy.
Keywords: protein databases, Smith-Waterman algorithm