CSI: Clustered Segment Indexing for Efficient Approximate Searching on the Secondary Structure of Protein Sequences

Seo, Minkoo; Park, Sanghyun; Won, Jung-Im

doi:10.1007/11425274_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3488)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

Abstract

Approximate searching on the primary structure (i.e., amino acid arrangement) of protein sequences is an essential part of predicting the functions and evolutionary histories of proteins. However, because evolutionarily distant proteins do not conserve their amino acid residue arrangements, approximate searching on the secondary structure of proteins is important for detecting distant homology. In this paper, we propose an indexing scheme for efficient approximate searching on the secondary structure of protein sequences that can be easily implemented in an RDBMS. Exploiting the concepts of clustering and lookahead, the proposed indexing scheme quickly processes three types of secondary structure queries: exact match, range match, and wildcard match. To evaluate the performance of the proposed method, we conducted extensive experiments using a set of actual protein sequences. In these experiments, the proposed method was up to 6.3 times faster than existing indexing methods for exact match queries, up to 3.3 times faster for range match queries, and up to 1.5 times faster for wildcard match queries.
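
The three query types can be illustrated with a small sketch. The abstract does not define the query semantics or the index layout, so everything below is an assumption made for illustration: secondary structure is written as a string over H (helix), E (strand), and L (loop); a segment is a maximal run of one letter; and a query is a list of per-segment predicates (type, minimum length, maximum length). The names to_segments, matches, and find are hypothetical, and the linear scan stands in for the index lookup; it is not the CSI method itself.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# Assumed encoding: (structure type, run length), e.g. ("H", 4).
Segment = Tuple[str, int]
# Assumed predicate: (type or None for "any type", min length, max length or None for unbounded).
Predicate = Tuple[Optional[str], int, Optional[int]]


def to_segments(ss: str) -> List[Segment]:
    """Collapse a secondary structure string into maximal runs."""
    return [(kind, len(list(run))) for kind, run in groupby(ss)]


def matches(seg: Segment, pred: Predicate) -> bool:
    """Check one segment against one predicate."""
    seg_type, seg_len = seg
    want_type, lo, hi = pred
    type_ok = want_type is None or want_type == seg_type
    return type_ok and lo <= seg_len and (hi is None or seg_len <= hi)


def find(segments: List[Segment], query: List[Predicate]) -> List[int]:
    """Return start positions (in segment units) where consecutive
    segments satisfy the query predicates one by one (plain scan)."""
    m = len(query)
    return [
        i
        for i in range(len(segments) - m + 1)
        if all(matches(segments[i + j], query[j]) for j in range(m))
    ]


segments = to_segments("HHHHEELLLLHHHEEEE")
# segments == [("H", 4), ("E", 2), ("L", 4), ("H", 3), ("E", 4)]

exact = find(segments, [("H", 4, 4), ("E", 2, 2)])       # lengths fixed
ranged = find(segments, [("H", 3, 5)])                   # length interval
wildcard = find(segments, [("H", 3, 5), (None, 1, None), ("L", 4, 4)])
```

In this simplified reading, an exact match fixes both the type and the length of each segment, a range match allows a length interval, and a wildcard leaves the type (and here also the upper length bound) open. A wildcard that spans a variable number of segments would need a different matcher than the one-predicate-per-segment scan shown here.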

This work was supported by a Korea Research Foundation Grant (KRF-2004-003-D00302).

Author information

Authors and Affiliations

Department of Computer Science, Yonsei University, Korea
Minkoo Seo, Sanghyun Park & Jung-Im Won

Editor information

Editors and Affiliations

LIRIS - UFR d’Informatique, Université Claude Bernard Lyon 1, 43, boulevard du 11 novembre 1918, 69622, Villeurbanne, France
Mohand-Said Hacid
Department of Computer Science, State University of New York, 12222, Albany, NY, USA
Neil V. Murray
Department of Computer Science, University of North Carolina, 28223, Charlotte, NC, USA
Zbigniew W. Raś
Shimane University, 89-1 Enya-cho Izumo, 6938501, Shimane, Japan
Shusaku Tsumoto

Cite this paper

Seo, M., Park, S., Won, JI. (2005). CSI: Clustered Segment Indexing for Efficient Approximate Searching on the Secondary Structure of Protein Sequences. In: Hacid, MS., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science, vol 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_25

DOI: https://doi.org/10.1007/11425274_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25878-0
Online ISBN: 978-3-540-31949-8
eBook Packages: Computer Science, Computer Science (R0)
