
CSI: Clustered Segment Indexing for Efficient Approximate Searching on the Secondary Structure of Protein Sequences

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3488)


Abstract

Approximate searching on the primary structure (i.e., the amino acid arrangement) of protein sequences is an essential part of predicting the functions and evolutionary histories of proteins. However, because evolutionarily distant proteins do not conserve their amino acid residue arrangements, approximate searching on the secondary structure of proteins is important for detecting distant homology. In this paper, we propose an indexing scheme for efficient approximate searching on the secondary structure of protein sequences that can be easily implemented in an RDBMS. Exploiting the concepts of clustering and lookahead, the proposed indexing scheme processes three types of secondary structure queries (i.e., exact match, range match, and wildcard match) very quickly. To evaluate the performance of the proposed method, we conducted extensive experiments using a set of actual protein sequences. According to the experimental results, the proposed method was up to 6.3 times faster than the existing indexing methods for exact match queries, 3.3 times faster for range match queries, and 1.5 times faster for wildcard match queries.
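To make the three query types concrete, the following is a minimal brute-force sketch of their semantics. It is not the CSI index and does not model clustering or lookahead. It assumes a reduced three-letter alphabet (H for helix, E for strand, L for loop/other), a secondary structure represented as run-length (type, length) segments, and a query format in which each element gives a segment type (or None as a wildcard) plus an allowed length range. These conventions and the names `to_segments`, `elem_matches`, and `scan` are illustrative assumptions, not taken from the paper.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# Assumption: each residue is labelled H (helix), E (strand) or L (loop/other),
# and a protein's secondary structure is the run-length encoding of those labels.
Segment = Tuple[str, int]  # (structure type, segment length)

# Assumption: a query element is (type, min_len, max_len); type None is a wildcard.
QueryElem = Tuple[Optional[str], int, int]


def to_segments(ss: str) -> List[Segment]:
    """Collapse runs of identical labels into (type, length) segments."""
    return [(label, len(list(run))) for label, run in groupby(ss)]


def elem_matches(seg: Segment, q: QueryElem) -> bool:
    """A segment matches if its type agrees (or q is a wildcard) and its length is in range."""
    seg_type, seg_len = seg
    q_type, min_len, max_len = q
    return (q_type is None or q_type == seg_type) and min_len <= seg_len <= max_len


def scan(segments: List[Segment], query: List[QueryElem]) -> List[int]:
    """Brute force: return every segment offset where the query matches consecutive segments."""
    k = len(query)
    return [
        i
        for i in range(len(segments) - k + 1)
        if all(elem_matches(segments[i + j], query[j]) for j in range(k))
    ]


if __name__ == "__main__":
    ss = "L" * 2 + "H" * 5 + "L" * 3 + "E" * 4 + "L" * 2 + "H" * 5
    segs = to_segments(ss)
    print(segs)  # [('L', 2), ('H', 5), ('L', 3), ('E', 4), ('L', 2), ('H', 5)]
    print(scan(segs, [("H", 5, 5)]))  # exact match -> [1, 5]
    print(scan(segs, [("H", 4, 6), ("L", 1, 3), ("E", 3, 5)]))  # range match -> [1]
    print(scan(segs, [("E", 4, 4), (None, 1, 10), ("H", 5, 5)]))  # wildcard match -> [3]
```

In this simplified model, an exact match fixes both type and length, a range match allows a length interval, and a wildcard stands for exactly one segment of any type. Wildcards spanning a variable number of segments are outside the sketch. An index is meant to avoid scanning every stored sequence this way.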

This work was supported by the Korea Research Foundation Grant (KRF-2004-003-D00302).

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Seo, M., Park, S., Won, JI. (2005). CSI: Clustered Segment Indexing for Efficient Approximate Searching on the Secondary Structure of Protein Sequences. In: Hacid, MS., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science, vol 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_25

  • DOI: https://doi.org/10.1007/11425274_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25878-0

  • Online ISBN: 978-3-540-31949-8

  • eBook Packages: Computer Science, Computer Science (R0)