Multithreaded PSS-SQL for Searching Databases of Secondary Structures

Mrozek, Dariusz

doi:10.1007/978-3-319-06971-5_2

Part of the book series: SpringerBriefs in Computer Science

Abstract

Protein secondary structure (PSS), as an organizational level, describes how a protein is built and which regular spatial shapes, such as alpha-helices, beta-strands, and loops, regions of its amino acid chain can adopt. The relevance of this information and the breadth of its practical applications call for effective storage and processing. This chapter shows how PSSs can be stored in a relational database and processed with the protein secondary structure-structured query language (PSS-SQL). PSS-SQL is an extension to SQL that allows users to formulate queries against a relational database in order to find proteins whose secondary structures are similar to a user-specified structural pattern. The chapter also shows how this search can be accelerated by a parallel implementation of the alignment that uses multiple threads running on multi-core CPUs.
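
The abstract does not name the alignment algorithm, but the reference list cites the Smith-Waterman method and the wavefront design pattern. As a rough illustration only, and not the chapter's implementation, the sketch below assumes a Smith-Waterman-style local alignment of two secondary-structure strings (H for helix, E for strand, C for loop) whose score matrix is filled one anti-diagonal at a time, since cells on the same anti-diagonal do not depend on each other. The function name `sw_score_wavefront` and the scoring constants are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scoring values, chosen for illustration only.
MATCH, MISMATCH, GAP = 2, -1, -2


def sw_score_wavefront(query: str, target: str, workers: int = 4) -> int:
    """Best local-alignment score of two secondary-structure strings."""
    n, m = len(query), len(target)
    # H[i][j] is the best score of an alignment ending at query[i-1], target[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]

    def fill(cell):
        i, j = cell
        sub = MATCH if query[i - 1] == target[j - 1] else MISMATCH
        H[i][j] = max(
            0,
            H[i - 1][j - 1] + sub,  # align the two symbols
            H[i - 1][j] + GAP,      # gap in target
            H[i][j - 1] + GAP,      # gap in query
        )
        return H[i][j]

    best = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Cells with equal i + j read only from earlier anti-diagonals,
        # so one anti-diagonal can be filled by several threads at once.
        for d in range(2, n + m + 1):
            cells = [(i, d - i) for i in range(max(1, d - m), min(n, d - 1) + 1)]
            # max() drains the map, so the diagonal is complete before the next starts.
            best = max(best, max(pool.map(fill, cells), default=0))
    return best


if __name__ == "__main__":
    print(sw_score_wavefront("HHHHEEEE", "CCHHHHCC"))
```

In CPython the global interpreter lock prevents these threads from running the arithmetic truly in parallel, so the sketch shows the dependency structure of the wavefront rather than a real speedup; a compiled language or native threads would be needed for that.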

...; life was no longer considered to be a result of mysterious and vague phenomena acting on organisms, but instead the consequence of numerous chemical processes made possible thanks to proteins.

Amit Kessel, Nir Ben-Tal, 2010 [13]

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)
Anvik, J., MacDonald, S., Szafron, D., Schaeffer, J., Bromling, S., Tan, K.: Generating parallel programs from the wavefront design pattern. In: Proceedings of the 7th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS’02), Fort Lauderdale, Florida, April 2002, pp. 1–8 (2002)
Apweiler, R., Bairoch, A., Wu, C.H., et al.: Uniprot: the universal protein knowledgebase. Nucl. Acids Res. 32(Database issue), D115–D119 (2004)
Berman, H., et al.: The Protein Data Bank. Nucl. Acids Res. 28, 235–242 (2000)
Can, T., Wang, Y.: CTSS: a robust and efficient method for protein structure alignment based on local geometrical and biological features. In: Proceedings of the 2003 IEEE Bioinformatics Conference (CSB 2003), pp. 169–179 (2003)
Date, C.: An Introduction to Database Systems, 8th edn. Addison-Wesley, Reading (2003)
Frishman, D., Argos, P.: Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence. Protein Eng. 9(2), 133–142 (1996)
Gibrat, J., Madej, T., Bryant, S.: Surprising similarities in structure comparison. Curr. Opin. Struct. Biol. 6(3), 377–385 (1996)
Hammel, L., Patel, J.M.: Searching on the secondary structure of protein sequences. In: Proceedings of 28th International Conference on Very Large Data Bases, Hong Kong, China, 2002, pp. 634–645 (2002)
Joosten, R.P., Te Beek, T.A.H., Krieger, E., Hekkelman, M.L., et al.: A series of PDB related databases for everyday needs. Nucl. Acids Res. 39(Database issue), D411–D419 (2011)
Kabsch, W., Sander, C.: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983)
Källberg, M., Wang, H., Wang, S., Peng, J., Wang, Z., Lu, H., Xu, J.: Template-based protein structure modeling using the RaptorX web server. Nat. Protoc. 7, 1511–1522 (2012)
Kessel, A., Ben-Tal, N.: Introduction to Proteins: Structure, Function, and Motion, 1st edn. CRC Press, Boca Raton (2010)
Liu, W., Schmidt, B.: Parallel design pattern for computational biology and scientific computing applications. In: Proceedings of the 2003 IEEE International Conference on Cluster Computing, pp. 456–459 (2003)
Małysiak-Mrozek, B., Kozielski, S., Mrozek, D.: Server-side query language for protein structure similarity searching. In: Human-Computer Systems Interaction: Backgrounds and Applications. Springer, Berlin, Advances in Intelligent and Soft Computing 99(2), 395–415 (2012)
Mrozek, D., Małysiak-Mrozek, B.: CASSERT: a two-phase alignment algorithm for matching 3D structures of proteins. In: Kwiecień, A., Gaj, P., Stera, P. (eds.) Proceedings of 22nd International Conference on Computer Networks, Communications in Computer and Information, Springer-Verlag, CCIS 370, 334–343 (2013)
Mrozek, D., Wieczorek, D., Małysiak-Mrozek, B., Kozielski, S.: PSS-SQL: protein secondary structure—structured query language. In: Proceedings of 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS 2010, Buenos Aires, Argentina, pp. 1073–1076 (2010)
Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540 (1995)
Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., et al.: CATH—a hierarchic classification of protein domain structures. Structure 5(8), 1093–1108 (1997)
Shapiro, J., Brutlag, D.: FoldMiner and LOCK2: protein structure comparison and motif discovery on the web. Nucl. Acids Res. 32, 536–541 (2004)
Smith, T., Waterman, M.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)
Socha, B.: Multithreaded execution of the Smith-Waterman algorithm in the query language for protein secondary structures. MSc thesis, supervised by Mrozek D., Silesian University of Technology, Gliwice, Poland (2013)
Stephens, S., Chen, J.Y., Thomas, Sh.: ODM BLAST: sequence homology search in the RDBMS. In: Bulletin of the IEEE Computer Society Technical Committee on Data Engineering (2004)
Tata, S., Patel, J.M., Friedman, J.S., Swaroop, A.: Declarative querying for biological sequences. In: Proceedings of 22nd International Conference on Data Engineering, IEEE Computer Society, 2006, pp. 87–98 (2006)
Wang, Y., Sunderraman, R., Tian, H.: A domain specific data management architecture for protein structure data. In: Proceedings of 28th IEEE EMBS Annual International Conference, New York City, USA, pp. 5751–5754 (2006)
Wieczorek, D., Małysiak-Mrozek, B., Kozielski, S., Mrozek, D.: A method for matching sequences of protein secondary structures. J. Med. Info. Technol. 16, 133–137 (2010)
Wieczorek, D., Małysiak-Mrozek, B., Kozielski, S., Mrozek, D.: A declarative query language for protein secondary structures. J. Med. Info. Technol. 16, 139–148 (2010)
Yang, Y., Faraggi, E., Zhao, H., Zhou, Y.: Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of the query and corresponding native properties of templates. Bioinformatics 27, 2076–2082 (2011)
Zomaya, A.Y.: Parallel Computing for Bioinformatics and Computational Biology: Models, Enabling Technologies, and Case Studies, 1st edn. Wiley-Interscience, New York (2006)

Author information

Authors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Dariusz Mrozek

Corresponding author

Correspondence to Dariusz Mrozek.

About this chapter

Cite this chapter

Mrozek, D. (2014). Multithreaded PSS-SQL for Searching Databases of Secondary Structures. In: High-Performance Computational Solutions in Protein Bioinformatics. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-06971-5_2

DOI: https://doi.org/10.1007/978-3-319-06971-5_2
Published: 05 June 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06970-8
Online ISBN: 978-3-319-06971-5
eBook Packages: Computer Science, Computer Science (R0)
