A Review on Protein Structure Classification

  • N. Sajithra
  • D. Ramyachitra
  • P. Manikandan
Conference paper
Part of the Lecture Notes in Computational Vision and Biomechanics book series (LNCVB, volume 30)


Genome projects are steadily producing massive amounts of sequence data that must be annotated in terms of structure and of molecular and biological function. Structural genomics aims to solve many protein structures efficiently and to use the solved structures to assign biological function to structures solved theoretically. Protein structures were initially classified manually, and this worked well, but manual classification now struggles to stay up to date because of the high throughput of newly solved structures. To overcome this problem, several data mining techniques have been examined for the structural classification of proteins. This review presents an overview of the existing classification techniques, databases, and tools, and of the metrics used to evaluate the performance of protein structure classification algorithms.


Protein structure, Classification techniques, Tools, Databases, Computational biology, Challenges



The authors would like to thank the Department of Science and Technology (DST), New Delhi, for financial support of this research work under the INSPIRE Fellowship (DST/INSPIRE Fellowship/2015/IF150093).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, Bharathiar University, Coimbatore, India