Protein Structure Annotations

  • Mirko Torrisi
  • Gianluca Pollastri


This chapter introduces the specifics of protein structure annotations and their fundamental position in structural bioinformatics and in bioinformatics in general. Proteins are profoundly characterised by their structure in every aspect of their functioning. While the number of known protein sequences has grown close to exponentially over the last decades, the number of known protein structures has grown closer to linearly, because determining structures is highly complex and costly. Protein structure predictors are therefore among the most thoroughly assessed tools in bioinformatics (in venues such as CASP or CAMEO), because they allow the structural study of proteins on a large scale. This chapter presents the key types of protein structure annotation and the methods and algorithms for predicting them, with the aim of giving both a historical perspective on their development and a snapshot of the current state of the art. Both mature and currently developing methods are introduced, from one-dimensional protein annotations (i.e. secondary structure, solvent accessibility and torsion angles) to more complex and informative two-dimensional protein abstractions (i.e. contact maps). The aim of this overview is to facilitate the adoption and development of state-of-the-art protein structural predictors. Particular attention is given to some of the best-performing, freely available web servers and standalone programmes for predicting protein structure annotations.


Keywords: Protein structure annotations; Secondary structure prediction; Solvent accessibility prediction; Torsional angles prediction; Contact map prediction



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Mirko Torrisi (1)
  • Gianluca Pollastri (1)
  1. School of Computer Science, University College Dublin, Dublin, Ireland
