Encyclopedia of Biophysics

Living Edition
Editors: Gordon Roberts, Anthony Watts, European Biophysical Societies

Protein Secondary Structure Prediction in 2018

  • Edda Kloppmann
  • Jonas Reeb (corresponding author)
  • Peter Hönigschmid
  • Burkhard Rost
Living reference work entry
DOI: https://doi.org/10.1007/978-3-642-35943-9_429-1

Definition

Protein secondary structure prediction aims to predict the secondary structure state of each residue from sequence information alone. Most methods predict alpha-helices and beta-strands, i.e., the most prevalent regular secondary structure segments. At the opposite end from regular secondary structure lie irregular or disordered regions, often referred to as loops, random coils, or disorder.

Introduction

Fifteen years ago, science took a leap by publishing an almost complete blueprint of human life. Now that the parts are known, can this blueprint be used as a manual to understand how the machine works? “Like with every proper manual, usually we do not find the information we need, and in the rare cases that we do, we do not understand the answer,” joked Anna Tramontano (La Sapienza, Rome, 1957–2017). Every year since, surprising new findings have revealed how incomplete our knowledge is. Despite immense advances in molecular biology over the last 15 years, substantial experimental information is still missing for about 15% of human proteins (Baker et al. 2017). Structural biology has also leaped forward over the same period: 90% of the high-resolution experimental structures known today were determined after 2000. In parallel, the number of proteins of known sequence has exploded. In September 2018, the Protein Data Bank (PDB, www.pdb.org; Berman et al. 2000; Rose et al. 2017) held about 144,000 experimental protein structures, compared with about 125 million protein sequences in UniProtKB (The UniProt Consortium 2017).

Secondary structure is arguably the simplest meaningful aspect of protein structure. It is derived from the amino acid sequence (misleadingly dubbed “primary structure” in the past) and serves as the building block of the full three-dimensional (3D), or tertiary, structure. In terms of information content, secondary structure is essentially one-dimensional: it can be written as a string of letters that assigns a secondary structure state to each residue.
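To illustrate this one-dimensional representation, the following minimal Python sketch stores secondary structure as a string with one state letter per residue, aligned to a toy sequence, and recovers the regular segments from it. The three-letter alphabet (H for helix, E for strand, L for other), the sequence, and the function name `segments` are illustrative assumptions, not part of any particular method.

```python
from itertools import groupby
from typing import List, Tuple


def segments(ss: str) -> List[Tuple[str, int, int]]:
    """Split a per-residue state string into runs of identical states.

    Returns (state, start, end) tuples with 0-based start and exclusive end.
    """
    result = []
    pos = 0
    for state, run in groupby(ss):
        length = sum(1 for _ in run)
        result.append((state, pos, pos + length))
        pos += length
    return result


# Hypothetical toy example: one state letter per residue of the sequence.
sequence = "MKTAYIAKQRQISFVKSHFSRQ"
ss = "LLHHHHHHHHLLLEEEEELLLL"
assert len(sequence) == len(ss)

helices = [(start, end) for state, start, end in segments(ss) if state == "H"]
print(helices)  # [(2, 10)]
```

Filtering for state H lists the helices as (start, end) positions; here there is a single helix covering positions 2 to 9.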

Linus Pauling pioneered the idea that protein structures consist of regular substructures, namely, secondary structure, before any 3D structure had been experimentally observed. He proposed several types and shapes of such substructures, all stabilized by hydrogen bonds (Pauling and Corey 1950). One of these, the alpha-helix (Fig. 1a), was observed a decade later when the first high-resolution protein structures were solved: myoglobin and hemoglobin (Kendrew et al. 1960; Perutz et al. 1960). The second major type of regular substructure is the beta-strand; strands aggregate into beta-sheets (Fig. 1b). First proposed in 1933 (Astbury 1933) and later refined by Pauling and Corey, beta-sheets were first observed in the structure of lysozyme in 1965 (Eisenberg 2003).
Fig. 1

Regular secondary structure. The protein shown is Rad60, which mimics small ubiquitin-like modifier (SUMO) proteins. Its 3D structure has been experimentally determined at an extremely high level of detail, i.e., to a resolution of <1 Å (PDB identifier 3goe (Prudden et al. 2009)). Alpha-helices are shown in orange (a) and beta-sheets as green arrows (b). Alpha-helices are stabilized by hydrogen bonds, usually between residues i and i + 4 (a, hydrogen bonds marked by blue dashes), whereas beta-sheets form between beta-strands whose hydrogen-bonded residues can be far apart in sequence (b, hydrogen bonds marked by blue dashes). In beta-sheets, typically every other residue is involved in hydrogen bonding.

Protein Structure Classification Based on Secondary Structure

Michael Levitt proposed classifying proteins by these main constituents into alpha, beta, and alpha+beta classes (Levitt and Greer 1977). This introduced the first classification step that is still at the heart of today’s two major protein structure classifications, CATH (www.cathdb.info; Sillitoe et al. 2015) and SCOP (scop2.mrc-lmb.cam.ac.uk/scop; Andreeva et al. 2014). A more recent addition is TopSearch (topsearch.services.came.sbg.ac.at; Wiederstein et al. 2014), which deviates importantly from this concept: it does not use derivatives of the original Levitt classes in any major classification step. It does, however, still rely on the concept of secondary structure.

There are different types of helices and strands, and several methods automatically assign these types from the 3D coordinates deposited in the PDB. The most widely used is DSSP (Kabsch and Sander 1983). DSSP identifies hydrogen bonds with a simple electrostatic energy model and then groups regular patterns of hydrogen bonds into eight classes. Of all residues in experimentally determined structures, about 37% are classified as alpha-helix and 22% as beta-strand. The remaining residues lie in regions that appear less regular; these are often referred to as loops or, even more misleadingly, as random coils.
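The hydrogen-bond criterion at the core of DSSP can be sketched in a few lines. The sketch below assumes the electrostatic energy formula and constants of Kabsch and Sander (1983): partial charges of 0.42e and 0.20e on the C=O and N-H groups, a factor of 332 to obtain kcal/mol, and a hydrogen bond whenever the energy falls below −0.5 kcal/mol. It adds their minimal-helix rule, in which two consecutive i → i + 4 turns mark residues as alpha-helix. It assumes that backbone coordinates, including the amide hydrogens, are already available, and it covers only this one of the eight DSSP classes; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# Constants as given by Kabsch and Sander (1983): partial charges (units of e),
# conversion factor to kcal/mol, and the hydrogen-bond energy cutoff.
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol


def dssp_energy(o: Point, c: Point, n: Point, h: Point) -> float:
    """Electrostatic energy between the C=O group of one residue and the
    N-H group of another; distances in Angstrom, result in kcal/mol."""
    d = math.dist
    return Q1 * Q2 * F * (1 / d(o, n) + 1 / d(c, h) - 1 / d(o, h) - 1 / d(c, n))


def is_hbond(o: Point, c: Point, n: Point, h: Point) -> bool:
    """True if the energy is below the DSSP cutoff."""
    return dssp_energy(o, c, n, h) < HBOND_CUTOFF


def alpha_helix_residues(hbond: Sequence[Sequence[bool]]) -> List[bool]:
    """hbond[i][j] is True if C=O of residue i is H-bonded to N-H of residue j.

    A 4-turn starts at i if hbond[i][i + 4]; two consecutive 4-turns at
    i - 1 and i mark residues i..i + 3 as alpha-helix (minimal helix).
    """
    n_res = len(hbond)
    turn = [i + 4 < n_res and hbond[i][i + 4] for i in range(n_res)]
    helix = [False] * n_res
    for i in range(1, n_res):
        if turn[i - 1] and turn[i]:
            for k in range(i, i + 4):
                helix[k] = True
    return helix
```

With C at the origin, O at 1.23 Å, H at 3.23 Å, and N at 4.23 Å on one axis, the energy is roughly −2.6 kcal/mol, well below the cutoff; moving the N-H group 8 Å farther away brings it close to zero.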

Application of Secondary Structure Prediction

Secondary structure helps in many contexts, for example, when aligning proteins in the twilight and midnight zones of sequence comparison (Rost 1999), where evolutionary relations are barely recognizable from sequence. It can be used to verify comparative models, to predict protein structure, and to support hypotheses about protein function and evolution. Furthermore, it is potentially the most important input feature for higher-level prediction methods that address questions beyond structure, such as protein binding, functional classes, and subcellular localization. Predicted secondary structure is also among the most important input features of methods that predict the effects of non-synonymous SNPs (i.e., single amino acid changes) and, more generally, of methods that explore dynamical features of proteins. Thus, starting with good predictions of secondary structure is always a good idea.

Prediction Continuously Improved to over 80% Q3

Performance Measure. The simplest way to measure the performance of secondary structure prediction is the three-state per-residue accuracy (Q3), i.e., the percentage of all residues correctly predicted in one of the three states helix, strand, and others: Q3 = 100 × (number of correctly predicted residues) / (total number of residues). Some proteins are easier to predict than others, so prediction accuracy is compiled over a distribution of proteins (Fig. 2).
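Q3 is straightforward to compute once observed and predicted states are both expressed in three states. How the eight DSSP classes are reduced to three is not specified in this entry; the sketch below assumes one common convention (H, G, I to helix; E, B to strand; everything else to other), and other reductions are in use. The strings and function names are illustrative.

```python
# Assumed 8-to-3 reduction (one common convention; others exist):
# H, G, I -> H (helix); E, B -> E (strand); all other DSSP states -> L (other).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three(dssp: str) -> str:
    """Map each DSSP state letter to one of the three states H, E, L."""
    return "".join(EIGHT_TO_THREE.get(state, "L") for state in dssp)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted 3-state label equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty state strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical toy example; '-' stands for residues without an assigned DSSP state.
observed = reduce_to_three("--HHHHGGGTTEEEEBS-")  # "LLHHHHHHHLLEEEEELL"
predicted = "LLHHHHHHLLLEEEELLL"
print(round(q3(predicted, observed), 1))  # 88.9
```

Two of the 18 toy residues are mispredicted (the ends of the helix and of the strand), giving 16/18 ≈ 88.9%.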
Fig. 2

Distribution of Q3 scores. Q3 gives the percentage of residues correctly predicted in one of the three states: helix, strand, and others. Here, Q3 is calculated separately for each protein chain in a set of 2,546 nonredundant protein chains, and the distribution of these per-chain scores is plotted. The average Q3 is about 82%, with a range from 33% to 100%. The predictions were made with ReProf (Yachdav et al. 2014, http://www.predictprotein.org); the distribution is likely similar for today’s top prediction method(s). Which side of the distribution is your protein likely to be on? The best way to answer that question is to explore the reliability index (Fig. 3).
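Because Fig. 2 reports Q3 per chain, averaging over chains is not the same as pooling all residues of a dataset into a single Q3. The following sketch, with made-up toy chains and hypothetical function names, computes both so that the difference becomes visible.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Chain = Tuple[str, str]  # (predicted, observed) 3-state strings of one chain


def per_chain_q3(chains: Sequence[Chain]) -> List[float]:
    """Q3 (in %) computed separately for each chain."""
    return [100.0 * sum(p == o for p, o in zip(pred, obs)) / len(obs)
            for pred, obs in chains]


def pooled_q3(chains: Sequence[Chain]) -> float:
    """Q3 (in %) over all residues of all chains taken together."""
    correct = sum(p == o for pred, obs in chains for p, o in zip(pred, obs))
    return 100.0 * correct / sum(len(obs) for _, obs in chains)


# Hypothetical toy data: three chains of different length and difficulty.
toy_chains = [("HHHHLL", "HHHHLL"), ("HHEELL", "HHHHLL"), ("EEEE", "HHHH")]
scores = per_chain_q3(toy_chains)
print(round(mean(scores), 1), min(scores), max(scores))  # 55.6 0.0 100.0
print(pooled_q3(toy_chains))  # 62.5
```

The pooled score weights each chain by its length, so the two numbers generally differ; a per-chain distribution additionally shows the spread between easy and hard proteins.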

Before the first protein structure was solved, A.G. Szent-Györgyi predicted secondary structure from the intrinsic biophysical features of proline residues, which are known to break helices (Szent-Györgyi and Cohen 1957). Although this simple rule often holds, on its own it makes a very poor prediction method. In fact, a more general statement holds overwhelmingly: simple rules that may suggest a pseudo-understanding of protein structure formation do not suffice to predict structure from sequence.

First-Generation Methods. A major step forward in secondary structure prediction centered on the work of Barry Robson. The first steps, in the mid-1970s, compiled simple statistics measuring the preference of each amino acid for particular types of secondary structure. Jean Garnier and David Osguthorpe contributed to this work and shaped the first successful of these methods, GOR (Garnier et al. 1978). These first-generation methods reached Q3 levels of about 55% (Rost et al. 1993).
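The single-residue statistics of first-generation methods can be illustrated as propensities, i.e., how much more often an amino acid occurs in a given state than expected from that state's overall frequency. The sketch below is a simplified illustration of this idea, not the GOR method itself; the training data and function names are hypothetical. Assigning each residue its highest-propensity state ignores all neighboring residues, which is one reason such methods remained limited.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def propensities(sequences: Sequence[str],
                 structures: Sequence[str]) -> Dict[Tuple[str, str], float]:
    """Propensity of amino acid aa for state s: P(s | aa) / P(s), from training pairs."""
    pair_counts, aa_counts, state_counts = Counter(), Counter(), Counter()
    for seq, ss in zip(sequences, structures):
        for aa, state in zip(seq, ss):
            pair_counts[(aa, state)] += 1
            aa_counts[aa] += 1
            state_counts[state] += 1
    n = sum(state_counts.values())
    return {
        (aa, s): (pair_counts[(aa, s)] / aa_counts[aa]) / (state_counts[s] / n)
        for aa in aa_counts
        for s in state_counts
    }


def predict_single_residue(seq: str, props: Dict[Tuple[str, str], float],
                           states: str = "LHE") -> str:
    """Give each residue the state with the highest propensity for its amino acid.

    max() keeps the first of tied states, so unseen amino acids fall back to 'L'.
    """
    return "".join(max(states, key=lambda s: props.get((aa, s), 0.0)) for aa in seq)


# Hypothetical toy training data.
props = propensities(["AAAAVVVV"], ["HHHHEEEE"])
print(predict_single_residue("AVG", props))  # HEL
```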

Second-Generation Methods. The next big step extended the single-residue statistics (e.g., what are the odds that a proline residue is in a helix?) to segments of several consecutive residues (e.g., what are the odds that a proline residue is in a helix when flanked by these specific residues on either side?). GOR3 (Garnier et al. 1996; Gibrat et al. 1987) was one of the most successful representatives of these methods, which reached Q3 levels of about 62% (Rost et al. 1993).
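Extending the statistics from single residues to segments can be sketched as summing, over a window centered on each residue, the log-odds that the central residue is in a given state given the amino acid observed at each offset. This is a simplified sketch in the spirit of GOR, assuming a 17-residue window (half-width 8) by default and add-one smoothing; it is not the published GOR3 parameterization, which additionally uses statistics of residue pairs (Gibrat et al. 1987). The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def train_window_counts(sequences, structures, half_window=8):
    """Count, for every offset m in [-half_window, half_window], how often
    amino acid aa occurs at position i + m when residue i is in state s."""
    pair = Counter()   # (m, aa, s) -> count
    aa_at = Counter()  # (m, aa) -> count
    state = Counter()  # s -> count
    for seq, ss in zip(sequences, structures):
        for i, s in enumerate(ss):
            state[s] += 1
            for m in range(-half_window, half_window + 1):
                j = i + m
                if 0 <= j < len(seq):
                    pair[(m, seq[j], s)] += 1
                    aa_at[(m, seq[j])] += 1
    return pair, aa_at, state


def predict_window(seq, model, half_window=8, pseudo=1.0):
    """Predict each residue's state by summing log-odds over its window."""
    pair, aa_at, state = model
    total = sum(state.values())
    states = sorted(state)
    prediction = []
    for i in range(len(seq)):
        scores = {}
        for s in states:
            prior = state[s] / total
            score = 0.0
            for m in range(-half_window, half_window + 1):
                j = i + m
                if 0 <= j < len(seq):
                    # Smoothed P(state s at i | residue seq[j] at offset m).
                    cond = (pair[(m, seq[j], s)] + pseudo) / (
                        aa_at[(m, seq[j])] + pseudo * len(states))
                    score += math.log(cond / prior)
            scores[s] = score
        prediction.append(max(states, key=scores.get))
    return "".join(prediction)


model = train_window_counts(["AAAAAAVVVVVV"], ["HHHHHHEEEEEE"], half_window=2)
print(predict_window("AAAAAAVVVVVV", model, half_window=2))  # HHHHHHEEEEEE
```

Training and predicting on the same toy sequence only demonstrates the mechanics; any real accuracy estimate requires proteins that were not used for training.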

Secondary structure formation is determined both by local interactions within a fragment of N residues and by global interactions with partners outside that fragment. Second-generation methods predicted the secondary structure of the residue at the center of a fragment of N consecutive residues, with N ranging from 5 to 50, and thereby captured only local interactions. One could hypothesize that only 62% of the interactions are local and that, therefore, prediction accuracy is capped at 62%.

One way to test this hypothesis is to use longer fragments. Empirically, N = 11 was better than N = 9, which in turn was better than N = 8, and so forth. However, there is a limit: N = 50 is not better than N = 11. This observation could mean that global interactions do not matter much. It can, however, also be explained differently: global interactions are probably important, but the information content of longer fragments is dominated by noise, i.e., increasing N decreases the signal-to-noise ratio. The smaller the dataset, the harder it is to detect the signal.
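The fragment-based input discussed above can be made concrete with a sliding window: each residue is represented by the N residues centered on it, with padding beyond the chain ends, and one common encoding turns each window position into a one-hot vector before it is passed to a classifier such as a neural network. The sketch assumes an odd N (default 11, the value reported above as working well), a padding character `-`, and an extra encoding slot for padding or unknown residues; these choices and the function names are illustrative.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def windows(seq: str, n: int = 11, pad: str = "-") -> List[str]:
    """Return the length-n fragment centered on each residue (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("window length must be odd so that it has a center")
    half = n // 2
    padded = pad * half + seq + pad * half
    return [padded[i:i + n] for i in range(len(seq))]


def one_hot(fragment: str) -> List[List[int]]:
    """Encode a fragment as rows of 21 binary values:
    20 amino acids plus one slot for padding or unknown residues."""
    rows = []
    for aa in fragment:
        row = [0] * (len(AMINO_ACIDS) + 1)
        idx = AMINO_ACIDS.find(aa)
        row[idx if idx >= 0 else len(AMINO_ACIDS)] = 1
        rows.append(row)
    return rows


print(windows("ACDEF", n=3))  # ['-AC', 'ACD', 'CDE', 'DEF', 'EF-']
```

Increasing `n` lengthens every input vector by 21 values per added position, which illustrates why longer fragments need more data before their extra signal outweighs the noise.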

Third-Generation Methods. Third-generation methods addressed this challenge by using evolutionary information to capture global information and to increase the information density (signal-to-noise ratio). To achieve this, users first have to build a multiple sequence alignment of proteins related to the query sequence for which they seek the prediction. When introduced, these methods used relatively simple pairwise alignments against databases that were small by today’s standards. Nevertheless, performance was immediately boosted to levels above 72%. The first method to surpass a sustained level of Q3 > 72% was PHDsec (Rost 1996). Subsequent improvements specifically addressed issues such as the prediction of beta-strands. PROFsec (Rost 1996) further improved the approach by extending the sequence profiles and by combining predictions of solvent accessibility and secondary structure. Better database search methods and larger databases raised performance to Q3 levels of 78% (Jones 1999; Przybylski and Rost 2002). Thanks to the ongoing search for improvements and the growth of available data, today’s methods, such as s2D (Sormanni et al. 2015), RaptorX-Property (Wang et al. 2016), SPIDER3 (Heffernan et al. 2015, 2017), and ReProf (Yachdav et al. 2014), report Q3 values from 80% to 85%.
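The evolutionary information used by third-generation methods is typically fed to the predictor as a sequence profile, i.e., amino acid frequencies per alignment column. The minimal sketch below assumes an already built alignment (e.g., from PSI-BLAST or HHblits) with the query as the first row, and it omits sequence weighting and the other refinements used for real profiles; the function name and toy alignment are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile(msa: Sequence[str], pseudocount: float = 0.0) -> List[Dict[str, float]]:
    """Column-wise amino acid frequencies of an MSA whose first row is the query.

    Gaps ('-') and non-standard letters are not counted; columns in which the
    query has a gap are dropped, so there is one row per query residue.
    """
    query = msa[0]
    if any(len(row) != len(query) for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    rows = []
    for col, q in enumerate(query):
        if q == "-":
            continue
        counts = Counter(row[col] for row in msa if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        rows.append({aa: (counts[aa] + pseudocount) / total if total else 0.0
                     for aa in AMINO_ACIDS})
    return rows


# Hypothetical toy alignment; the query is the first row.
msa = ["ACD-E", "ACDKE", "SC--E", "AC-KQ"]
p = profile(msa)
print(len(p), p[0]["A"], p[0]["S"])  # 4 0.75 0.25
```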

Prediction Based on Known Structures. All the above numbers hold for proteins that have no significant sequence similarity to proteins with experimentally determined structures. However, the structural coverage, i.e., the percentage of proteins that can be modeled based on known experimental structures, is increasing steadily (Kiefer et al. 2009). Consequently, most proteins known today contain at least some region for which some aspects of structure can be modeled. Is secondary structure inferred from these models more accurate, in terms of Q3, than that from de novo secondary structure prediction methods?

The answer depends on the quality of the model, i.e., ultimately on the similarity between the protein under investigation and the sequence-similar protein of known experimental structure. At high similarity, reading secondary structure off comparative models is better than using dedicated prediction methods; at low similarity, prediction methods are better (Marti-Renom et al. 2002; Eyrich et al. 2003; Faraggi et al. 2012). However, the increased information stored in data banks has given rise to new methods that rely more heavily on available structural data (Zhang et al. 2011).
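Reading secondary structure off a comparative model essentially amounts to transferring the template's states along the query-template alignment. The sketch below shows only that transfer step; it assumes a pairwise alignment with `-` for gaps and a template state string (e.g., from DSSP), marks query residues aligned to template gaps as unknown, and does not decide at which level of similarity the transfer beats de novo prediction. All names are hypothetical.

```python
def transfer_from_template(query_aln: str, template_aln: str,
                           template_ss: str, unknown: str = "-") -> str:
    """Read secondary structure for the query off an aligned template.

    query_aln and template_aln are the two rows of a pairwise alignment;
    template_ss holds one state per template residue (gaps excluded).
    Query residues aligned to a template gap receive `unknown`.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")
    if len(template_ss) != len(template_aln.replace("-", "")):
        raise ValueError("template_ss must have one state per template residue")
    out = []
    t = 0  # index into the template's residues (not alignment columns)
    for q_char, t_char in zip(query_aln, template_aln):
        t_state = None
        if t_char != "-":
            t_state = template_ss[t]
            t += 1
        if q_char != "-":
            out.append(t_state if t_state is not None else unknown)
    return "".join(out)


print(transfer_from_template("MK-LVA", "MKELV-", "LHHHE"))  # LHHE-
```

The result has one state per query residue; the final query residue has no template counterpart and is therefore left as unknown.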

Hints for Users

  1. Find the right method. Good and mediocre secondary structure prediction methods differ substantially in their usefulness. Newer methods are sometimes better than older ones, but the newest methods are not necessarily the best. Likewise, the most readily available methods are not necessarily the best. Thus, the first piece of advice is to spend some time identifying a few good methods.

     
  2. Compare methods, but rely more on reliability indices than on consensus. Suppose you have applied the “top methods” to your protein. Which of these predictions should you use? Many users tend to trust residues that are predicted the same way by different methods more, i.e., they exploit some consensus between the methods (Zhang et al. 2011). There are good reasons to expect that such consensus or averaging helps. However, in a field in which tools are as diligently crafted as the top secondary structure prediction methods, what is usually advisable may become a mistake. The best methods provide an estimate of the reliability of each prediction (Fig. 3), and if the developers have done this well, such estimates are much more informative than any simple consensus (Eyrich et al. 2003). Although some publications show that consensus methods can fare better (Zhang et al. 2011), finding such a method in practice is difficult.

     
  3. There may or may not be a best method today. The previous two hints suggest that this entry would be most helpful if it provided a list of top methods. There are many reasons why it does not. The simplest is that the objective, automated assessments that tracked the state of the art in the field until a few years ago, EVA and LiveBench, are no longer running (Eyrich et al. 2003; Rychlewski and Fischer 2005). Conclusions about the state of the art drawn from the literature alone can be very misleading. Furthermore, any one-time assessment can at best give a correct view up to a certain date, whereas this encyclopedia entry will hopefully remain helpful long after that.

     
  4. Better alignments, better predictions. The major source of improvement in prediction methods over the last two decades has been growing databases and the adequate integration of this additional information. The single most important way to improve predictions is to extract more of the information contained in the multiple sequence alignment used as input. For secondary structure prediction of water-soluble proteins in particular, this largely boils down to: the more diverse the family members included in the alignment, the better (even at the price of mistakenly including some remote nonmembers!).

    Hidden Markov models (HMMs) are typically better at identifying distant relations than PSI-BLAST (Altschul 1997), and profile-profile alignment methods are typically the best. For instance, one recent method (ReProf, http://www.predictprotein.org) improved secondary structure prediction accuracy by almost a full percentage point simply by switching from PSI-BLAST to HHblits (http://toolkit.lmb.uni-muenchen.de/hhblits; Remmert et al. 2012).

     
  5. Predictions have 20% mistakes: find them! An accuracy of about 80% implies that 20% of the residues are predicted incorrectly. Roughly a quarter of these are “bad” mistakes of the type “helix predicted where strand is observed” or vice versa (Rost 2005; Zhang et al. 2011). Most errors in helices and strands tend to occur on the protein surface, whereas active and binding sites tend to have few errors. However, all of these are averages: predictions are substantially worse for some proteins and much better for others (see Fig. 2). Thus, accuracy ranges from almost entirely correct for some proteins to close to random for others. Reliability indices can give some clue as to whether your protein is likely to be an average performer or to belong to one of the two extremes: the more residues are predicted with unusually high reliability, the higher the accuracy for the protein tends to be. A closer look at residues with low reliability often reveals an important story.

    After several decades of research, it is still not possible to identify which types of proteins tend to fall into which extreme. There are trivial correlations, for example: since most methods predict strands less accurately than helices, proteins with higher helix content tend to be predicted more accurately. However, no claimed correlation between the success of secondary structure prediction and functional traits has withstood the test of time. Orthologous enzymes, i.e., enzymes that perform the same or a similar function in different organisms, are an extreme example: they may differ strongly in prediction accuracy, although no general trend in performance between organisms can be reliably established.

     
  6. Secondary structure predictions help to predict protein disorder. The study of protein structures reveals again and again how the intricate details of structures determine function. However, many proteins, in particular in eukaryotes, have long regions that are often referred to as intrinsically unstructured or disordered, i.e., regions that are dynamic and fluctuate in conformational space (Dunker and Obradovic 2001; Dunker et al. 2008). Put differently, if one took snapshots of such a region at different points in time, one would not observe the same conformation. Disorder occurs in regions that need flexibility to bind many different substrates, to cover a large space in order to sense intruders (like light sensors in airports), or to buffer intrusion (like filling material in packages). Estimates of how much disorder occurs in human proteins differ widely depending on the choice of parameters and methods. As a rough simplification, 15–30% of all human proteins have at least one region of at least 50 consecutive disordered residues; these numbers almost double when the criterion is lowered to a minimum of 30 consecutive residues (Schlessinger et al. 2011). Some disordered regions are strongly enriched in non-regular secondary structure, others in contact-deprived helices, and yet others have a high propensity for switching secondary structure. Consequently, secondary structure prediction methods help to identify protein disorder.

     
  7. Important new information for understanding function and evolution. The pursuit of understanding protein function and evolution typically begins with the study of sequence data. Often, analyses also end there, ignoring the wealth of structural detail that is usually needed to discriminate between alternative hypotheses. Since models based on experimental 3D structures are commonly available for fewer than 30–40% of all residues, important details are often missing. Fortunately, secondary structure already captures some of these details in many cases. Therefore, predictions of secondary structure often help more to identify evolutionary and functional similarities than comparisons of secondary structure derived from experimental 3D structures (Przybylski and Rost 2004).
    Fig. 3

    Reliability Index. Cumulative percentage of residues and Q3 scores are plotted against the reliability index (RI). The reliability index reflects the strength of a particular prediction for one particular residue. It enables users to zoom into the most reliable predictions, since performance increases with the RI, as the curve shows. For instance, roughly half of all residues are predicted at RI ≥ 8. For this most reliably predicted subset, prediction accuracy (Q3) is almost 95%, i.e., about 13 percentage points higher than for all residues. This plot was generated using ReProf for the same dataset as used for Fig. 2.
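The reliability index can be sketched as the margin between the two strongest per-state outputs of a predictor for one residue. The version below assumes a PHD-style definition, RI = int(10 × (highest output − second-highest output)), clipped to 0 to 9; the exact definition used by ReProf may differ, and the function names are hypothetical. The second function restricts Q3 to residues at or above an RI threshold, the quantity plotted in Fig. 3.

```python
import math
from typing import Dict, Sequence, Tuple


def reliability_index(outputs: Dict[str, float]) -> int:
    """Assumed PHD-style RI for one residue from per-state outputs in [0, 1]."""
    top, second = sorted(outputs.values(), reverse=True)[:2]
    return max(0, min(9, int(10 * (top - second))))


def q3_above(predicted: str, observed: str, ris: Sequence[int],
             threshold: int = 8) -> Tuple[float, float]:
    """Return (coverage in %, Q3 in %) for residues with RI >= threshold."""
    picked = [(p, o) for p, o, ri in zip(predicted, observed, ris) if ri >= threshold]
    if not picked:
        return 0.0, math.nan
    coverage = 100.0 * len(picked) / len(observed)
    accuracy = 100.0 * sum(p == o for p, o in picked) / len(picked)
    return coverage, accuracy


print(reliability_index({"H": 0.9, "E": 0.05, "L": 0.05}))  # 8
print(q3_above("HHEL", "HHLL", [9, 8, 2, 3]))  # (50.0, 100.0)
```

In the toy call, half of the residues pass the threshold and both are correct, whereas Q3 over all four residues would be 75%; this mirrors, on a tiny scale, the trade-off between coverage and accuracy shown in Fig. 3.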

     

Examples for Methods

The following is a selection of secondary structure prediction methods, described with respect to their quality and availability. As with all methods in computational biology, the most readily available prediction methods are, for many reasons, often not the best ones. In fact, some of the easiest-to-reach methods perform substantially worse than a method any advanced student could develop over a short summer.

PHDsec/PROFsec/ReProf, PredictProtein. ReProf is the latest improvement in this series of methods and increases performance substantially by replacing PSI-BLAST with HHblits. ReProf is available as a package and as a web service through PredictProtein, the first Internet server in the field (www.predictprotein.org, Yachdav et al. 2014).

PSIPRED, developed by the group of David Jones. PSIPRED (Jones 1999) is clearly one of the top performers (Eyrich et al. 2003). The important step introduced by PSIPRED was the move from simple BLAST-like pairwise alignments to more sensitive PSI-BLAST-based alignments. The group has continuously updated the service, which is available as a stand-alone version and as a web server together with a suite of other prediction methods (http://bioinf.cs.ucl.ac.uk/psipred/, Buchan et al. 2013).

SABLE. SABLE initially improved over PSIPRED by feeding neural networks with predicted solvent accessibility in addition to PSI-BLAST profiles to improve the prediction of secondary structure (Adamczak et al. 2005). A web server and a stand-alone version are available (http://sable.cchmc.org/).

PORTER, Distill. PORTER marks one important point in the ongoing quest for advances (Mirabello and Pollastri 2013). Its improvements come from using more advanced machine learning techniques than those in ReProf and PSIPRED, from addressing particular shortcomings, such as finding an optimal balance between comparative modeling and de novo prediction, and from the particular way in which the outputs of different structure prediction methods are combined to predict secondary structure. The web server Distill (distill.ucd.ie/distill) combines various structure prediction methods from the group (Bau et al. 2006).

s2D. This method uses multiple layers of neural networks for the prediction (Sormanni et al. 2015). It also differs from the other methods in this list in its use of a training set derived from NMR chemical shift data, which allows it to better distinguish between ordered secondary structure and disordered regions.

SPIDER3. This recent deep learning-based approach predicts several structural features simultaneously, including secondary structure and solvent accessibility, by training multiple bidirectional recurrent neural networks (with long short-term memory cells) iteratively, using all outputs predicted in one iteration as input to the next (Heffernan et al. 2017).

Summary

This entry introduces the prediction of protein secondary structure, namely, of alpha-helix, beta-strand, and others. Since secondary structure is the simplest yet meaningful aspect of protein structure, it is used in a wide variety of applications, among others in protein structure prediction, in aligning proteins with little sequence similarity, and in predicting protein binding and subcellular localization. In general, predicting secondary structure is always a good first step in any attempt to analyze a protein sequence.

The entry outlines the history of secondary structure prediction, including the ongoing improvement of prediction methods, provides practical hints for users, and describes a number of today’s secondary structure prediction methods.

References

  1. Adamczak R, Porollo A, Meller J (2005) Combining prediction of secondary structure and solvent accessibility in proteins. Proteins 59(3):467–475. https://doi.org/10.1002/prot.20441
  2. Altschul S (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402. https://doi.org/10.1093/nar/25.17.3389
  3. Andreeva A, Howorth D, Chothia C, Kulesha E, Murzin AG (2014) SCOP2 prototype: a new approach to protein structure mining. Nucleic Acids Res 42(D1):D310–D314. https://doi.org/10.1093/nar/gkt1242
  4. Astbury WT (1933) The X-ray interpretation of fibre structure. J Soc Dye Colour 49(6):168
  5. Baker MS, Ahn SB, Mohamedali A, Islam MT, Cantor D, Verhaert PD, Fanayan S, Sharma S, Nice EC, Connor M, Ranganathan S (2017) Accelerating the search for the missing proteins in the human proteome. Nat Commun 8:14271. https://doi.org/10.1038/ncomms14271
  6. Bau D, Martin AJ, Mooney C, Vullo A, Walsh I, Pollastri G (2006) Distill: a suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins. BMC Bioinformatics 7:402. https://doi.org/10.1186/1471-2105-7-402
  7. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28(1):235–242
  8. Buchan DWA, Minneci F, Nugent TCO, Bryson K, Jones DT (2013) Scalable web services for the PSIPRED Protein Analysis Workbench. Nucleic Acids Res 41(Web Server issue):W349–W357. https://doi.org/10.1093/nar/gkt381
  9. Dunker AK, Obradovic Z (2001) The protein trinity – linking function and disorder. Nat Biotechnol 19(9):805–806. https://doi.org/10.1038/nbt0901-805
  10. Dunker AK, Silman I, Uversky VN, Sussman JL (2008) Function and structure of inherently disordered proteins. Curr Opin Struct Biol 18(6):756–764. https://doi.org/10.1016/j.sbi.2008.10.002
  11. Eisenberg D (2003) The discovery of the alpha-helix and beta-sheet, the principal structural features of proteins. Proc Natl Acad Sci USA 100(20):11207–11210. https://doi.org/10.1073/pnas.2034522100
  12. Eyrich VA, Przybylski D, Koh IY, Grana O, Pazos F, Valencia A, Rost B (2003) CAFASP3 in the spotlight of EVA. Proteins 53(Suppl 6):548–560. https://doi.org/10.1002/prot.10534
  13. Faraggi E, Zhang T, Yang Y, Kurgan L, Zhou Y (2012) SPINE X: improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles. J Comput Chem 33(3):259–267. https://doi.org/10.1002/jcc.21968
  14. Garnier J, Osguthorpe DJ, Robson B (1978) Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol 120(1):97–120
  15. Garnier J, Gibrat JF, Robson B (1996) GOR method for predicting protein secondary structure from amino acid sequence. Methods Enzymol 266:540–553
  16. Gibrat JF, Garnier J, Robson B (1987) Further developments of protein secondary structure prediction using information theory. New parameters and consideration of residue pairs. J Mol Biol 198(3):425–443. https://doi.org/10.1016/0022-2836(87)90292-0
  17. Heffernan R, Paliwal K, Lyons J, Dehzangi A, Sharma A, Wang J, Sattar A, Yang Y, Zhou Y (2015) Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 5:11476. https://doi.org/10.1038/srep11476
  18. Heffernan R, Yang Y, Paliwal K, Zhou Y (2017) Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics 33(18):2842–2849. https://doi.org/10.1093/bioinformatics/btx218
  19. Jones DT (1999) Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 292(2):195–202. https://doi.org/10.1006/jmbi.1999.3091
  20. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22(12):2577–2637. https://doi.org/10.1002/bip.360221211
  21. Kendrew JC, Dickerson RE, Strandberg BE, Hart RG, Davies DR, Phillips DC, Shore VC (1960) Structure of myoglobin: a three-dimensional Fourier synthesis at 2 Å resolution. Nature 185(4711):422–427
  22. Kiefer F, Arnold K, Kunzli M, Bordoli L, Schwede T (2009) The SWISS-MODEL Repository and associated resources. Nucleic Acids Res 37(Database issue):D387–D392. https://doi.org/10.1093/nar/gkn750
  23. Levitt M, Greer J (1977) Automatic identification of secondary structure in globular proteins. J Mol Biol 114(2):181–239
  24. Marti-Renom MA, Madhusudhan MS, Fiser A, Rost B, Sali A (2002) Reliability of assessment of protein structure prediction methods. Structure 10(3):435–440
  25. Mirabello C, Pollastri G (2013) Porter, PaleAle 4.0: high-accuracy prediction of protein secondary structure and relative solvent accessibility. Bioinformatics 29(16):2056–2058. https://doi.org/10.1093/bioinformatics/btt344
  26. Pauling L, Corey RB (1950) Two hydrogen-bonded spiral configurations of the polypeptide chain. J Am Chem Soc 72(11):5349. https://doi.org/10.1021/ja01167a545
  27. Perutz MF, Rossmann MG, Cullis AF, Muirhead H, Will G, North AC (1960) Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5-Å resolution, obtained by X-ray analysis. Nature 185(4711):416–422
  28. Prudden J, Perry JJ, Arvai AS, Tainer JA, Boddy MN (2009) Molecular mimicry of SUMO promotes DNA repair. Nat Struct Mol Biol 16(5):509–516. https://doi.org/10.1038/nsmb.1582
  29. Przybylski D, Rost B (2002) Alignments grow, secondary structure prediction improves. Proteins 46(2):197–205
  30. Przybylski D, Rost B (2004) Improving fold recognition without folds. J Mol Biol 341(1):255–269. https://doi.org/10.1016/j.jmb.2004.05.041
  31. Remmert M, Biegert A, Hauser A, Söding J (2012) HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9(2):173–175
  32. Rose PW, Prlić A, Altunkaya A, Bi C, Bradley AR, Christie CH, Di Costanzo L, Duarte JM, Dutta S, Feng Z, Green RK, Goodsell DS, Hudson B, Kalro T, Lowe R, Peisach E, Randle C, Rose AS, Shao C, Tao YP, Valasatava Y, Voigt M, Westbrook JD, Woo J, Yang H, Young JY, Zardecki C, Berman HM, Burley SK (2017) The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res 45(D1):D271–D281. https://doi.org/10.1093/nar/gkw1000
  33. Rost B (1996) PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol 266:525–539
  34. Rost B (1999) Twilight zone of protein sequence alignments. Protein Eng 12(2):85–94
  35. Rost B (2005) How to use protein 1D structure predicted by PROFphd. In: Walker JM (ed) The proteomics protocols handbook. Humana Press, New York, pp 875–901
  36. Rost B, Schneider R, Sander C (1993) Progress in protein structure prediction? Trends Biochem Sci 18(4):120–123
  37. Rychlewski L, Fischer D (2005) LiveBench-8: the large-scale, continuous assessment of automated protein structure prediction. Protein Sci 14(1):240–245. https://doi.org/10.1110/ps.04888805
  38. Schlessinger A, Schaefer C, Vicedo E, Schmidberger M, Punta M, Rost B (2011) Protein disorder – a breakthrough invention of evolution? Curr Opin Struct Biol 21(3):412–418. https://doi.org/10.1016/j.sbi.2011.03.014
  39. Sillitoe I, Lewis TE, Cuff A, Das S, Ashford P, Dawson NL, Furnham N, Laskowski RA, Lee D, Lees JG, Lehtinen S, Studer RA, Thornton J, Orengo CA (2015) CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res 43(D1):D376–D381. https://doi.org/10.1093/nar/gku947
  40. Sormanni P, Camilloni C, Fariselli P, Vendruscolo M (2015) The s2D method: simultaneous sequence-based prediction of the statistical populations of ordered and disordered regions in proteins. J Mol Biol 427(4):982–996. https://doi.org/10.1016/j.jmb.2014.12.007
  41. Szent-Györgyi AG, Cohen C (1957) Role of proline in polypeptide chain configuration of proteins. Science 126:697
  42. The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45(D1):D158–D169. https://doi.org/10.1093/nar/gkw1099
  43. Wang S, Li W, Liu S, Xu J (2016) RaptorX-Property: a web server for protein structure property prediction. Nucleic Acids Res 44:W430–W435. https://doi.org/10.1093/nar/gkw306
  44. Wiederstein M, Gruber M, Frank K, Melo F, Sippl MJ (2014) Structure-based characterization of multiprotein complexes. Structure 22(7):1063–1070. https://doi.org/10.1016/j.str.2014.05.005
  45. Yachdav G, Kloppmann E, Kajan L, Hecht M, Goldberg T, Hamp T, Hönigschmid P, Schafferhans A, Roos M, Bernhofer M, Richter L, Ashkenazy H, Punta M, Schlessinger A, Bromberg Y, Schneider R, Vriend G, Sander C, Ben-Tal N, Rost B (2014) PredictProtein – an open resource for online prediction of protein structural and functional features. Nucleic Acids Res 42(Web Server issue):W337–W343. https://doi.org/10.1093/nar/gku366
  46. Zhang H, Zhang T, Chen K, Kedarisetti KD, Mizianty MJ, Bao Q, Stach W, Kurgan L (2011) Critical assessment of high-throughput standalone methods for secondary structure prediction. Brief Bioinform 12(6):672–688. https://doi.org/10.1093/bib/bbq088

Copyright information

© European Biophysical Societies' Association (EBSA) 2019

Authors and Affiliations

  • Edda Kloppmann (1)
  • Jonas Reeb (1), corresponding author
  • Peter Hönigschmid (2)
  • Burkhard Rost (1)
  1. Technische Universität München, Garching, Germany
  2. Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany

Section editors and affiliations

  • Franca Fraternali
