An Introduction to Protein Contact Prediction

  • Protocol
Bioinformatics

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 453)

Abstract

A fundamental problem in molecular biology is predicting the three-dimensional structure of a protein from its amino acid sequence. Molecular modeling to find the structure directly is at present intractable and is likely to remain so for some time, so intermediate steps have been developed, such as predicting which residue pairs are in contact. Predicted contact pairs have been used for fold prediction, as an initial condition or constraint for molecular modeling, and as a filter to rank multiple models arising from homology modeling. As contact prediction has advanced, it has become more common for 3D structure predictors to integrate contact prediction into structure building, because it often gives information that is orthogonal to that produced by other methods. This chapter shows how evolutionary information contained in protein sequences and multiple sequence alignments can be used to predict protein structure, and reviews the state-of-the-art predictors and their methodologies.
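
To make the idea of extracting evolutionary information from a multiple sequence alignment concrete, the following is a minimal sketch that scores pairs of alignment columns by their mutual information, one commonly used covariation measure. It is an illustration under stated assumptions, not the chapter's own method: the function names, the toy alignment, and the choice of mutual information as the score are all assumptions made here, and a practical predictor would also need to handle gaps, sequence weighting, and phylogenetic bias.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `msa` is a list of equal-length aligned sequences (hypothetical input
    format chosen for this sketch).
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)               # residue counts in column i
    count_j = Counter(col_j)               # residue counts in column j
    count_ij = Counter(zip(col_i, col_j))  # joint residue-pair counts
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written in terms of raw counts
        ratio = p_ab * n * n / (count_i[a] * count_j[b])
        mi += p_ab * math.log2(ratio)
    return mi


def rank_column_pairs(msa, min_separation=1):
    """Return ((i, j), score) tuples sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = [
        ((i, j), column_mutual_information(msa, i, j))
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy alignment (invented for illustration): columns 0, 1 and 3 vary
# together, while column 2 is fully conserved.
toy_msa = ["AKLE", "AKLE", "GRLD", "GRLD"]
ranked = rank_column_pairs(toy_msa)
```

In this toy alignment the columns that change together receive a high score, while any pair involving the conserved column scores zero. High-scoring pairs are only candidate contacts: covariation can also arise from shared ancestry or functional constraints rather than spatial proximity.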

Acknowledgments

The authors gratefully acknowledge financial support from the University of Queensland, the ARC Australian Centre for Bioinformatics, and the Institute for Molecular Bioscience. The first author would also like to acknowledge the support of Prof. Kevin Burrage's Australian Federation Fellowship.

Copyright information

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Hamilton, N., Huber, T. (2008). An Introduction to Protein Contact Prediction. In: Keith, J.M. (eds) Bioinformatics. Methods in Molecular Biology™, vol 453. Humana Press. https://doi.org/10.1007/978-1-60327-429-6_3

  • DOI: https://doi.org/10.1007/978-1-60327-429-6_3

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60327-428-9

  • Online ISBN: 978-1-60327-429-6

  • eBook Packages: Springer Protocols
