Abstract
A fundamental problem in molecular biology is the prediction of the three-dimensional structure of a protein from its amino acid sequence. However, molecular modeling to find the structure is at present intractable and is likely to remain so for some time; hence intermediate steps, such as predicting which residue pairs are in contact, have been developed. Predicted contact pairs have been used for fold prediction, as an initial condition or constraint for molecular modeling, and as a filter to rank multiple models arising from homology modeling. As contact prediction has advanced, it has become more common for 3D structure predictors to integrate contact prediction into structure building, as this often gives information that is orthogonal to that produced by other methods. This chapter shows how evolutionary information contained in protein sequences and multiple sequence alignments can be used to predict protein structure, and reviews the state-of-the-art predictors and their methodologies.
Acknowledgments
The authors gratefully acknowledge financial support from the University of Queensland, the ARC Australian Centre for Bioinformatics, and the Institute for Molecular Bioscience. The first author would also like to acknowledge the support of Prof. Kevin Burrage's Australian Federation Fellowship.
Copyright information
© 2008 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Hamilton, N., Huber, T. (2008). An Introduction to Protein Contact Prediction. In: Keith, J.M. (eds) Bioinformatics. Methods in Molecular Biology™, vol 453. Humana Press. https://doi.org/10.1007/978-1-60327-429-6_3
DOI: https://doi.org/10.1007/978-1-60327-429-6_3
Publisher Name: Humana Press
Print ISBN: 978-1-60327-428-9
Online ISBN: 978-1-60327-429-6
eBook Packages: Springer Protocols