The Classification of Protein Domains

Marsden, Russell L.; Orengo, Christine A.

doi:10.1007/978-1-60327-429-6_5

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 453)

Abstract

The significant expansion in protein sequence and structure data that we are now witnessing creates a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function, and the extent to which the functional repertoire can vary across the three kingdoms of life. This need has led to the creation of a wide range of protein family classifications that aim to group proteins based on their evolutionary relationships.
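Several sequence-based family resources start from an all-against-all similarity search and then link proteins that share significant hits. The sketch below illustrates one simple version of that idea, single-linkage clustering with a union-find structure. It is an illustration under stated assumptions, not the procedure of any particular classification: the `(query, subject, evalue)` input format, the function name `cluster_families`, and the default E-value threshold of 1e-5 are all hypothetical choices.

```python
"""Hypothetical sketch: group sequences into families by single-linkage
clustering of pairwise hits that pass an (assumed) E-value threshold."""

from typing import Dict, Iterable, List, Tuple


def cluster_families(
    ids: Iterable[str],
    hits: Iterable[Tuple[str, str, float]],
    max_evalue: float = 1e-5,  # assumed significance cut-off
) -> List[List[str]]:
    """Return families (sorted lists of ids) linked by any significant hit.

    `hits` holds (query, subject, evalue) triples, e.g. parsed from an
    all-against-all sequence search run beforehand.
    """
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        parent.setdefault(x, x)  # ids seen only in hits start as singletons
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue <= max_evalue:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for i in list(parent):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


seqs = ["P1", "P2", "P3", "P4"]
pairs = [("P1", "P2", 1e-30), ("P2", "P3", 1e-8), ("P3", "P4", 0.5)]
print(cluster_families(seqs, pairs))  # [['P1', 'P2', 'P3'], ['P4']]
```

Because single linkage is transitive, one chance hit, or a multidomain protein that shares different domains with two unrelated families, can chain those families into a single cluster. This is one reason classifying at the level of domains rather than whole chains is attractive.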

This chapter discusses the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered, and the chapter shows how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.
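Assigning domain family annotations to a protein typically involves scanning it against a library of family models and then reconciling overlapping matches into a single domain architecture. The sketch below assumes the matches have already been computed (for example, by a profile or HMM search) and resolves them greedily by E-value. The `DomainHit` record, the `resolve_architecture` function, and the 10-residue overlap tolerance are hypothetical; production pipelines apply more elaborate rules.

```python
"""Hypothetical sketch: turn overlapping domain-family hits on one protein
into a non-overlapping domain architecture (greedy, best E-value first)."""

from typing import List, NamedTuple


class DomainHit(NamedTuple):
    family: str    # hypothetical family identifier
    start: int     # 1-based, inclusive residue range on the query
    end: int
    evalue: float


def resolve_architecture(hits: List[DomainHit], max_overlap: int = 10) -> List[DomainHit]:
    """Keep the most significant hits that overlap already-accepted hits by
    at most `max_overlap` residues (an assumed tolerance), then return them
    in sequence order from N- to C-terminus."""
    accepted: List[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        clashes = any(
            # Shared residues; zero or negative means the ranges are disjoint.
            min(hit.end, kept.end) - max(hit.start, kept.start) + 1 > max_overlap
            for kept in accepted
        )
        if not clashes:
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)


hits = [
    DomainHit("famA", 1, 120, 1e-40),
    DomainHit("famB", 100, 250, 1e-12),  # shares 21 residues with famA
    DomainHit("famC", 130, 260, 1e-25),
]
print([h.family for h in resolve_architecture(hits)])  # ['famA', 'famC']
```

Processing hits in order of significance means a weak match cannot displace a stronger one; the resulting ordered list of families is the kind of domain architecture that structural and functional annotations can then be transferred onto.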

Author information

Authors and Affiliations

Biochemistry and Molecular Biology Department, University College London, London, United Kingdom
Russell L. Marsden & Christine A. Orengo

Editor information

Editors and Affiliations

School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Jonathan M. Keith PhD

About this protocol

Cite this protocol

Marsden, R.L., Orengo, C.A. (2008). The Classification of Protein Domains. In: Keith, J.M. (ed.) Bioinformatics. Methods in Molecular Biology™, vol 453. Humana Press. https://doi.org/10.1007/978-1-60327-429-6_5

DOI: https://doi.org/10.1007/978-1-60327-429-6_5
Publisher Name: Humana Press
Print ISBN: 978-1-60327-428-9
Online ISBN: 978-1-60327-429-6
eBook Packages: Springer Protocols