Abstract
The rapid expansion of protein sequence and structure data that we are now witnessing creates a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function, and the extent to which the functional repertoire can vary across the three kingdoms of life. This need has led to the creation of a wide range of protein family classifications that aim to group proteins based on their evolutionary relationships.
This chapter discusses the approaches and methods that are frequently used to classify proteins, with a specific emphasis on the classification of protein domains. It considers the construction of both domain sequence and domain structure databases, and shows how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.
Copyright information
© 2008 Humana Press, a part of Springer Science+Business Media, LLC
Cite this protocol
Marsden, R.L., Orengo, C.A. (2008). The Classification of Protein Domains. In: Keith, J.M. (eds) Bioinformatics. Methods in Molecular Biology™, vol 453. Humana Press. https://doi.org/10.1007/978-1-60327-429-6_5
DOI: https://doi.org/10.1007/978-1-60327-429-6_5
Publisher Name: Humana Press
Print ISBN: 978-1-60327-428-9
Online ISBN: 978-1-60327-429-6
eBook Packages: Springer Protocols