The Classification of Protein Domains

  • Protocol
Bioinformatics

Part of the book series: Methods in Molecular Biology™ ((MIMB,volume 453))

Abstract

The significant expansion in protein sequence and structure data that we are now witnessing creates a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function, and the extent to which the functional repertoire can vary across the three kingdoms of life. This need has led to the creation of a wide range of protein family classifications that aim to group proteins based on their evolutionary relationships.

This chapter discusses the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered, and the chapter shows how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.

Copyright information

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Marsden, R.L., Orengo, C.A. (2008). The Classification of Protein Domains. In: Keith, J.M. (eds) Bioinformatics. Methods in Molecular Biology™, vol 453. Humana Press. https://doi.org/10.1007/978-1-60327-429-6_5

  • DOI: https://doi.org/10.1007/978-1-60327-429-6_5

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60327-428-9

  • Online ISBN: 978-1-60327-429-6

  • eBook Packages: Springer Protocols
