CATH-Gene3D: Generation of the Resource and Its Use in Obtaining Structural and Functional Annotations for Protein Sequences

Protocol
Protein Bioinformatics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1558)

Abstract

This chapter describes how the data in the CATH-Gene3D online resource are generated and how the resource can be used to study protein domains and their evolutionary relationships. Methods are presented for comparing protein structures, recognizing homologs, predicting domain structures within protein sequences, and subclassifying superfamilies into functionally pure families, together with a guide to using the webpages.

Acknowledgments

N.L.D. acknowledges funding from the Wellcome Trust (Award number: 104960/Z/14/Z). I.S. acknowledges funding from the BBSRC (Award number: BB/K020013/1). J.G.L. acknowledges funding from the BBSRC (Award number: BB/L002817/1). S.D.L. acknowledges funding from the Malaysian Ministry of Education.

Author information

Corresponding author

Correspondence to Natalie L. Dawson.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Dawson, N.L., Sillitoe, I., Lees, J.G., Lam, S.D., Orengo, C.A. (2017). CATH-Gene3D: Generation of the Resource and Its Use in Obtaining Structural and Functional Annotations for Protein Sequences. In: Wu, C., Arighi, C., Ross, K. (eds) Protein Bioinformatics. Methods in Molecular Biology, vol 1558. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6783-4_4

  • DOI: https://doi.org/10.1007/978-1-4939-6783-4_4

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6781-0

  • Online ISBN: 978-1-4939-6783-4

  • eBook Packages: Springer Protocols
