Protein Domain Architectures

  • Protocol
Data Mining Techniques for the Life Sciences

Part of the book series: Methods in Molecular Biology ((MIMB,volume 609))

Abstract

Proteins are composed of functional units, or domains, that can be found alone or in combination with other domains. Analysis of protein domain architectures and of the movement of protein domains within and across genomes provides clues about the evolution of protein function. Proteins are classified into families and domains by publicly available tools and databases that use known protein domains to predict further members in new protein sequences. Currently, at least 80% of the sequences in the main protein sequence databases can be classified using these tools, providing a large data set for analyzing protein domain architectures. Each of the protein domain databases provides an intuitive web interface for viewing and analyzing its domain classifications and makes its data freely available for download. Some of the main protein family and domain databases are described here, along with their web-based tools for analyzing domain architectures.

© 2010 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Mulder, N.J. (2010). Protein Domain Architectures. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 609. Humana Press. https://doi.org/10.1007/978-1-60327-241-4_5

  • DOI: https://doi.org/10.1007/978-1-60327-241-4_5

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60327-240-7

  • Online ISBN: 978-1-60327-241-4

  • eBook Packages: Springer Protocols
