Protein Domain Architectures

Mulder, Nicola J.

doi:10.1007/978-1-60327-241-4_5

Part of the book series: Methods in Molecular Biology (MIMB, volume 609)

Abstract

Proteins are composed of functional units, or domains, that can be found alone or in combination with other domains. Analysis of protein domain architectures, and of the movement of protein domains within and across genomes, provides clues about the evolution of protein function. Proteins are classified into families and domains by publicly available tools and databases that use known protein domains to predict further members among new protein sequences. Currently, at least 80% of the sequences in the main protein sequence databases can be classified using these tools, which provides a large data set for analyzing protein domain architectures. Each of the protein domain databases provides an intuitive web interface for viewing and analyzing its domain classifications and makes its data freely available for download. Some of the main protein family and domain databases are described here, along with their web-based tools for analyzing domain architectures.

Author information

Authors and Affiliations

National Bioinformatics Network Node, Institute for Infectious Diseases and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Nicola J. Mulder

Editor information

Editors and Affiliations

Max F. Perutz Laboratories GmbH, Universität Wien, Dr. Bohr-Gasse 9, Wien, 1030, Austria
Oliviero Carugo
Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, Singapore, 138671, Singapore
Frank Eisenhaber

Cite this protocol

Mulder, N.J. (2010). Protein Domain Architectures. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 609. Humana Press. https://doi.org/10.1007/978-1-60327-241-4_5

DOI: https://doi.org/10.1007/978-1-60327-241-4_5
Published: 30 October 2009
Publisher Name: Humana Press
Print ISBN: 978-1-60327-240-7
Online ISBN: 978-1-60327-241-4
eBook Packages: Springer Protocols
