The Protein Data Bank (PDB) is the repository for the three-dimensional structures of biological macromolecules, determined by experimental methods. The data in the archive are free and easily available via the Internet from any of the worldwide centers managing this global archive. These data are used by scientists, researchers, bioinformatics specialists, educators, students, and lay audiences to understand biological phenomena at a molecular level. Analysis of these structural data also inspires and facilitates new discoveries in science. This chapter describes the tools and methods currently used for deposition, processing, and release of data in the PDB. References to future enhancements are also included.
Acknowledgments
The authors acknowledge the staff of all wwPDB sites, and our advisory committees.
At the RCSB PDB, we acknowledge the programming staff consisting of Li Chen, Zukang Feng, Vladimir Guranovic, Andrei Kouranov, John Westbrook, and Huanwang Yang; and the annotation staff consisting of Jaroslaw Blaszczyk, Guanghua Gao, Irina Persikova, Massy Rajabzadeh, Bohdan Schneider, Monica Sekharan, Monica Sundd, Jasmine Young, and Muhammed Yousufuddin.
At MSD-EBI, we acknowledge annotators Adamandia Kapopoulou, Richard Newman, Gaurav Sahni, Glen van Ginkel, Sanchayita Sen, and Sameer Velankar.
At PDBj, we acknowledge annotators Reiko Igarashi, Yumiko Kengaku, Kanna Matsuura and Yasuyo Morita.
The RCSB PDB is operated by Rutgers, The State University of New Jersey, and by the San Diego Supercomputer Center and the Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego. It is supported by funds from the National Science Foundation, the National Institute of General Medical Sciences, the Office of Science, Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Neurological Disorders and Stroke, and the National Institute of Diabetes and Digestive and Kidney Diseases.
EBI-MSD is supported by funds from the Wellcome Trust (GR062025MA), the European Union (TEMBLOR, NMRQUAL, SPINE, AUTOSTRUCT and IIMS awards), CCP4, the Biotechnology and Biological Sciences Research Council (UK), the Medical Research Council (UK) and European Molecular Biology Laboratory. PDBj is supported by grant-in-aid from the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency (BIRD-JST), and the Ministry of Education, Culture, Sports, Science and Techno
Copyright information
© 2008 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Dutta, S. et al. (2008). Data Deposition and Annotation at the Worldwide Protein Data Bank. In: Kobe, B., Guss, M., Huber, T. (eds) Structural Proteomics. Methods in Molecular Biology™, vol 426. Humana Press. https://doi.org/10.1007/978-1-60327-058-8_5
DOI: https://doi.org/10.1007/978-1-60327-058-8_5
Publisher Name: Humana Press
Print ISBN: 978-1-58829-809-6
Online ISBN: 978-1-60327-058-8
eBook Packages: Springer Protocols