Data Deposition and Annotation at the Worldwide Protein Data Bank

  • Protocol
Structural Proteomics

The Protein Data Bank (PDB) is the repository for the three-dimensional structures of biological macromolecules, determined by experimental methods. The data in the archive are free and easily available via the Internet from any of the worldwide centers managing this global archive. These data are used by scientists, researchers, bioinformatics specialists, educators, students, and lay audiences to understand biological phenomena at a molecular level. Analysis of these structural data also inspires and facilitates new discoveries in science. This chapter describes the tools and methods currently used for deposition, processing, and release of data in the PDB. References to future enhancements are also included.

Acknowledgments

The authors acknowledge the staff of all wwPDB sites, and our advisory committees.

At the RCSB PDB, we acknowledge the programming staff, consisting of Li Chen, Zukang Feng, Vladimir Guranovic, Andrei Kouranov, John Westbrook, and Huanwang Yang; and the annotation staff, consisting of Jaroslaw Blaszczyk, Guanghua Gao, Irina Persikova, Massy Rajabzadeh, Bohdan Schneider, Monica Sekharan, Monica Sundd, Jasmine Young, and Muhammed Yousufuddin.

At MSD-EBI, we acknowledge annotators Adamandia Kapopoulou, Richard Newman, Gaurav Sahni, Glen van Ginkel, Sanchayita Sen, and Sameer Velankar.

At PDBj, we acknowledge annotators Reiko Igarashi, Yumiko Kengaku, Kanna Matsuura and Yasuyo Morita.

The RCSB PDB is operated by Rutgers, The State University of New Jersey, together with the San Diego Supercomputer Center and the Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego. It is supported by funds from the National Science Foundation, the National Institute of General Medical Sciences, the Office of Science, Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Neurological Disorders and Stroke, and the National Institute of Diabetes and Digestive and Kidney Diseases.

MSD-EBI is supported by funds from the Wellcome Trust (GR062025MA), the European Union (TEMBLOR, NMRQUAL, SPINE, AUTOSTRUCT and IIMS awards), CCP4, the Biotechnology and Biological Sciences Research Council (UK), the Medical Research Council (UK), and the European Molecular Biology Laboratory. PDBj is supported by a grant-in-aid from the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency (BIRD-JST), and the Ministry of Education, Culture, Sports, Science and Techno

Copyright information

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Dutta, S. et al. (2008). Data Deposition and Annotation at the Worldwide Protein Data Bank. In: Kobe, B., Guss, M., Huber, T. (eds) Structural Proteomics. Methods in Molecular Biology™, vol 426. Humana Press. https://doi.org/10.1007/978-1-60327-058-8_5

  • DOI: https://doi.org/10.1007/978-1-60327-058-8_5

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-809-6

  • Online ISBN: 978-1-60327-058-8

  • eBook Packages: Springer Protocols
