Abstract
Biomolecular NMR chemical shift data are key information for the functional analysis of biomolecules and the development of new techniques for NMR studies utilizing chemical shift statistical information. Structural genomics projects are major contributors to the accumulation of protein chemical shift information. The management of the large quantities of NMR data generated by each project in a local database and the transfer of the data to the public databases are still formidable tasks because of the complicated nature of NMR data. Here we report an automated and efficient system developed for the deposition and annotation of a large number of data sets including 1H, 13C and 15N resonance assignments used for the structure determination of proteins. We have demonstrated the feasibility of our system by applying it to over 600 entries from the internal database generated by the RIKEN Structural Genomics/Proteomics Initiative (RSGI) to the public database, BioMagResBank (BMRB). We have assessed the quality of the deposited chemical shifts by comparing them with those predicted from the PDB coordinate entry for the corresponding protein. The same comparison for other matched BMRB/PDB entries deposited from 2001–2011 has been carried out and the results suggest that the RSGI entries greatly improved the quality of the BMRB database. Since the entries include chemical shifts acquired under strikingly similar experimental conditions, these NMR data can be expected to be a promising resource to improve current technologies as well as to develop new NMR methods for protein studies.
Similar content being viewed by others
References
Allen FH, Barnard JM, Cook APF, Hall SR (1995) The molecular information file (MIF): core specifications of a new standard format for chemical data. J Chem Inf Comput Sci 35:412–417
Baran MC, Moseley HN, Aramini JM, Bayro MJ, Monleon D, Locke JY, Montelione GT (2006) SPINS: a laboratory information management system for organizing and archiving intermediate and final results from NMR protein structure determinations. Proteins 62:843–851
Bhattacharya A, Tejero R, Monelione GT (2007) Evaluating protein structures determined by structural genomics consortia. Proteins 66:778–795
Cavalli A, Salvatella X, Dobson CM, Vendruscolo M (2007) Protein structure determination from NMR chemical shifts. Proc Natl Acad Sci USA 104:9615–9620
Cornilescu G, Delaglio F, Bax A (1999) Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J Biomol NMR 13:289–302
Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, Bax A (1995) NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J Biomol NMR 6:277–293
Fogh R, Ionides J, Ulrich E, Boucher W, Vranken W, Linge JP, Habeck M, Rieping W, Bhat TN, Westbrook J, Henrick K, Gilliland G, Berman H, Thornton J, Nilges M, Markley J, Laue E (2002) The CCPN project: an interim report on a data model for the NMR community. Nat Struct Biol 9:416–418
Fogh RH, Boucher W, Vranken WF, Pajon A, Stevens TJ, Bhat TN, Westbrook J, Ionides JM, Laue ED (2005) A framework for scientific data modeling and automated software development. Bioinformatics 21:1678–1684
Fogh RH, Vranken WF, Boucher W, Stevens TJ, Laue ED (2006) A nomenclature and data model to describe NMR experiments. J Biomol NMR 36:147–155
Güntert P (2003) Automated NMR protein structure calculation. Prog NMR Spectrosc 43:105–125
Hall SR (1991) The STAR file: a new format for electronic data transfer and archiving. J Chem Inf Comput Sci 31:326–333
Hall SR, Spadaccini N (1994) The STAR file: detailed specifications. J Chem Inf Comput Sci 34:505–508
Huang YJ, Powers R, Montelione GT (2005) Protein NMR recall, precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics. J Am Chem Soc 127:1665–1674
Johnson BA, Blebins RA (1994) NMRView: a computer program for the visualization and analysis of NMR data. J Biomol NMR 4:603–614
Kobayashi N, Iwahara J, Koshiba S, Tomizawa T, Tochio N, Güntert P, Kigawa T, Yokoyama S (2007) KUJIRA, a package of integrated modules for systematic and interactive analysis of NMR data directed to high-throughput NMR structure studies. J Biomol NMR 39:31–52
Moseley HN, Sahota G, Montelione GT (2004) Assignment validation software suite for the evaluation and presentation of protein resonance assignment data. J Biomol NMR 28:341–355
Neal S, Nip AM, Zhang H, Wishart DS (2003) Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts. J Biomol NMR 26:215–240
Penkett CJ, van Ginkel G, Velankar S, Swaminathan J, Ulrich EL, Mading S, Stevens TJ, Fogh RH, Gutmanas A, Kleywegt GJ, Henrick K, Vranken WF (2010) Straightforward and complete deposition of NMR data to the PDBe. J Biomol NMR 48:85–92
Seavey BR, Farr EA, Westler WM, Markley JL (1991) A relational database for sequence-specific protein NMR data. J Biomol NMR 1:217–236
Shen Y, Bax A (2007) Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology. J Biomol NMR 38:289–302
Shen Y, Bax A (2010) SPARTA+: a modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network. J Biomol NMR 48:13–22
Shen Y, Oliver L, Delaglio F, Rossi P, Aramini J, Liu G, Eletsky A, Wu Y, Singarapu KK, Lemak A, Ignatchenko A, Cheryl H, Arrowsmith CH, Szyperski T, Gaetano T, Montelione GT, Baker D, Bax A (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc Natl Acad Sci USA 105:4685–4690
Shen Y, Delaglio F, Cornilescu G, Bax A (2009) TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J Biomol NMR 44:213–223
Ulrich EL, Markley JL, Kyogoku Y (1989) Creation of a nuclear magnetic resonance data repository and literature database. Protein Seq Data Anal 2:23–37
Ulrich EL, Akutsu H, Doreleijers JF, Harano Y, Ioannidis YE, Lin J, Livny M, Mading S, Maziuk D, Miller Z, Nakatani E, Schulte CF, Tolmie DE, Kent Wenger R, Yao H, Markley JL (2008) BioMagResBank. Nucleic Acids Res 36:D402–D408
Vranken WF, Boucher W, Stevens TJ, Fogh RH, Pajon A, Llinas M, Ulrich EL, Markley JL, Ionides J, Laue ED (2005) The CCPN data model for NMR spectroscopy: development of a software pipeline. Proteins 59:687–696
Wang L, Markley JL (2009) Empirical correlation between protein backbone 15N and 13C secondary chemical shifts and its application to nitrogen chemicalshift re-referencing. J Biomol NMR 44:95–99
Wang L, Eghbalnia HR, Bahrami A, Markley JL (2005) Linear analysis of carbon-13 chemical shift differences and its application to the detection and correction of errors in referencing and spin system identifications. J Biomol NMR 44:13–22
Wang B, Wang Y, Wishart DS (2010) A probabilistic approach for validating protein NMR chemical shift assignments. J Biomol NMR 47:85–99
Wishart DS, Arndt D, Berjanskii M, Tang P, Zhou J, Lin G (2008) CS23D: a web server for rapid protein structure generation using NMR chemical shifts and sequence data. Nucleic Acids Res 36:W496–W502
Yokoyama S, Hirota H, Kigawa T, Yabuki T, Shirouzu M, Terada T, Ito Y, Matsuo Y, Kuroda Y, Nishimura Y, Kyogoku Y, Miki K, Masui R, Kuramitsu S (2000) Structural genomics projects in Japan. Nat Struct Biol 7:943–945
Acknowledgments
This work was partially supported by National Bioscience Database Center (NBDC) in Japan Science and Technology Agency (JST). We are grateful to Prof. Haruki Nakamura for intensive encouragement to us and for contribution to discussions about this study. We thank Dr. J. Doreleijers for many valuable comments, suggestions and proofreading of the manuscripts and Dr. F. Delaglio for help in establishing the macro-file library for the NMR-Pipe data process. We also thank Mr. T. Iwata for his work in preparing the web-page for downloading the BMRB related tools.
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Kobayashi, N., Harano, Y., Tochio, N. et al. An automated system designed for large scale NMR data deposition and annotation: application to over 600 assigned chemical shift data entries to the BioMagResBank from the Riken Structural Genomics/Proteomics Initiative internal database. J Biomol NMR 53, 311–320 (2012). https://doi.org/10.1007/s10858-012-9641-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10858-012-9641-6