Abstract
MolProbity is a powerful software program for validating structures of proteins and nucleic acids. Although MolProbity includes scripts for batch analysis of structures, because these scripts analyze structures one at a time, they are not well suited for the validation of a large dataset of structures. We have created a version of MolProbity (MolProbity-HTC) that circumvents these limitations and takes advantage of a high-throughput computing cluster by using the HTCondor software. MolProbity-HTC enables the longitudinal analysis of large sets of structures, such as those deposited in the PDB or generated through theoretical computation—tasks that would have been extremely time-consuming using previous versions of MolProbity. We have used MolProbity-HTC to validate the entire PDB, and have developed a new visual chart for the BioMagResBank website that enables users to easily ascertain the quality of each model in an NMR ensemble and to compare the quality of those models to the rest of the PDB.
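The fan-out described above, one validation job per structure instead of a serial loop, can be sketched as an HTCondor submit description generated from a directory of structure files. This is only an illustrative sketch, not the MolProbity-HTC implementation: the wrapper script name `run_validation.sh`, the directory layout, and the helper function names are hypothetical, and the sketch assumes a shared filesystem so that no file transfer settings are needed.

```python
from pathlib import Path


def write_job_list(structure_dir, list_file):
    """Write one structure path per line and return how many were found."""
    paths = sorted(Path(structure_dir).glob("*.pdb"))
    Path(list_file).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)


def build_submit_description(list_file, wrapper="run_validation.sh",
                             out_dir="results"):
    """Return submit-description text that queues one job per listed file.

    `wrapper` is a hypothetical script that would run the validation on a
    single structure; it is not part of MolProbity or HTCondor.
    """
    lines = [
        f"executable = {wrapper}",
        # $(pdb_file) is filled in by the queue statement below.
        "arguments  = $(pdb_file)",
        f"output     = {out_dir}/job.$(Cluster).$(Process).out",
        f"error      = {out_dir}/job.$(Cluster).$(Process).err",
        f"log        = {out_dir}/validation.log",
        # One job per line of the list file.
        f"queue pdb_file from {list_file}",
    ]
    return "\n".join(lines) + "\n"
```

Under these assumptions, the returned text would be saved to a file such as `validate.sub` and handed to `condor_submit`, leaving the scheduler to distribute the per-structure jobs across the available machines.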
Acknowledgments
We thank Miron Livny for helpful and lively discussions about the use of HTCondor. We also thank Dimitri Maziuk for technical assistance and help with systems administration at the BMRB, and W. Milo Westler for discussions about NMR methodology. This work was supported by NIH Grants R01 GM109046 and P41 GM103399. VBC has received partial support from NLM Grant 5T15LM007359.
Cite this article
Chen, V.B., Wedell, J.R., Wenger, R.K. et al. MolProbity for the masses–of data. J Biomol NMR 63, 77–83 (2015). https://doi.org/10.1007/s10858-015-9969-9
DOI: https://doi.org/10.1007/s10858-015-9969-9