Detection of unrealistic molecular environments in protein structures based on expected electron densities
Understanding the relationship between protein structure and biological function is a central theme in structural biology. Advances are severely hampered by errors in experimentally determined protein structures. Detection and correction of such errors is therefore of utmost importance. Electron densities in molecular structures obey certain rules which depend on the molecular environment. Here we present and discuss a new approach that relates electron densities computed from a structural model to densities expected from prior observations on identical or closely related molecular environments. Strong deviations of computed from expected densities reveal unrealistic molecular structures. Most importantly, structure analysis and error detection are independent of experimental data and hence may be applied to any structural model. The comparison to state-of-the-art methods reveals that our approach is able to identify errors that formerly remained undetected. The new technique, called RefDens, is accessible as a public web service at http://refdens.services.came.sbg.ac.at.
Keywords: Protein structure, Error detection, Electron density
Experimentally determined protein structures contain a variety of uncertainties and errors. These often originate from low resolution electron density maps derived from X-ray analysis, or insufficient and ambiguous distance constraints obtained from nuclear magnetic resonance (NMR) experiments. Even high resolution X-ray structures contain errors that originate from uncertainties in the interpretation of electron densities.
It is highly desirable that errors in protein structures are removed on a regular basis which requires that current protocols in experimental structure determination are supplemented by additional error correcting cycles. The primary output of available error recognition programs (Lüthy et al. 1992; Colovos and Yeates 1993; Vriend and Sander 1993; Sippl 1993; Hooft et al. 1996; Melo and Feytmans 1998; Weichenberger and Sippl 2006; Davis et al. 2007) consists of scores which indicate inconsistencies like unfavorable interactions, incorrect atom positions, unusual rotamers and so on. The interpretation of such scores is usually straightforward but requires some understanding of the principles and inner workings of the software.
Validation tools and quality measures based on experimental electron densities have been proposed for a long time (Branden and Jones 1990; Brünger 1992). A major step forward was accomplished at the beginning of 2008, when the inclusion of structure factor data became mandatory for depositing model coordinates derived by X-ray crystallography in the Protein Data Bank (PDB) (Berman et al. 2000). Protein structure validation remains an area of active research. Recently, various investigations have been published underlining the importance of critically interpreting three-dimensional models of experimentally determined protein structures (Kleywegt 2009; Read and Kleywegt 2009; Saccenti and Rosato 2008; Tronrud and Matthews 2009; Brown and Ramaswamy 2007). It has further been shown that re-refinement of structural models from the PDB with current structure determination software leads to improved models (Joosten et al. 2009a, b).
In our approach to structure validation we calculate the electron excess around a specific amino acid side-chain as compared to the expected electron density around the side-chain atoms. The result, expressed in electron excess per cubic Ångström (e−/Å3), is easy to comprehend and does not require an understanding of sophisticated error recognition strategies. Deviations between computed and expected electron densities are easily visualized in the form of density difference maps. Most importantly, the reference data used in the approach presented here is independent of experimental data used to define the coordinates of the analyzed protein model. Deviations from expected densities directly reflect discrepancies between the interpretation of experimental data obtained for a particular molecule and the average behavior of comparable molecular environments observed in many other cases.
RefDens is designed for structures where high-quality data is not available. This may be the case for several reasons including missing NMR data (overlapping signals, low signal-to-noise ratio), medium or low resolution electron density maps, structures deposited without associated experimental data, and predicted structures.
To demonstrate the power of this approach we perform two different evaluations. First, to analyze the quality of RefDens’s results, we investigate a set of 1,559 protein structures recently deposited with PDB. All structures were released in 2009, have a resolution better than or equal to 2 Å and the experimental data required to build the electron densities are available. In this high quality set of protein structures we detected 95 problem regions, each being corroborated by visual inspection of the associated experimental electron density. A total of 29% of the errors are not detected by MolProbity’s clashscore (Davis et al. 2007) and 85% still exist in the re-refined structures from PDBRedo (Joosten et al. 2009a, b). The calculated electron density difference maps highlight the location of the problematic regions in the environment of the analyzed amino acid side-chain. Second, to demonstrate RefDens’s usefulness especially for the NMR structure resolution process, we analyze a set of structures from the PDB which were solved by both NMR and X-ray experiments. RefDens identifies 322 steric problems in the NMR structures all of which are resolved in the corresponding X-ray structures. Therefore, the presented method can significantly increase the quality of structures measured by NMR spectroscopy.
RefDens is available to the scientific community via http://refdens.services.came.sbg.ac.at
Evaluation of X-ray structures
To demonstrate the error recognition ability of RefDens we evaluate all protein structures released between January 1st, 2009 and June 22nd, 2009, provided structure factors are available and the resolution is better than or equal to 2 Å. The query to obtain the PDB entries used for the benchmark was executed using the PDB Advanced Search interface (available at http://www.pdb.org/pdb/search/advSearch.do), which returned a list of 1,559 PDB entries. In total, 509,007 residues were analyzed. The vast majority (96%) has an excess of less than 1 e−/Å3, which we consider “high quality”. However, a total of 95 residues, found in 71 PDB entries, deviate significantly from the expected density (excess electron density greater than or equal to 3 e−/Å3).
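The two thresholds quoted above can be expressed as a small classification routine. The following Python sketch is illustrative only and is not taken from the RefDens implementation. The function names and the label "intermediate" (for the band between 1 and 3 e−/Å3, which the text does not name) are assumptions, as are the residues Leu-A-12 and Ser-A-77 and their values.

```python
# Thresholds as quoted in the text, in e-/Å^3.
HIGH_QUALITY_BELOW = 1.0   # excess below this value is considered "high quality"
SIGNIFICANT_FROM = 3.0     # excess at or above this value is a significant deviation


def classify_excess(excess):
    """Map a per-residue electron excess to a coarse label."""
    if excess >= SIGNIFICANT_FROM:
        return "significant"
    if excess < HIGH_QUALITY_BELOW:
        return "high quality"
    return "intermediate"  # hypothetical label; the text does not name this band


def flag_residues(excess_by_residue):
    """Return (residue, excess) pairs with significant excess, largest first."""
    flagged = [(res, e) for res, e in excess_by_residue.items()
               if classify_excess(e) == "significant"]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Example input: the first two values are quoted in the text for 2jqx,
# the last two residues are invented for illustration.
example = {"Asp-A-351": 5.8, "Glu-A-428": 7.1, "Leu-A-12": 0.4, "Ser-A-77": 1.9}
for residue, excess in flag_residues(example):
    print(residue, excess)
```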
Each of these 95 cases was analyzed in detail. This involves loading the protein structure and its associated experimental electron density maps in COOT (Emsley and Cowtan 2004), analyzing the agreement of the placement of the respective residue’s side-chain with the experimental data and searching for violations of stereochemical constraints. Five cases were excluded from the detailed analysis since the experimental data could not be loaded into COOT due to missing information in the corresponding structure factor files. In 80 of the remaining 90 cases the analysis reveals serious errors in the respective protein structure. Of these, 55 amino acid side-chains show significant disagreement with the experimentally determined electron density maps, and in the remaining 25 cases there are strong violations of stereochemical constraints. RefDens also detects four side-chains which have very unusual rotamers (frequency of at most 0.4% according to the MolProbity web interface). Finally, six amino acids are in agreement with the experimentally determined electron density and the stereochemistry seems reasonable. However, the side-chains are closely surrounded by many atoms, each slightly closer than expected and each contributing a small amount to the overall high electron excess. Since the reference densities are derived from known protein structures, the excess electron density is highly unlikely to be observed in correct protein crystals.
The complete list of the 95 offending residues is publicly available as supplementary material via http://refdens.services.came.sbg.ac.at/results/results.php. The web page contains details on each of these cases and provides access to the corresponding PDB file and structure factor data.
Evaluation of NMR structures
Nuclear magnetic resonance structure determination is based on a variety of different measurements. As these measurements may vary strongly for different protein structures there is no common source of experimental data for reference. Therefore, we focus on a set of protein structures which were solved by both NMR and X-ray analysis. A list of PDB entries determined by multiple experimental methods, which is constructed based on sequence similarity, is available from the PDB website. RefDens was applied to all 788 NMR structures in this list. Our analysis reveals 691 errors in 224 NMR structures. These structures were further analyzed to identify those entries which can be reliably mapped to well resolved X-ray structures. This is guaranteed by two constraints: First, the NMR structure is required to share 85% or more equivalent residues with the X-ray structure according to the structure superposition program TopMatch (Sippl and Wiederstein 2008). Second, the X-ray structure has to be resolved at 2 Å or better. By applying these constraints, 79 NMR structures are mapped to an X-ray counterpart. In these NMR structures RefDens identifies 322 problematic side-chains. Out of the 322 corresponding residues in the respective X-ray structures, only two side-chains are marked as erroneous. However, these two residues, Asp-A-351 and Glu-A-428 in 2jqx (NMR) and 1p7t (X-ray), have strongly reduced electron excess (5.8 e−/Å3 reduced to 3.1 e−/Å3 and 7.1 e−/Å3 reduced to 3.0 e−/Å3, respectively) in the corresponding X-ray structure.
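The two mapping constraints can be written down as a simple filter. The sketch below is a hypothetical illustration in Python: the `Pairing` record, its field names and the example identifiers are assumptions, and the fraction of equivalent residues is taken as given (in the study it comes from a TopMatch superposition, which is not reproduced here).

```python
from dataclasses import dataclass


@dataclass
class Pairing:
    """Hypothetical record describing one NMR/X-ray pair."""
    nmr_id: str
    xray_id: str
    equivalent_fraction: float  # fraction of NMR residues equivalent in the superposition
    xray_resolution: float      # resolution of the X-ray structure in Å


def is_reliable(pair, min_equivalent=0.85, max_resolution=2.0):
    """Apply both constraints from the text: >= 85% equivalent residues, <= 2 Å."""
    return (pair.equivalent_fraction >= min_equivalent
            and pair.xray_resolution <= max_resolution)


# Invented example pairs; identifiers and numbers are placeholders.
candidates = [
    Pairing("nmr1", "xray1", 0.93, 1.8),   # passes both constraints
    Pairing("nmr2", "xray2", 0.70, 1.5),   # too few equivalent residues
    Pairing("nmr3", "xray3", 0.95, 2.6),   # X-ray resolution too low
]
reliable = [p for p in candidates if is_reliable(p)]
print([p.nmr_id for p in reliable])
```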
The list of all analyzed NMR structures and their X-ray counterparts is available in the supplementary material.
We have demonstrated that RefDens reliably identifies errors in side-chains of protein structures. The algorithm is based on the comparison of calculated electron density environments for specific amino acid side-chains to reference densities derived from a set of high quality protein structures. In the presented benchmark we evaluated protein structures determined by X-ray analysis with a resolution better than or equal to 2 Å. In view of the high resolution of these structures, the fact that we are able to identify 80 confirmed errors is somewhat unexpected.
An essential feature of RefDens is its independence of experimental data other than the model coordinates. It can therefore be applied to any structural model, such as structures determined by NMR spectroscopy, since the electron density is computed from the input model by standard procedures as used in X-ray analysis. The evaluation of 79 NMR structures with high quality X-ray counterparts shows that RefDens reliably identifies unfavorable side-chain conformations in NMR structures.
We are convinced that RefDens is of considerable interest for researchers in NMR spectroscopy. The analysis of the electron density environment has several methodological advantages compared to classic structure quality assessment approaches, including the independence from the frequency of certain atom types in the training set (e.g., seldom observed ligand atoms…), the direct steric depiction of the problematic regions and the natural combination of multiple pairwise distance violations into a single score. The evaluation of a single side-chain using RefDens takes less than a quarter of a second on a standard desktop computer. It therefore allows the calculation of electron excess scores for a large number of different models. Further research will concentrate on the inclusion of main chain densities and the automated correction of unrealistic densities.
RefDens is publicly available as a web service at http://refdens.services.came.sbg.ac.at. The server accepts PDB codes or a PDB formatted file and returns a list of residues with high excess electron density. Each residue can be displayed with its corresponding error density map in three dimensions.
The training set
Our approach is knowledge-based in the sense that we derive information on expected electron densities from a set of experimentally determined protein structures. To select a suitable non-redundant data set we use the COPS (Suhrer et al. 2009) classification of protein structures with the following constraints. Firstly, any two proteins in the set share at most 90% structural similarity. Secondly, each protein structure is solved by X-ray crystallography and has a resolution better than or equal to 1.5 Å. This results in a set of 1,258 high quality protein structures. The focus on non-redundant structures ensures an exhaustive sampling of the electron density environments, at least for that part of fold space which is covered by currently known structures. The COPS classification is especially suited for our task, as it allows us to focus solely on quantitative structural relationships, which are calculated by the TopMatch algorithm (Sippl and Wiederstein 2008).
Handling alternate locations of atoms
Many protein structures in the training set contain alternate location indicators (ALTLOCs). They indicate that the associated atoms occupy more than one position in the crystal. Since the intensities measured by the X-ray experiment correspond to an ensemble average over the crystal, it is generally impossible to determine the different suitable combinations of ALTLOCs. In fact, to obtain all possible models compatible with multiple sites of residues it is necessary to identify the possible sterically allowed combinations of residues and atoms that have ALTLOC indicators. This is computationally expensive, ambiguous and error prone. We follow common practice by choosing the first ALTLOC for every atom and deriving coordinates only from atoms with this indicator.
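A minimal Python sketch of this selection rule is given below. It assumes that "first ALTLOC" means the first record listed in the file for a given atom, and it relies on the fixed-column layout of PDB ATOM/HETATM records (atom name in columns 13-16, alternate location indicator in column 17, chain in column 22, residue number plus insertion code in columns 23-27). The function names `first_altloc_only` and `atom_line` are hypothetical; `atom_line` exists only to build correctly aligned example records.

```python
def first_altloc_only(pdb_lines):
    """Keep the first listed record of every atom and drop later alternates."""
    seen = set()
    kept = []
    for line in pdb_lines:
        if line.startswith("MODEL"):
            seen.clear()  # atoms repeat in every model of a multi-model file
        if line.startswith(("ATOM  ", "HETATM")):
            # chain, residue number + insertion code, atom name (0-based slices)
            key = (line[21], line[22:27], line[12:16])
            if key in seen:
                continue  # a later alternate location of an atom already kept
            seen.add(key)
        kept.append(line)
    return kept


def atom_line(serial, name, altloc, resname, chain, resseq, x, y, z):
    """Hypothetical helper: format a column-aligned ATOM record for the example."""
    return (f"ATOM  {serial:5d} {name:4}{altloc:1}{resname:>3} {chain:1}"
            f"{resseq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}")


# Invented example: the OG atom of a serine is modeled at two positions.
records = [
    atom_line(1, " CA ", " ", "SER", "A", 10, 11.0, 12.0, 13.0),
    atom_line(2, " OG ", "A", "SER", "A", 10, 12.1, 13.2, 14.3),
    atom_line(3, " OG ", "B", "SER", "A", 10, 10.4, 13.9, 14.0),
]
for line in first_altloc_only(records):
    print(line)
```

Keying on chain, residue number and atom name rather than on the indicator letter keeps the first alternate even when a file does not label it "A".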
Generation of the reference densities
Although the idea behind RefDens may be applied to any part of a protein, here we focus on analyzing amino acid side-chains as their conformation is a common source of error, even in high-quality protein structures. Therefore, alanine, glycine, and proline are excluded from the evaluations as their side-chains lack backbone-independent flexibility. In the present work we are particularly interested in the detection of difficult errors like incorrect electron densities which result from non-covalent atomic interactions as opposed to errors which are comparatively easy to detect, like deviations from standard bond lengths or valence angles.
1. Hydrogen atoms are added to the protein structure and heterogeneous groups using Reduce (Word et al. 1999).
2. Depending on the residue type a subset of terminal side-chain atoms (starting from the rotational bond of the last χ angle) is used to set up a reference coordinate system. These atoms are of particular interest since they are often involved in interactions with other side-chains, the backbone or ligands. The complete molecular environment is transformed to this coordinate system, including atoms from heterogeneous groups. Figure 8 shows the reference coordinate systems for several amino acid types.
3. In this reference coordinate system the electron density is computed according to Eq. 1 in the supplementary material. Figure 8 illustrates the extent of the computed electron density maps for selected residues.
4. For each of the 17 residue types the mean electron density is computed on an equidistant grid with a step size of 0.5 Å.
The resulting averaged electron density maps for each of the 17 amino acids serve as a reference for the expected electron density environment of the different amino acid side-chains.
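The steps of transforming an environment into a side-chain frame, sampling a density on a 0.5 Å grid and averaging over many environments can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the RefDens code. The frame is built from three atoms (origin at the first, x-axis towards the second, z-axis normal to their plane), which is one plausible construction and not necessarily the one used per residue type. Eq. 1 is not reproduced in this section, so a normalized Gaussian per atom with a hypothetical width `sigma` stands in for it. All function names are illustrative.

```python
import math


def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u):
    n = math.sqrt(dot(u, u))
    return (u[0] / n, u[1] / n, u[2] / n)


def local_frame(a, b, c):
    """Right-handed frame: origin a, x along a->b, z normal to the a, b, c plane."""
    x = unit(sub(b, a))
    z = unit(cross(x, sub(c, a)))
    y = cross(z, x)
    return a, (x, y, z)


def to_local(p, frame):
    """Express point p in the coordinates of the given frame."""
    origin, axes = frame
    d = sub(p, origin)
    return tuple(dot(d, axis) for axis in axes)


def grid_points(half_extent, step=0.5):
    """Equidistant cubic grid centred on the origin of the frame."""
    n = int(round(half_extent / step))
    ticks = [i * step for i in range(-n, n + 1)]
    return [(x, y, z) for x in ticks for y in ticks for z in ticks]


def density(atoms, points, sigma=0.7):
    """Sum of Gaussians; ASSUMED stand-in for Eq. 1 of the supplementary material.

    atoms is a list of (position, number_of_electrons) in frame coordinates.
    """
    norm = (2.0 * math.pi * sigma ** 2) ** -1.5
    values = []
    for p in points:
        total = 0.0
        for pos, electrons in atoms:
            d = sub(p, pos)
            total += electrons * norm * math.exp(-dot(d, d) / (2.0 * sigma ** 2))
        values.append(total)
    return values


def mean_density(maps):
    """Pointwise mean over density maps sampled on the same grid."""
    return [sum(vals) / len(maps) for vals in zip(*maps)]


# Invented example: two environments of the same residue type, each one atom.
frame = local_frame((1.0, 1.0, 1.0), (2.0, 1.0, 1.0), (1.0, 2.0, 1.0))
points = grid_points(2.0)
env_a = [(to_local((1.0, 1.0, 2.0), frame), 8)]   # an oxygen-like neighbour
env_b = [(to_local((1.0, 2.5, 1.0), frame), 6)]   # a carbon-like neighbour
reference = mean_density([density(env_a, points), density(env_b, points)])
print(len(reference), "grid points in the reference map")
```

Because every environment is expressed in the same side-chain frame, grid point i refers to the same position relative to the side-chain in all maps, which is what makes the pointwise mean meaningful.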
Evaluation of one protein structure
The analysis of the electron density of a particular protein is very similar to the reference density computation. We perform the first three steps described in the previous section, such that for each residue an electron density is computed. We then compare these densities with the pre-calculated reference densities of the respective residue type. This involves the calculation of the difference between the density from the target protein and the corresponding reference density on each grid point. The result is a difference electron density map similar to Fo-Fc maps used in X-ray crystallography, however, calculated in real space as the difference between the calculated density for the analyzed protein structure and the knowledge-based reference density. Finally, the electron excess is calculated as the numeric integral over the difference map. This serves as a score to express the deviation from the average density on a per-residue basis.
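The comparison step can be illustrated with a few lines of Python. The sketch assumes that both maps are stored as flat lists over the same equidistant grid and approximates the integral by a Riemann sum, i.e. the sum of the differences multiplied by the voxel volume. Whether negative differences are included and how the result is normalized to the reported unit of e−/Å3 is not spelled out in this section, so the sketch stops at the plain sum. The function names are hypothetical.

```python
def difference_map(target, reference):
    """Pointwise difference between the target density and the reference density."""
    if len(target) != len(reference):
        raise ValueError("maps must be sampled on the same grid")
    return [t - r for t, r in zip(target, reference)]


def electron_excess(diff, step=0.5):
    """Riemann-sum approximation of the integral over the difference map.

    Assumption: every grid point represents a cubic voxel of edge length step (Å).
    """
    return sum(diff) * step ** 3


# Invented example values on a grid of four points.
target = [0.9, 1.4, 0.2, 0.0]
reference = [0.5, 0.6, 0.2, 0.1]
diff = difference_map(target, reference)
print(electron_excess(diff))
```

Positive entries of `diff` mark grid points where the model places more density than comparable environments usually show, which is what a three-dimensional display of the difference map highlights.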
We thank Bernhard Rupp for his answers to questions about X-ray crystallography and Sandra Pühringer for fruitful discussions and suggestions. This work was supported by the Austrian Science Fund (FWF), grant P21294-B12.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
- Cromer DT, Waber JT (1974) International tables for X-ray crystallography, Vol. IV, Table 2.2 B. Kynoch Press, Birmingham (Present distributor Kluwer Academic Publishers, Dordrecht), pp 99–101
- Davis I, Leaver-Fay A, Chen VB, Block JN, Kapral GJ, Wang X, Murray LW, Arendall WB, Snoeyink J, Richardson JS, Richardson DC (2007) Molprobity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res 35(Web Server issue):W375–W383. doi: 10.1093/nar/gkm216
- DeLano W (2002) The PyMOL user’s manual. DeLano Scientific, Palo Alto
- Emsley P, Cowtan K (2004) COOT: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 60(Pt 12 Pt 1):2126–2132. doi: 10.1107/S0907444904019158
- Joosten RP, Salzemann J, Bloch V, Stockinger H, Berglund AC, Blanchet C, Bongcam-Rudloff E, Combet C, Da Costa AL, Deleage G, Diarena M, Fabbretti R, Fettahi G, Flegel V, Gisel A, Kasam V, Kervinen T, Korpelainen E, Mattila K, Pagni M, Reichstadt M, Breton V, Tickle IJ, Vriend G (2009a) PDB_REDO: automated re-refinement of X-ray structure models in the PDB. J Appl Crystallogr 42:376–384
- Kleywegt GJ (2009) On vital aid: the why, what and how of validation. Acta Crystallogr D Biol Crystallogr 65(Pt 2):134–139. doi: 10.1107/S090744490900081X
- Kleywegt GJ, Harris MR, Zou JY, Taylor TC, Wählby A, Jones TA (2004) The Uppsala electron-density server. Acta Crystallogr D Biol Crystallogr 60(Pt 12 Pt 1):2240–2249. doi: 10.1107/S0907444904013253