Abstract
Despite the formidable progress in Nuclear Magnetic Resonance (NMR) spectroscopy, quality assessment of NMR-derived structures remains an important problem. Validation of protein structures is essential for spectroscopists, since it could enable them to detect structural flaws and potentially guide their efforts in further refinement. Moreover, the availability of accurate and efficient validation tools would help molecular biologists and computational chemists to evaluate the quality of available experimental structures and to select the protein model that is most suitable for a given scientific problem. The 13Cα nuclei are ubiquitous in proteins; moreover, their shieldings are easily obtainable from NMR experiments and represent a rich source of encoded structural information, which makes 13Cα chemical shifts an attractive candidate for use in computational methods aimed at determination and validation of protein structures. In this chapter, the basis of a novel methodology for computing, at the quantum chemical level of theory, the 13Cα shielding for the amino acid residues in proteins is described. We also identify and examine the main factors affecting the 13Cα-shielding computation. Finally, we illustrate how the information encoded in the 13C chemical shifts can be used for a number of applications, ranging from protein structure prediction of both α-helical and β-sheet conformations, to determination of the fraction of the tautomeric forms of the imidazole ring of histidine in proteins as a function of pH, to accurate detection of structural flaws, at the residue level, in NMR-determined protein models.
1 Introduction
Before a protein structure can be analyzed in light of its biological function it is necessary to validate it, i.e., to have a clear understanding of its reliability in terms of both the overall structure and of its details at per-residue level. However, an accurate and fast validation of protein structures constitutes a long-standing problem in Nuclear Magnetic Resonance (NMR) spectroscopy [1–4]. For this reason, investigators have proposed a plethora of methods to determine the accuracy and reliability of protein structures in recent years [5–12]. Despite this progress, there is a growing need for more sophisticated, physics-based and fast structure-validation methods [1, 2, 6, 7, 11].
The 13Cα chemical shifts provide important information about conformations of peptides and proteins in solution [13–39] and, therefore, can be used as an exquisitely sensitive probe with which to assess the quality of protein models. We recently developed a new, physics-based methodology [34] that makes use of observed and computed 13Cα chemical shifts, the latter obtained at the density functional theory (DFT) level [40], for an accurate validation of protein structures in solution and in crystal [41]. The first step in the development of this new methodology involved determining the factors that affect 13Cα shielding calculations, such as the protonation/deprotonation state of distant ionizable groups, sequential nearest-neighbor or covalent geometry effects (i.e., due to variations in the bond lengths and bond angles of residues) and the sensitivity of the shielding/deshielding of 13Cα nuclei to changes in side-chain conformation. Once all these factors affecting 13Cα-shielding have been properly identified and considered, a very important test is to determine the accuracy and speed of the computation of the 13Cα-shielding as a function of the size of the basis set chosen and the DFT model adopted. These tests are important because DFT-based quantum mechanical (QM) calculations are computationally very demanding, despite the ever-increasing computational power available.
The new DFT-based method has been applied to study a number of problems, such as unblocked statistical-coil tetrapeptides in aqueous solution [32], polyproline II helix conformation in a proline-rich environment [31], the 13Cα and 13Cβ chemical shifts of cysteines in disulfide-bonded cysteine [42] or determination of the fraction of the tautomeric forms of histidine in proteins as a function of pH [43]. This new strategy also provides a unified, self-consistent method to determine high-quality protein structures, without relying on knowledge-based information [44]. Thus, a β-sheet or an all α-helical protein structure can be accurately determined by simply identifying a set of conformations which simultaneously satisfy a number of constraints, namely 13Cα-dynamically-derived torsional angle constraints and Nuclear Overhauser Effect (NOE) derived distance constraints [29, 44].
The currently used 13Cα chemical shift-based validation and determination protocol [29, 33, 44, 45, 34] exploits the following features: (a) the assignment of chemical shifts is a fundamental step in a protein structure determination by NMR spectroscopy [46], and no extra experimental work is needed; (b) in addition to the impact of the covalent structure, 13Cα chemical shifts are modulated mainly by the intraresidue backbone and side-chain dihedral angles [16, 17, 19, 20–22, 27, 47, 35, 39], with no significant influence of the amino acid sequence [48]; (c) 13Cα is ubiquitous in proteins; and, (d) 13Cα chemical shifts can be computed with high accuracy at the QM level of theory.
This chapter is intended to be an overview of the author’s contribution to the field of protein structure determination and validation using, mainly, information decoded from the 13Cα chemical shifts. Consequently, the chapter is organized as follows: first, the method used to compute the 13Cα chemical shifts and to analyze the results are briefly described; second, the main factors affecting the 13Cα chemical shifts computation are enumerated and discussed; third, the capabilities of the computed 13Cα chemical shifts, as a rich source of encoded structural information, are illustrated by a series of applications that involves, but is not limited to, the determination of protein structures; and finally a new protein-structure validation server, CheShift-2 [49], with which NMR spectroscopists can assess the quality of their protein models, before they are deposited in the Protein Data Bank (PDB) [50], is presented. It is worth noting that the theory, and details, behind alternative protein structure determination and validation methods are not discussed here and, hence, the reader is referred instead to an extensive collection of such methods [1, 5–12, 26, 51–61].
2 Methods
2.1 Calculation of 13Cα Chemical Shifts
All the experimentally determined conformations, unless noted otherwise, were regularized, i.e., all residues were replaced by the standard Empirical Conformational Energy Program for Peptides (ECEPP) [62] residues in which bond lengths and bond angles are fixed (rigid-body geometry approximation) at the standard values [62] and hydrogen atoms were added, if necessary.
Computation of the 13Cα chemical shifts involves a series of approximations. For each amino acid residue X in the protein sequence: (a) the 13Cα shielding depends mainly on the residue's own backbone [21, 27] and side-chain [19, 20, 35] conformations, with no significant influence of either the amino acid sequence or the position of the given residue in the sequence, except for residues preceding proline [48]; (b) each amino acid residue X in the protein sequence can be treated as a terminally-blocked tripeptide with the sequence Ac-GXG-NMe, with X in the conformation of the protein structure; (c) the 13Cα isotropic shielding value (σ) for each amino acid residue X can be computed at the OB98/6-311+G(2d,p) level of theory [28] with the Gaussian 03 package [63], while the remaining residues in each tripeptide are treated at the OB98/3-21G level of theory, i.e., by using the locally-dense basis set approach [64]; (d) all ionizable residues can be considered neutral during the QM calculations [45], unless noted otherwise; and (e) no geometry optimization is necessary because such optimization by ab initio Hartree-Fock (HF) or DFT methods has only a small effect on the computed chemical shifts [19].
The computed 13Cα shieldings (σsubst,th) are converted to 13Cα chemical shifts (δ) by employing the equation δth = σref − σsubst,th, where the indices denote a theoretical (th) computation, the reference substance (ref), and the substance of interest (subst), i.e., the 13Cα shielding of a given amino acid residue X. The observed shielding value of tetramethylsilane (TMS) in the gas phase [65], namely 188.1 ppm, was adopted as an initial (see below) reference value. All the computed 13Cα shielding (σsubst,th) values are calculated using the Gauge-Invariant Atomic Orbital (GIAO) method at the DFT level of theory as implemented in the Gaussian 03/09 suite of programs [63]. Throughout this chapter we have used only one exchange-correlation functional, OB98, because it was shown [30] to be one of the most accurate and fastest functionals with which to reproduce the observed 13Cα chemical shifts of proteins in solution (see Sect. 3.2).
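The conversion from shielding to chemical shift described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example shielding value are assumptions introduced here, while the default reference is the gas-phase TMS value of 188.1 ppm quoted in the text.

```python
# Observed gas-phase TMS shielding quoted in the text (ppm).
TMS_SHIELDING_GAS_PHASE = 188.1


def shielding_to_shift(sigma_subst, sigma_ref=TMS_SHIELDING_GAS_PHASE):
    """Convert a computed isotropic shielding (ppm) into a chemical shift (ppm).

    Implements delta_th = sigma_ref - sigma_subst_th.
    """
    return sigma_ref - sigma_subst


# Hypothetical shielding value, used only to illustrate the call.
example_shift = shielding_to_shift(130.0)
print(round(example_shift, 1))
```

Passing a different `sigma_ref` lets the same function be reused with an effective TMS value instead of the observed one.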
2.2 Determination of an Effective TMS Shielding Value
Determination of a proper TMS shielding value for each functional is crucial for an accurate computation of the 13Cα chemical shifts because it enables us to minimize systematic errors that might bias the chemical shift-based analysis. From this point of view, an effective TMS value provides the most accurate solution to the problem because it does not require further adjustments. Consequently, computation of an effective TMS value is central to our calculations.
By adopting the observed TMS value of 188.1 ppm [65] as a reference, it is possible to find, for any functional, the characteristic mean (xo) and standard deviation (σ) of the Normal (or Gaussian) fit of the frequency distribution of the errors. For all functionals tested in our work the characteristic mean value (xo) appears displaced from its ideal value of 0.0 by a positive or negative amount, e.g., for OB98 a value of xo = +3.6 ppm was found. Further analysis [30] indicates that, for any of the 10 functionals tested, a straightforward use of the observed TMS shielding value (188.1 ppm) is not appropriate if no further corrections are introduced. However, for each functional and basis set chosen it is feasible to find an ‘effective’ TMS shielding value for which the Normal (or Gaussian) fit shows a zero displacement, i.e., an effective TMS value that gives xo = 0.0. For example, use of OB98 with a large [6-311+G(2d,p)/3-21G] basis set leads to an effective TMS value of 184.5 ppm, obtained by subtracting 3.6 ppm from 188.1 ppm [30], which gives xo = 0.0 ppm. Likewise, use of a small (6-31G/3-21G) basis set leads to an effective TMS value of 195.4 ppm.
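The arithmetic behind the effective TMS value can be illustrated with a short Python sketch. Two simplifications are assumed here and are not taken from the chapter: the error for each residue is defined as computed minus observed chemical shift, and the sample mean of the errors stands in for the mean xo of the Gaussian fit. The shift values in the example are invented so that the mean error equals the +3.6 ppm reported for OB98.

```python
from statistics import mean


def effective_tms(computed_shifts, observed_shifts, sigma_ref=188.1):
    """Return (effective reference shielding, mean error x0), both in ppm.

    Assumption: error = computed - observed, and the sample mean replaces
    the mean of a Gaussian fit to the error distribution.
    """
    errors = [c - o for c, o in zip(computed_shifts, observed_shifts)]
    x0 = mean(errors)
    # Lowering the reference by x0 lowers every computed shift by x0,
    # which centres the error distribution at zero.
    return sigma_ref - x0, x0


# Invented shifts (ppm); computed values obtained with sigma_ref = 188.1.
observed = [55.0, 60.0, 52.0]
computed = [58.6, 63.6, 55.6]
sigma_eff, x0 = effective_tms(computed, observed)
print(round(sigma_eff, 1), round(x0, 1))
```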
2.3 Computation of the Ca-RMSD Model
The observed chemical shift for each residue i, 13Cαobserved,i, represents contributions from an ensemble of rapidly interconverting conformers that coexist in solution. An accurate comparison between the observed and computed 13Cα chemical shifts therefore requires consideration of an ensemble of NMR-derived conformers, rather than of a single conformation [41, 33]. Consequently, for each amino acid residue i in the sequence, the average of the chemical shifts calculated for that residue over the ensemble of Ω conformers representing the NMR structure, <13Cα>i, is computed as:
\( {<} {}^{13} {\text{C}}^{\alpha } {>}_{i} = \frac{1}{\Omega }\sum\nolimits_{k = 1}^{\Omega } {{}^{13} {\text{C}}^{\alpha }_{i,k} } \)   (1)
where 13Cαi,k is the computed chemical shift for residue i in conformer k, with 1 ≤ i ≤ N, where N is the number of residues in the sequence, and 1 ≤ k ≤ Ω. Eq. (1) was obtained through the following approximation: for each residue i the quantity to be computed must, in principle, be \( {<} {}^{13} {\text{C}}^{\alpha } {>}_{i} = \sum\nolimits_{k = 1}^{\Omega } {\lambda_{k} \, {}^{13} {\text{C}}^{\alpha }_{i,k} } \), where λk is the Boltzmann factor for conformer k, with \( \sum\nolimits_{k = 1}^{\Omega } {\lambda_{k} } \equiv 1 \). However, computation of the Boltzmann factors at the QM level of theory is not possible with the existing computational facilities, because it would require computation of the total energy at the QM level of theory for each of the conformers in the ensemble used to represent the NMR structure. Therefore, the following approximation was used: λk = 1/Ω [48]; in other words, in this approximation each conformer contributes equally to the average chemical shift obtained by fast conformational averaging. Whether computation of a Boltzmann average, rather than the arithmetic average, would lead to a more accurate representation of the 13Cα chemical shifts needs further investigation.
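The <13Cα>i value obtained from Eq. (1) is used to compute the conformational-average difference Δi between the observed and computed 13Cα chemical shifts for each amino acid residue i,
\( \Delta_{i} = {}^{13} {\text{C}}^{\alpha }_{observed,i} - {<} {}^{13} {\text{C}}^{\alpha } {>}_{i} \)   (2)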
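Then, the conformational-average root-mean-square-deviation (rmsd) parameter, ca-rmsd [48], is obtained as:
\( {\text{ca-rmsd}} = \left[ {\frac{1}{N}\sum\nolimits_{i = 1}^{N} {\Delta_{i}^{2} } } \right]^{1/2} \)   (3)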
which is a global property of the protein NMR structure, given as the root-mean-square of the differences between the experimental 13Cα chemical shifts and the <13Cα>i values for all the residues in the protein.
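The procedure of this section can be sketched in Python as follows. The sketch assumes that each conformer contributes equally to the conformational average (λk = 1/Ω, as stated above) and that the ca-rmsd is the plain root-mean-square of the per-residue differences Δi; the function names, the data layout and the numerical values are illustrative assumptions.

```python
from math import sqrt


def conformational_average(shifts_by_conformer):
    """Arithmetic mean over conformers of the computed shift of each residue.

    shifts_by_conformer[k][i] is the computed shift (ppm) of residue i
    in conformer k; every conformer must list the same residues.
    """
    n_conf = len(shifts_by_conformer)
    n_res = len(shifts_by_conformer[0])
    return [
        sum(conf[i] for conf in shifts_by_conformer) / n_conf
        for i in range(n_res)
    ]


def ca_rmsd(observed, shifts_by_conformer):
    """Root-mean-square of (observed - conformational average) over residues."""
    averages = conformational_average(shifts_by_conformer)
    deltas = [obs - avg for obs, avg in zip(observed, averages)]
    return sqrt(sum(d * d for d in deltas) / len(deltas))


# Invented data: two residues, two conformers (ppm).
observed = [56.0, 60.0]
ensemble = [[55.0, 61.0], [57.0, 63.0]]
print(round(ca_rmsd(observed, ensemble), 3))
```

Residues without an observed chemical shift are not handled in this sketch and would have to be filtered out before the call.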
2.4 13Cα-Based Protein Structure Determination Method
The 13Cα-based procedure used for determination of protein structures consists of three steps. The flow chart of this protocol [44] is shown in Fig. 1 and a brief description of each step follows.
Step 1: The Variable-Target-Function (VTF) approach with a simplified soft-sphere potential function [66] is used to generate, at random, an ensemble of conformations that simultaneously satisfy a set of long-range distance constraints derived from the experimental NOEs and (φ, ψ) torsional constraints derived from the observed 13Cα and 13Cβ conformational shifts [27]. The derived torsional constraints apply only to those amino acid residues in the sequence that pertain to a regular structure, i.e., to an α-helix or β-sheet. Consequently, these (φ,ψ)α,β torsional constraints (shown in Fig. 1) are limited to, on average, ~50% of the amino acid residues in proteins because the remaining ones populate non-regular structures.
Then, a clustering procedure, e.g., the Minimal Spanning Tree method [67], is used to select a small sub-set of the total number of the VTF-derived conformations, namely those possessing a maximum NOE-derived distance violation lower than some arbitrary fixed value. For each of these conformations the 13Cα chemical shifts are computed as described in Sect. 2.1. Examination of the chemical shifts of all the amino acids in the ensemble of conformations enables us to identify the amino acid at each position in the sequence whose computed chemical shifts most closely match the observed ones, among all these conformations. This identified set of individual amino acid conformations corresponds to only one conformation of the whole chain: the ‘theoretical minimal-rmsd model’ [33]. In this model, the 13Cα chemical shift of each residue individually best matched the experimental one, thereby providing a new set of ϕ, ψ, and χ torsional angle constraints for all amino acid residues in the sequence, i.e., not just for the amino acid residues in regular structures. Because the chemical shifts are a multivalued function of the ϕ, ψ, and χ torsional angles, the set of torsional angles derived from the ‘theoretical minimal-rmsd model’ does not, necessarily, represent a unique solution to a given set of observed 13Cα chemical shifts values.
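The selection of the ‘theoretical minimal-rmsd model’ described above can be illustrated with a small Python sketch: for each residue, the conformer whose computed 13Cα chemical shift is closest to the observed one is identified. The function name, the data layout and the numbers are assumptions made for illustration; in the actual protocol the ϕ, ψ and χ angles of the selected residue conformations would then be used as constraints.

```python
def minimal_rmsd_model(observed, shifts_by_conformer):
    """Return, for each residue, the index of the best-matching conformer.

    shifts_by_conformer[k][i] is the computed shift (ppm) of residue i
    in conformer k; observed[i] is the observed shift of residue i.
    """
    selection = []
    for i, obs in enumerate(observed):
        # Pick the conformer with the smallest absolute deviation at residue i.
        best_k = min(
            range(len(shifts_by_conformer)),
            key=lambda k: abs(shifts_by_conformer[k][i] - obs),
        )
        selection.append(best_k)
    return selection


# Invented data: three residues, three conformers (ppm).
observed = [56.0, 60.0, 52.0]
ensemble = [
    [55.0, 60.2, 50.0],
    [56.1, 63.0, 51.5],
    [58.0, 59.0, 52.1],
]
print(minimal_rmsd_model(observed, ensemble))
```

Because each residue is matched independently, the resulting set of residue conformations generally mixes contributions from several conformers, which is consistent with the text's remark that it corresponds to a single composite model.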
Step 2: Only one conformation among all the conformations produced in Step 1 is selected, for example, the conformation possessing the lowest rmsd between the computed and observed 13Cα chemical shifts. The selected conformation is used as the starting point of a new conformational search with the Monte Carlo with Minimization (MCM) method [68, 69]. The MCM search is carried out with two types of constraints: the original set of NOE-derived distance constraints and the new set of ϕ, ψ, χ torsional angles derived in Step 1. This time the conformational search is carried out using a complete force field that includes the internal potential energy described by ECEPP/05 [70], the solvent free energy calculated by using a solvent-accessible surface area model [71], and additional energy terms aimed at penalizing violations of the distance and torsional angle constraints [72]. Convergence of the determination protocol is monitored using the ca-rmsd between the computed and observed 13Cα chemical shifts.
Step 3: If the computed ca-rmsd is lower than a certain, arbitrarily chosen, cutoff value (ξ), the procedure ends. Otherwise, Step 2 is repeated using a new set of (ϕ,ψ,χ) constraints derived from the minimal-rmsd model of the previous step.
It is worth noting that, after our physics-based protocol was published [44], an alternative knowledge-based method that makes use of 1H, 13Cα, 13Cβ and 15N chemical shifts as restraints was successfully applied to structure determination of several proteins [53]. A blind test of computational methods aimed at fully automated determination of protein structures, including several that also use chemical shifts as restraints, has been carried out recently [60].
2.5 Computation of the 13Cα Chemical Shifts as a Function of the pH
For a given residue i of a protein in a conformation k, the average charge distribution, <ρi,k>, could be determined by solving the Poisson equation by considering the \( 2^{\xi } \) ionization states, with ξ being the number of ionizable groups in the molecule. Regarding this problem, it is worth noting that ξ could be a large number because ~30% of all residues in a protein sequence are, on average, ionizable and, hence, an accurate solution would require a fast algorithm. Consequently, in all the applications mentioned in this chapter, we used the Multiple Boundary Element (MBE) method [73, 74], in which the free energy associated with the state of ionization of the ionizable groups at a fixed pH value, namely 6.5, is calculated with the general multi-site titration formalism [75, 76]. The charges and atomic radii from the PARSE (Parameters for Solvation Energy) algorithm [77] were used for the solvation free energy calculations using the MBE method, and internal (εint) and solvent (εsolv) dielectric constants of 2 and 80, respectively [76], were adopted for the calculations of <ρi,k>. The value of εint = 2 is consistent with the use of PARSE charges [78] and is also commonly assumed to be an adequate representation of the protein interior. Following these approximations, for a given conformation k, the average degree of ionization of the ith ionizable group of this conformation is computed as:
\( {<} \rho_{i,k} {>} = \frac{1}{Z}\sum\nolimits_{n = 1}^{{2^{\xi } }} {\rho_{i,k}^{n} \exp \left[ { - \Delta G(P_{k} ,x_{k}^{n} )/k_{B} T} \right]} \)   (4)
where Z is the partition function, kB is the Boltzmann constant, T is the absolute temperature, \( x_{k}^{n} = (\rho_{1,k}^{n} , \ldots ,\rho_{i,k}^{n} , \ldots ,\rho_{N,k}^{n} ) \) with \( \rho_{i,k}^{n} \) = (1 or 0) is the nth protonation microstate of conformation k for protein Pk, and \( \Delta G(P_{k} ,x_{k}^{n} ) \) is the free energy of ionization of the nth microstate of protein Pk in conformation k [75].
It should be noted that for any ionizable residue i of a single conformation k, Eq. (4) can lead to a non-integer average degree of charge, although we know that such non-integer charges do not make physical sense. Due to the Boltzmann nature of the averaged value computed by Eq. (4), a fractional charge should physically be interpreted as follows: for a given conformation k, there are many identical replicas of such a conformation in solution and, hence, a fractional charge computed by Eq. (4), e.g., 0.75, means that 75% of these replicas possess the ionizable group i protonated/deprotonated with an integral charge while the remaining 25% of the replicas possess the same ionizable group as deprotonated/protonated, depending on whether the ionizable group is basic or acidic.
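Assuming that the protonation/deprotonation reactions are instantaneous on the NMR time scale, i.e., microsecond to millisecond [79], the theoretical 13Cα chemical shifts, \( \delta_{i}^{computed} (pH) \), for a given residue i in the sequence (except for histidine, which possesses two tautomers) are computed as a function of the pH using the following equation:
\( \delta_{i}^{computed} (pH) = \frac{1}{\Omega }\sum\nolimits_{k = 1}^{\Omega } {\left[ {{<} \rho_{i,k} {>} \, \delta_{ + ,i,k} + (1 - {<} \rho_{i,k} {>} ) \, \delta_{0,i,k} } \right]} \)   (5)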
where δ+,i,k and δ0,i,k are the computed 13Cα chemical shifts for the amino acid i in conformation k with fully charged and neutral side chains, respectively, Ω is the number of conformers in the protein ensemble, and <ρi,k> is the averaged degree of charge, as given by Eq. (4).
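The two averaging steps of this section can be sketched in Python under stated assumptions. First, the average degree of ionization of each group is taken as a Boltzmann-weighted average of its occupancy (1 or 0) over all 2^ξ protonation microstates; second, the pH-dependent shift of a residue is taken as the conformer average of the charged and neutral shifts weighted by that degree of ionization. The callable `free_energy` is a hypothetical placeholder: in the chapter the microstate free energies come from the MBE method and the multi-site titration formalism, which are not reproduced here. Brute-force enumeration is only practical for a small number of groups.

```python
from itertools import product
from math import exp

K_B = 0.0019872041  # Boltzmann (gas) constant in kcal/(mol K)


def average_ionization(n_groups, free_energy, temperature=298.15):
    """Boltzmann-averaged occupancy of each ionizable group.

    free_energy(state) is a hypothetical callable returning the free energy
    (kcal/mol) of a microstate given as a tuple of 0/1 occupancies.
    """
    kt = K_B * temperature
    z = 0.0                      # partition function
    totals = [0.0] * n_groups    # weighted occupancy sums per group
    for state in product((0, 1), repeat=n_groups):
        weight = exp(-free_energy(state) / kt)
        z += weight
        for i, rho in enumerate(state):
            totals[i] += rho * weight
    return [t / z for t in totals]


def ph_dependent_shift(rho_by_conformer, charged_shifts, neutral_shifts):
    """Conformer average of rho * delta_charged + (1 - rho) * delta_neutral."""
    terms = [
        rho * d_charged + (1.0 - rho) * d_neutral
        for rho, d_charged, d_neutral in zip(
            rho_by_conformer, charged_shifts, neutral_shifts
        )
    ]
    return sum(terms) / len(terms)


# Toy case: two groups, all microstates isoenergetic.
print(average_ionization(2, lambda state: 0.0))
# Invented shifts (ppm) for one residue in two conformers.
print(ph_dependent_shift([0.75, 0.25], [58.0, 58.0], [56.0, 56.0]))
```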
3 Factors Affecting the Calculation of 13Cα Chemical Shifts
3.1 Transferability of the Results
The current methodology [33, 34] relies on a crucial observation: once residue conformations are established by their interactions with the rest of the protein the 13Cα shielding of each residue depends, mainly, on its backbone and side-chain conformations, with no significant influence by the nature of the nearest-neighbor amino acids, except for residues immediately preceding proline [48].
The above observation allows us to parallelize the 13Cα shielding calculations in proteins and, hence, to make them computationally feasible. Moreover, a given set of accurately-determined amino acid residue conformations representing the accessible conformational space for all the 20 naturally occurring amino acids and showing a good distribution of side-chain conformations will constitute a reasonable ensemble with which to carry out tests of the current methodology. The results of these tests should be transferable to proteins of any class or size. Consequently, we used structures of three proteins solved by NMR and X-ray, namely PDB id 1D3Z, 2JVD and 1NS1 to evaluate the performance of different DFT functionals and basis sets, as explained below.
3.2 Performance of Different DFT Functionals to Reproduce Observed 13Cα Chemical Shifts
DFT has become a method of choice for QM calculations of the electronic structure and properties of many molecular and solid systems. Because the exact exchange-correlation functional is unknown, a large number of approximations have been proposed in the literature, which makes it essential to pursue more accurate and reliable approximate functionals, a process that, in turn, depends on the application. Selection of the most appropriate density functional model for a particular application thus becomes one of the main problems of the DFT method. For this reason we decided [28] to test several density functional models (namely B3LYP, OLYP, PBE1PBE, OPBE, O3LYP, OPW91, OB98, BPW91, BPBE and B971). The benchmarking was intended to find not only the most accurate functional with which to reproduce the observed 13Cα chemical shifts in solution but also the fastest one, in terms of CPU time, because the speed of DFT calculations could severely limit their applicability to proteins. The test was applied to 10 NMR-derived conformations of the 76-residue α/β protein ubiquitin (PDB id 1D3Z).
Comparison of the observed and computed 13Cα chemical shifts shows that there are five functionals, namely OPW91, OB98, OPBE, OLYP, and O3LYP, which are among the faster ones and, even more importantly, behave very similarly in their ability to reproduce accurately the observed 13Cα chemical shifts. In particular, we observe that OB98 appears to be slightly better than any other of the five functionals in terms of both the correlation coefficient, R, (or Pearson coefficient) between the observed and the conformational-averaged 13Cα chemical shifts and the standard deviation of the computed conformational-averaged 13Cα chemical shifts from a linear regression. Consequently, we chose the OB98 for all the applications [30].
We also compared the results obtained using OB98 with those obtained with B3LYP, a very popular functional that has been used extensively in our group and elsewhere. The correlation between the averaged 13Cα chemical shift values obtained for the 10 conformations of 1D3Z with the OB98 and B3LYP functionals is excellent [30], with a correlation coefficient R = 0.998 and a standard deviation of 0.300 ppm. This test provides solid evidence that the results and conclusions obtained using B3LYP do not need to be revised if the OB98 functional is adopted [30].
3.3 Performance of Different Basis Sets to Reproduce Observed 13Cα Chemical Shifts
To study the dependence of the accuracy and speed of DFT calculations of the 13Cα chemical shifts in proteins on the size of the basis set used, six basis sets, viz., the 6-31G/3-21G, 6-31G(d)/3-21G, 6-311G(d,p)/3-21G, 6-311+G(d,p)/3-21G, and 6-311+G(2d,p)/3-21G locally-dense basis-set approximations and the uniform 3-21G/3-21G set, were initially applied [28] to 10 NMR-derived conformations of ubiquitin [54]. For each of these six basis sets, combined with the OB98 functional, the 13Cα shielding was computed for 760 amino acid residues by treating each amino acid X in the sequence as a terminally-blocked tripeptide with the sequence Ac-GXG-NMe in the conformation of the regularized experimental protein structure. Analysis of the results [28], in terms of the agreement between the computed and observed 13Cα chemical shifts, shows that the accuracy with which the observed 13Cα chemical shifts are reproduced by using either the small basis set (6-31G/3-21G) or the larger basis set [6-311+G(2d,p)/3-21G] is very similar, although use of the small basis set leads to a significant decrease in computational time.
The results also indicate that the 13Cα chemical shifts computed with the large [6-311+G(2d,p)/3-21G] basis set can be reproduced accurately (within an average error of ~0.4 ppm) and faster (by ~9 times) by using the small (6-31G/3-21G) basis set after extrapolating it with: \( {}^{13}C^{\alpha } = - 1.597 + 1.040 \times {}^{13}C_{\mu }^{\alpha } \). In effect, the correlation between the averaged 13Cα chemical shift values computed for the 32 conformations of 1NS1 with these two basis sets is excellent [28], with a correlation coefficient R = 0.999 and a standard deviation of 0.284 ppm. Even more important, an analysis of the magnitude of the errors and their distribution carried out for Val and Arg hypersurfaces, constructed by calculating a grid of 6864 and 6794 points, respectively, corresponding to different combinations of the ϕ, ψ, χ1, and χ2 (only for Arg) torsional angles, indicates that ~70% of them are within ~0.6 ppm and that the most populated regions of the Ramachandran map are not affected by errors higher than ~1.0 ppm [28].
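The linear extrapolation quoted above is simple enough to express as a one-line Python function. The sketch assumes that the symbol with subscript μ in the equation denotes the chemical shift computed with the small (6-31G/3-21G) basis set; the function name and the input value are illustrative.

```python
def extrapolate_to_large_basis(shift_small_basis):
    """Estimate the large-basis-set 13C-alpha shift (ppm) from the small-basis one.

    Uses the linear relation quoted in the text:
    shift_large = -1.597 + 1.040 * shift_small
    """
    return -1.597 + 1.040 * shift_small_basis


# Invented small-basis-set shift (ppm), used only to illustrate the call.
print(round(extrapolate_to_large_basis(55.0), 3))
```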
In conclusion, the described analysis enabled us to select the smaller basis set (6-31G/3-21G) that provides accuracy similar to that of a ‘basis set limit’ [6-311 + G(2d,p)/3-21G] to reproduce the computed chemical shifts, but at a significantly lower computational cost [28].
3.4 Effect of Sequential Nearest-Neighbors on the 13Cα Chemical Shifts Calculations
The 13Cα chemical shifts for a residue X in the model peptide Ac-G-X-G-NMe have always been computed [44, 34] by considering that all the torsional angles of the residue X are exactly those of the residue in the protein conformation and that the surrounding Gly residues and the end-blocking groups are free to rotate. It is implicit in this approach that the 13Cα chemical shifts of residue X do not depend on the identity of the nearest-neighbor residues. This assumption needs to be tested.
The structure of the Nucleic Acid Binding (NAB) protein of the SARS coronavirus [80], a 116-residue α/β protein containing 9 prolines (Pro) and with 50% of its residues in loops and turns, was chosen to further evaluate the origin of differences between computed and observed 13Cα chemical shifts, as well as to study the influence of the nearest-neighbor residues on the computed 13Cα chemical shifts.
The results [48] indicate that computation of the 13Cα chemical shifts of a given residue in the sequence of the NAB protein is not influenced significantly, i.e., within ~0.5 ppm, by the nature of the nearest-neighbor amino acids, except for residues immediately preceding proline (see Fig. 2a). For such residues, Pro must be considered during the computation of the 13Cα chemical shifts; otherwise, an overestimation of the computed 13Cα chemical shifts by about +1.7 ppm occurs. This finding is in good agreement with both the experimental evidence [36, 81, 82] and the empirical observations [37, 81]. It is equally important to emphasize the physical nature of this effect: “…an imide bond formed by an Xxx–Pro pairing is generally thought to be much less electron-withdrawing than an amide bond…” [37].
Overall, except for the Pro effects, use of the Ac-G-X-G-NMe model peptide for the computation of the 13Cα chemical shifts of residue X is a good approximation because the computed values are accurate within ±0.5 ppm for all residue types, even if neither the subsequent nor the preceding residue-type effects are taken into account (see Fig. 2).
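One way to picture the model-peptide approximation, including the proline exception, is the following Python sketch, which lists a model peptide for every residue of a one-letter sequence. The text states only that Pro must be considered for residues immediately preceding it; replacing the C-terminal Gly of the model peptide by Pro, as done here, is an assumption about how that could be implemented, and the function name and the example sequence are hypothetical.

```python
def model_peptides(sequence):
    """Return one terminally-blocked model peptide string per residue.

    Default model: Ac-G-X-G-NMe. If the next residue in the sequence is
    proline ("P"), the C-terminal Gly is replaced by Pro (an assumption).
    """
    peptides = []
    for i, residue in enumerate(sequence):
        next_is_pro = i + 1 < len(sequence) and sequence[i + 1] == "P"
        c_neighbor = "P" if next_is_pro else "G"
        peptides.append(f"Ac-G-{residue}-{c_neighbor}-NMe")
    return peptides


# Hypothetical four-residue sequence.
for peptide in model_peptides("MAPK"):
    print(peptide)
```

Because each model peptide is independent of the others, the corresponding shielding calculations can be run separately for each residue.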
3.5 Rigid-Geometry Approximation and Accuracy of the Calculations of 13Cα Chemical Shifts
Experimental protein structures are often solved using force fields which allow variation of bond lengths and bond angles. However, it is known that QM calculations are very sensitive to bond lengths and bond angles [16]. Therefore, we have explored the dependence of the computed 13Cα-chemical shifts on the bond lengths and bond angles to establish whether a rigid- rather than non-rigid geometry approximation is a more accurate representation with which to compute the chemical shifts.
For this test, the structure of ubiquitin deposited in the PDB (PDB id 1UBQ) was chosen because it possesses non-regularized geometry and has been solved by X-ray diffraction at 1.8 Å resolution [83]. We have also examined the corresponding structure with regularized geometry, i.e., the one with all the residues replaced by the standard ECEPP residue geometry [62], named here as 1UBQregular. Analysis of the differences between the computed and observed 13Cα chemical shifts for the 1UBQ and 1UBQregular structures, leads to rmsd of 3.28 ppm and 2.38 ppm, respectively. The better agreement obtained with 1UBQregular, rather than 1UBQ, is consistent with the long-time recognition that the bond lengths and bond angles of both X-ray and NMR-derived structures are not as highly accurately defined as in studies of small molecules [16], with which the ECEPP geometry [62] has been parameterized. Further analysis of the agreement of the two ubiquitin structures with the deposited electron density data [83] of 1UBQ, in terms of the R-factor, leads to 19.2 and 23.1% for 1UBQ and 1UBQregular, respectively; while the all-heavy-atom rmsd between these two structures is 0.142 Å [34].
Overall, the use of regularized geometry, i.e., ECEPP geometry, is an accurate approximation with which to compute the 13Cα chemical shifts in proteins and, hence, is used in most of the applications discussed in this chapter.
3.6 13Cα Chemical Shifts as a Function of the Charge Distribution
Among the factors that affect 13Cα-shielding, which are important for an accurate computation of chemical shifts, is the sensitivity of 13Cα nuclei to the shielding/deshielding induced by changes in the protonation/deprotonation of distant ionizable groups [84–87]. However, these factors have not been taken into account explicitly in current computations of 13Cα chemical shifts in proteins at the QM level of theory because, usually, the calculations are carried out in the gas phase, and the ionizable residues are treated as neutral groups.
The question of whether the use of neutral, rather than charged, side chains is more accurate for computation of the 13Cα chemical shifts of ubiquitin, at a given fixed pH, was investigated as follows [45]. For a given ionizable residue i in a conformation k, first, the average charge distribution, <ρi,k>, was computed by using Eq. (4), i.e., by explicit consideration of the \( 2^{\xi } \) ionization states for every conformation [75], with ξ being the number of ionizable groups in the molecule, namely 22; and second, the 13Cα chemical shifts as a function of the pH, \( \delta_{i}^{computed} (pH) \), were computed by using Eq. (5). This analysis was applied to 139 conformations of ubiquitin: 138 of them are NMR-derived conformations (10 from PDB id 1D3Z plus 128 from PDB id 1XQQ) [54, 88], while the remaining one is an X-ray structure (PDB id 1UBQ) solved at 1.8 Å resolution [83].
Additionally, an extra set of 50 randomly generated conformations for each amino acid residue X, in the terminally-blocked tripeptide with the sequence Ac-GXG-NMe, with X being Lysine (Lys), Ornithine (Orn), Diaminobutyric acid (Dab), Glutamic acid (Glu) or Aspartic (Asp) acid, were also obtained. This set of randomly generated conformations was used to determine: (i) the range of shielding/deshielding of the 13Cα nucleus of free acidic/basic amino acid residues in solution, in their fully charged and neutral forms, respectively; (ii) how these ranges of shielding/deshielding variations compare with those derived from 3058 ionizable groups of the 139 conformations of the protein ubiquitin; and (iii) how the computed shielding/deshielding range of variations are influenced by the distance between the charged side-chain group and the 13Cα nucleus (for example, there are two chemical bonds in Asp, rather than three in Glu, separating the deprotonated carboxyl group from the 13Cα nucleus). To examine an analogous effect for a basic side-chain group, such as Lys, use was made of the non-natural amino acids Orn and Dab because, for these amino acids, the protonated amino group is separated from the 13Cα nucleus by four and three chemical bonds, rather than by five in Lys.
The results of this study [45], based on the analysis of 139 conformations of ubiquitin at pH 6.5, indicate that use of neutral, rather than charged, amino acids is a significantly better approximation of the observed 13Cα chemical shifts in solution for the acidic groups, and a slightly better representation, though significantly less expensive computationally, for the basic groups (see Fig. 3).
Additionally, our analysis of Lys, Orn and Dab revealed a significantly greater deshielding of the 13Cα nucleus (due to the deprotonation of the acidic groups) than the shielding due to the protonation of the basic groups. The origin of such a difference can be found in the distance between the ionizable groups and the 13Cα nucleus, which is shorter for the acidic than for the basic groups.
3.7 13Cα Chemical Shifts as a Function of Side-Chain Flexibility
To what extent are the chemical shifts of the amino acid residues in a protein affected by the side-chain orientation? The basis for such a query arises from the fact that the three torsion angles ϕ, ψ and χ1 are not independent of each other over the whole range because they involve a common N-Cα bond [89, 90]. To find an answer to this question, the dependence of the 13C chemical shifts on side-chain orientation was investigated [35], at the DFT level of theory, for a two-strand antiparallel β-sheet model peptide with the amino acid sequence Ac-A3-X-A12-NH2, where X represents any of the 17 naturally-occurring amino acids considered here, i.e., not including alanine, glycine and proline. Because the majority of β-sheets are twisted, rather than planar, with a right-hand twist in the approximately ±30° range for the backbone dihedral angles [91–94], conformational parameters for β-sheets may deviate from those for planar pleated sheets and, hence, are difficult to model by using canonical values. The fact that β-sheets in proteins appear as parallel or antiparallel strands, or a combination of both, only exacerbates the modeling problem. For this reason, the dihedral angles adopted for the backbone were taken, and kept fixed, from the experimental structure of an antiparallel β-sheet, specifically from the 16-residue segment (G41-G56) of the B3 binding domain of protein G (PDB id 1P7E).
For the 17 naturally occurring amino acids considered, the analysis indicates that there is: (a) good agreement between computed and observed 13Cα and 13Cβ chemical shifts, i.e., with correlation coefficients, R, of 0.95 and 0.99, respectively; (b) significant variability of the computed 13Cα and 13Cβ chemical shifts as a function of χ1 for all 17 residues, except for Ser; and (c) a smaller, although still significant, dependence of the computed 13Cα chemical shifts on χξ (with ξ ≥ 2) than on χ1 for 11 out of 17 residues.
The above results obtained by Villegas et al. [35] for an antiparallel (16-residue segment) β-sheet were later validated on a 76-residue α/β protein, i.e., by exploring the effects of side-chain conformation on the computed 13Cα chemical shifts [45]. This validation process involved an exhaustive conformational search, starting from an arbitrarily selected conformation of the NMR-determined ubiquitin protein (PDB id 1D3Z), in which only the torsional angles of the side chains were allowed to vary, i.e., all backbone dihedral angles (ϕ, ψ, ω) were fixed at their corresponding observed values. Furthermore, the correlation coefficient, R, between the observed vicinal coupling constants 3JN-Cγ and 3JC′-Cγ of 17 valine, threonine and isoleucine residues and those computed by using the Karplus equation [95] was used to check the accuracy of the side-chain conformational search.
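The Karplus check mentioned above can be illustrated with a minimal sketch. It assumes the generic three-term Karplus form, J(θ) = A cos²θ + B cos θ + C, where θ is the dihedral angle between the coupled nuclei. The coefficients `A`, `B`, `C` and the angles below are placeholders chosen for illustration only; they are not the parameterization of ref. [95] nor data from the study.

```python
import math

def karplus(theta_deg, a, b, c):
    """Generic Karplus relation: J = A*cos^2(theta) + B*cos(theta) + C, in Hz."""
    t = math.radians(theta_deg)
    return a * math.cos(t) ** 2 + b * math.cos(t) + c

# Hypothetical coefficients (Hz); a real application would take them
# from a published parameterization for the specific coupling.
A, B, C = 2.0, -1.0, 0.5

# Hypothetical dihedral angles (degrees) between the coupled nuclei.
dihedrals = [60.0, 180.0, -60.0]
j_computed = [karplus(theta, A, B, C) for theta in dihedrals]

for theta, j in zip(dihedrals, j_computed):
    print(f"theta = {theta:7.1f} deg  ->  J = {j:.2f} Hz")
```

Couplings computed in this way for each candidate side-chain conformation can then be correlated with the observed values to judge the conformational search.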
The obtained results on an antiparallel β-sheet segment and the ubiquitin protein enabled us to determine the role and impact of a proper side-chain conformation for an accurate computation of the observed 13Cα chemical shifts in solution.
4 Use of the Structural Information Decoded from 13C Chemical Shifts
We have chosen three examples to illustrate how the structural information decoded from the observed 13C chemical shifts can be used in practice: (1) to determine the fraction of the tautomeric forms of the imidazole ring of histidine (His) in proteins as a function of pH, provided that the observed 13Cγ and 13Cδ2 chemical shifts and the protein structure, or the fraction of H+ form are known; (2) to determine either all α-helical or all β-sheet protein structures in solution; and (3) to assess the reliability of NMR-determined protein models before they are published or deposited in the PDB. Each of these applications is described in the following subsections.
4.1 The Importance of Being His
In 1965 Mandel [96], in a pioneering NMR experiment, detected the imidazole (C2) protons of histidine (His) residues in Ribonuclease A, and in 1966 Bradbury and Scheraga [97] were able to distinguish between the histidine residues of Ribonuclease A, i.e., they resolved the NMR peaks of three out of four histidines of this enzyme. Subsequently, NMR spectroscopy, X-ray crystallography and theoretical studies, based on QM calculations, have continuously evolved in their ability to determine properties of the histidine residues in solution and in the solid state [43, 79, 98–116]. This persistent interest arises because His is unique among all 20 naturally occurring amino acids: ~50% of all enzymes use His in their active sites [117]. This is, mainly, because of the versatility of the imidazole ring of His, which includes two neutral, chemically-distinct forms, referred to as Nδ1-Η and Nε2-Η tautomers, and a protonated form, the charged H+ form, with one form favored over the other two by the protein environment and pH. In addition, His with a pK° of 6.6 [118] is the only ionizable residue that titrates around neutral pH, allowing the non-protonated nitrogen of its imidazole ring to serve as an effective ligand for metal binding [79], or to play a crucial role in the proton-transfer process [103].
Certainly, determination of the fraction of the tautomeric forms of the imidazole ring of His in proteins in solution is an important problem for a number of reasons. At a given fixed pH, proteins in solution exist as an ensemble of conformations and, hence, the form of each His residue among different protein conformers may vary significantly because the tautomeric equilibrium is determined by the environment [43]. Moreover, because the exchange between different protonation states is assumed to occur in the fast exchange regime [79], the NMR resonances of a given nucleus, which include rotation, protonation and tautomerization, merge into a single average signal. Decoding the information from these exchange processes offers the possibility to determine the extent to which the His residues in proteins behave as free His, where the Nε2-H tautomer is favored over the Nδ1-H tautomer in a ratio of 4:1 [108].
To find a solution to this long-standing problem in the biophysical chemistry of proteins, first, each form of His was treated as a terminally-blocked model tripeptide with the sequence: Ac-GHξG-NMe, with Hξ in the Nδ1-H tautomeric form, the Nε2-H tautomeric form or the protonated form H+, respectively. For each of the forms, a set of ~35,000 conformations, representing a uniform sampling of the whole Ramachandran map as a function of the ϕ, ψ, ω, χ1 and χ2 torsional angles, was generated. Afterward, the gas-phase, isotropic shielding value was computed using the method described in Sect. 2.1. Finally, the distribution of the computed shielding of the imidazole ring of His was analyzed in terms of all 13C nuclei, namely 13Cγ, 13Cδ2, and 13Cε1 (see Fig. 4). Specifically, the histogram of the shielding distribution (among all ~35,000 conformations) was fit by a Gaussian function with a mean value σo (shown as bars in Fig. 4) and standard deviation sd (data not shown). A visual inspection of the histogram shown in Fig. 4 revealed that the mean shielding value, σo, obtained for the 13Cε1 nucleus is not sensitive to changes in the form of the imidazole ring and, therefore, we confine our interest to those nuclei that are sensitive to such changes, namely 13Cδ2 and 13Cγ.
Use of first-order shielding differences for a pair of selected nuclei, 13Cδ2 and 13Cγ, rather than chemical shifts, is a very convenient approach because the experimental referencing problem may be a source of errors [99]. Consequently, we define the first-order shielding difference, Δξ, as \( \Delta^{\xi} = \left| \sigma_{o}^{\delta 2} - \sigma_{o}^{\gamma} \right|_{\xi} \), where ξ denotes the form of the imidazole ring, and \( \sigma_{o}^{\delta 2} \) and \( \sigma_{o}^{\gamma} \) are the computed mean values of the shielding distribution for the 13Cδ2 and 13Cγ nuclei, respectively. The following convention is adopted: ξ = δ, ε, or +, to designate the Nδ1-H, Nε2-H or the H+ form, respectively.
Analysis of the first-order shielding differences indicates that the following inequality holds: Δε > Δ+ > Δδ, with Δδ ~ 0. Therefore, once the fraction of the protonated H+ form, f + = < ρ >, computed with Eq. (4), and Δobs = |13Cδ2 – 13Cγ|, with 13Cδ2 and 13Cγ being the observed chemical shifts in solution at a given pH, are known, the fraction of the Nε2-H tautomer (f ε) can be obtained by assuming: (a) that all forms are in fast exchange on the NMR chemical shift time-scale [79], i.e., \( \Delta^{obs} = f^{\varepsilon} \Delta^{\varepsilon} + f^{+} \Delta^{+} + f^{\delta} \Delta^{\delta} \); and (b) that Δδ ≡ 0.
Use of these assumptions, together with some physical constraints, enables us to find an analytical expression with which to compute f ε, namely: \( f^{\varepsilon} = \frac{\Delta^{obs} (1 - \langle \rho \rangle)}{\Delta^{\varepsilon}} \), with Δε the single-valued first-order shielding difference computed for the Nε2-H tautomer (Δε ~ 31 ppm). The fraction of the Nδ1-H tautomer, f δ, is then obtained straightforwardly as: \( f^{\delta} = 1 - \langle \rho \rangle - f^{\varepsilon} \).
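As an illustration of how the two expressions above are evaluated, the following minimal sketch computes the tautomeric fractions from an observed difference Δobs and a degree of protonation < ρ >. The function name and the input values are hypothetical; only the two formulas and the approximate value Δε ~ 31 ppm are taken from the text. The sketch does not enforce the additional physical constraints mentioned above, so out-of-range inputs can produce fractions outside [0, 1].

```python
DELTA_EPS_PPM = 31.0  # approximate value of Delta^epsilon quoted in the text

def tautomer_fractions(delta_obs_ppm, rho, delta_eps_ppm=DELTA_EPS_PPM):
    """Return (f_eps, f_delta, f_plus) for one His residue.

    delta_obs_ppm: observed |13C-delta2 - 13C-gamma| difference, in ppm
    rho:           fraction of the protonated H+ form (0 <= rho <= 1)
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie between 0 and 1")
    f_eps = delta_obs_ppm * (1.0 - rho) / delta_eps_ppm
    f_delta = 1.0 - rho - f_eps
    return f_eps, f_delta, rho

# Hypothetical input: half-protonated His with an observed difference of 15.5 ppm.
f_eps, f_delta, f_plus = tautomer_fractions(15.5, 0.5)
print(f"f_eps = {f_eps:.2f}, f_delta = {f_delta:.2f}, f_plus = {f_plus:.2f}")
```

By construction the three fractions sum to one, which is a convenient sanity check when the calculation is applied residue by residue.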
The above formulation was used to determine the tautomeric forms of His for each of 8 selected proteins for which both the structure and the 13Cδ2 and 13Cγ chemical shifts of the imidazole ring of His, are available. In each of these applications the average degree of protonation < ρ > for all ionizable residues was computed by using Eq. (4). The tautomeric forms of His are determined by using the expressions for f δ and f ε given above [43]. Likewise, using the observed values, Δobs, obtained from solid-state NMR for unblocked dipeptides, with the sequence His-Leu, His-Met, Gly-His, Leu-His, His-Ala, His-Glu, Ala-His and His-Asp [99], we also determined the tautomeric fractions of the imidazole ring of His for each of these 8 compounds.
Results obtained from the 8 proteins indicate that the protonated form is the most populated one, while the distribution of the tautomeric forms for the imidazole ring varies significantly among different histidine residues in the same protein (see Fig. 5a). Thus, His226 and His250 show a comparable degree of protonation, < ρ >, although the tautomeric distribution is very different (see Fig. 5a), showing the importance of the environment of the histidines in determining the tautomeric forms. Let us explain the origin of this observation. On one hand, the Nδ1 nucleus of H250 is located only 2.9 Å from the carbonyl backbone oxygen of S248 (see Fig. 5b), presumably forming a hydrogen bond (green dots in Fig. 5b), while the Nε2 nucleus is exposed to the solvent but the imidazole ring is surrounded by fully protonated R264 and R266 (data not shown), thus lowering the probability that a proton binds to Nε2, in good agreement with the computed tautomeric distribution for H250 in Fig. 5a. On the other hand, the Nε2 nucleus of the imidazole ring of H226 is at 3.3 Å from a backbone carbonyl oxygen of W246 (see Fig. 5c), while the Nδ1 is at 3.1 Å from a backbone amino group of H226 (see Fig. 5c). As a result, a preference of the Nε2-H over the Nδ1-H tautomeric form for H226 is expected, in agreement with the computed tautomeric fractions for this residue in Fig. 5a.
In addition, our results show that for ~70% of the neutral histidine-containing dipeptides the method leads to fairly good agreement between the calculated and the experimental tautomeric form. Co-existence of different tautomeric forms in the same crystal structure may explain the disagreement obtained for the remaining 30% of dipeptides.
4.2 Protein Structure Determination
In this section we illustrate, with two examples, how the structural information encoded in the 13Cα chemical shifts can be used to determine an ensemble of conformations, provided that a set of NOE-derived distance constraints is available. However, since the chemical shifts are sensitive to the dynamics of a protein on the microsecond time scale [88], the question of whether a single conformation, rather than an ensemble of conformations, is a better representation of the NMR observables, such as the chemical shifts, must be investigated first.
4.2.1 The Crystallographer Dilemma: A Single Structure or an Ensemble of Conformations?
In protein crystallography it is conventional to represent the conformation of a protein by a single structure, although proteins are very flexible in solution, and, hence, the question whether a single structure, rather than an ensemble of conformations, is a more accurate representation of the observed 13Cα chemical shifts in solution deserves to be investigated.
Proteins in solution are flexible molecules which exhibit anisotropic motion and exist as a dynamic ensemble of conformations. Although protein flexibility in the crystalline state is reduced (compared to solution) as a result of crystal packing, some dynamics and heterogeneity still remain [119, 120] because of the high solvent content in most protein crystals [104]. Despite this, protein structures solved by X-ray diffraction are traditionally represented by a single conformation. Crystallographic temperature (B) factors, which contain information about atomic displacements arising from the combined effects of dynamic, static and lattice disorders within the crystal lattice, provide an important indication of protein motions in the crystalline state.
Consequently, consideration of an ensemble of protein conformations generated by using B-factor values as a guide may potentially improve the agreement between the NMR- and X-ray-derived protein models in terms of some NMR observables, such as 13Cα chemical shifts. To explore this possibility we selected ubiquitin, a 76-residue α/β protein. The structure of this protein was solved by X-ray (PDB id 1UBQ [83]) and NMR (PDB id 1D3Z [54]) methods, with the latter providing the available 13Cα chemical shifts.
Since the deposited PDB structures of 1UBQ were solved and refined by using software and force-field parameters different from those employed in our method, a new set of conformations was generated using MCM and rigid geometry starting from the corresponding regularized experimental X-ray structure (1UBQregular). During the MCM search, variations of the (ϕ, ψ, χ) torsional angles were allowed for all the residues in the sequence. The reported B-factors for 1UBQ were used to estimate the upper limit of the torsional angle variation adopted (±10°). The generated set of conformations was subjected to several rounds of refinement using a standard procedure in X-ray crystallography, i.e., the Crystallography and NMR System (CNS) program [51, 52]. As a result 5 conformations were selected.
All 5 generated models are quite different among themselves and from the corresponding starting structure, with an all-atom rmsd of 0.36–1.13 Å. Moreover, for all 5 models, no residues were in disallowed regions of the Ramachandran plot [8] and all unfavorable contacts occur between the atoms from the last five residues in the sequence, which were not visible in the electron-density map. In addition, the R and Rfree factors of the 5 models are equivalent to or better than those obtained for a Simulated Annealing Refined (SAR) structure of PDB 1UBQ. This refinement of the deposited 1UBQ structure, which yields the SAR structure, is a necessary step for a consistent comparison between the chemical shifts of the generated 5 models and the PDB structure, because 13C chemical shifts are very sensitive to small differences in bond lengths and bond angles [16].
Figure 6 shows the rmsd values between the observed and computed 13Cα chemical shifts obtained for each of the 5 new models (light-grey bars) and the SAR structure (black-filled bar). The ca-rmsd, computed from the ensemble of 5 new models, is shown as a horizontal solid line in Fig. 6. The ca-rmsd (2.36 ppm) is lower than the value for the SAR structure (2.74 ppm) or for any of the new models. These results obtained for ubiquitin demonstrate that consideration of an ensemble of 5 conformations, derived from the regularized experimental X-ray (1UBQregular) structure, leads to better agreement with the observed 13Cα chemical shifts than does a single conformation (the SAR structure).
The above conclusion is in line with the suggestion of crystallographers that “…a more suitable representation of a macromolecular crystal structure would be an ensemble of models...” [121]. Analysis of NMR-determined ensembles of conformations also leads to a similar conclusion, i.e., use of the ca-rmsd value led to closer agreement with the observed 13Cα chemical shifts in solution than when individual, or the mean, rmsd is used [33]. In other words, proteins in solution are conformationally labile, as indicated by both the ca-rmsd and the theoretical minimal-rmsd model analyses, and this must be taken into account to predict the 13Cα chemical shifts most accurately.
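The difference between a per-conformer rmsd and the ca-rmsd can be made concrete with a minimal sketch. It assumes that the ca-rmsd is obtained by first averaging the computed shift of each residue over all conformers with equal weights, and then taking the rmsd of these averages against the observed shifts. The function names and the shift values below are hypothetical and chosen only to show the effect.

```python
import math

def rmsd(computed, observed):
    """Root-mean-square deviation between two equal-length lists of shifts (ppm)."""
    n = len(observed)
    return math.sqrt(sum((c - o) ** 2 for c, o in zip(computed, observed)) / n)

def ca_rmsd(ensemble, observed):
    """rmsd of the conformationally averaged shifts (equal weights assumed)."""
    n_conf = len(ensemble)
    averaged = [sum(conf[i] for conf in ensemble) / n_conf
                for i in range(len(observed))]
    return rmsd(averaged, observed)

# Hypothetical 13C-alpha shifts (ppm) for three residues.
observed = [56.0, 60.0, 52.0]
ensemble = [
    [57.0, 59.0, 53.0],  # conformer 1
    [55.0, 61.0, 51.0],  # conformer 2
]

individual = [rmsd(conf, observed) for conf in ensemble]
print("per-conformer rmsd:", individual)
print("ca-rmsd:", ca_rmsd(ensemble, observed))
```

In this contrived case the deviations of the two conformers cancel on averaging, so the ca-rmsd is lower than the rmsd of either conformer, which is the qualitative behavior reported for ubiquitin.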
4.2.2 Determination of β-Sheet Structures
Evidence obtained from the probability-based secondary structure identification method of Wang and Jardetzky [122] suggests that the reliability to distinguish an α-helix from a statistical coil based on chemical shift information follows, for the heavy nuclei only, the ranking: 13Cα > 13C′ > 13Cβ > 15N, whereas a different trend (13Cβ > 13Cα ~ 13C′ ~ 15N) was found for the corresponding reliability to distinguish a β-strand conformation from a statistical coil. This trend raises the question of whether a mainly 13Cα-driven methodology can be used to predict predominantly β-sheet structures and, if so, how good the corresponding 13Cβ chemical shift predictions would be.
To answer this question, our recently-introduced physics-based protocol (see Fig. 1) was applied to determine the structure of a 20-residue peptide capable of forming a three-stranded antiparallel β-sheet in aqueous solution, i.e., the BS2 peptide with the sequence: TWIQNDPGTKWYQNDPGTKIYT, for which both a complete set of 13Cα chemical shifts and a reduced number of NOEs were reported. The experimental structure determination of small proteins and peptides, which are able to fold as monomers and do not contain disulfide bonds, is very valuable because such determinations can provide important information for force-field development and evaluation or improvement of search algorithms aimed at an efficient exploration of the conformational space [123–126].
The results obtained indicate that an accurate all β-sheet structure can be determined by simply identifying a set of conformations which simultaneously satisfy a set of constraints including 13Cα-dynamically-derived torsional angle constraints for all amino acid residues in the sequence and a fixed set of NOE-derived distance constraints [29]. Among the thousands of conformations generated by the VTF approach, i.e., during step 1 of the protein structure determination protocol shown in Fig. 1, 25 of them (see Fig. 7a) were selected by using a clustering procedure. This small set of conformations was used to determine the theoretical minimal-rmsd model that provides us with a set of ϕ, ψ, and χ torsional angle constraints for all the residues in the sequence, not just for those in α-helix or β-sheet regions. Using this set of torsional angle constraints (ϕ, ψ, χ), combined with different numbers of NOE-derived constraints, 2 sets of conformations of the BS2 peptide were determined after step 2 of the protocol. One set of 20 conformations (shown in Fig. 7b) was obtained by using 118 NOE-derived distance constraints, while the other set of 10 conformations (shown in Fig. 7c) was obtained by using 130 NOE-derived distance constraints. Regardless of the number of NOE-derived distance constraints used, addition of the 13Cα-derived torsional constraints led to noticeably lower ca-rmsd values (2.2 and 3.5 ppm, for the sets of 20 and 10 conformations, respectively) compared to the 20 models obtained by Santiveri et al. [127], who used a full set of 130 NOE-derived distance constraints but no 13Cα chemical shift information (4.6 ppm). In line with this finding, graphical inspection of the results shown in Fig. 7b–c also indicated that use of 13Cα-derived torsional constraints led to sets of conformations with less side-chain torsional angle spreading, as can be seen from comparison of Fig. 7b and c against 7d, with the latter obtained by Santiveri et al. [127].
In addition, the correlation coefficient, R, between the observed and computed 13Cβ chemical shifts was somewhat better for the two sets obtained using the 13Cα-based determination protocol (shown in Fig. 1). Thus, R is 0.99 and 0.98 for the 20 and 10 conformation sets, respectively, while R is 0.97 for the set of conformations derived by Santiveri et al. [127].
Overall, analysis of the ca-rmsd, the NOE-derived distance violations, the 13Cβ chemical shifts, and some stereochemical quality factors for these sets, as a measure of the closeness with which the calculations reproduce the structure in solution, indicates that our self-consistent physics-based method is able to produce a more accurate set of conformations (shown in Fig. 7b and c) than that obtained with the traditional methods [127] (shown in Fig. 7d). Our results also suggest that for a flexible molecule in solution, like BS2, it may not be possible to determine a single structure that would satisfy all the constraints simultaneously. This is a consequence of the well-known fact that NMR parameters, such as the observed NOE-derived distances and the 13Cα chemical shifts, correspond to a dynamic ensemble of conformations and, therefore, may not be reproduced exactly by a limited set of static structures [44, 128].
Characterization of the structural flexibility of molecules in solution is of fundamental importance for the study of biological function, stability and folding [129, 130]. Therefore, additional analysis of the per-residue average 13Cα conformational shifts was carried out and the results indicated that the third, C-terminal, strand in the β-sheet of the BS2 peptide is the most flexible strand, although less flexible than the turns. In addition, 20 ns molecular dynamics (MD) simulations using the AMBER 8.0 package [131] were performed. The MD runs yielded a plausible atomic description of the motion of the BS2 peptide in solution, as revealed by both the pattern of hydrogen bonds and the generalized Lindemann parameter [132]. The MD results were in line with the per-residue average 13Cα conformational shifts analysis, providing additional evidence of greater flexibility of the C-terminal strand.
The fact that the observed 13Cα chemical shifts, supplemented only by NOE-derived distance constraints, provide accurate information for validation and refinement of protein structures, as well as site-specific information about the flexibility of a molecule in solution, may be very useful for NMR spectroscopists and theoreticians interested in analysis of the stability and protein-folding mechanism.
4.2.3 A Blind Test to Determine an α-Helical Structure
The solution NMR structures of both full length (residues 1–77) and truncated (residues 1–46) forms of YnzC protein (PDB id 2JVD) from Bacillus subtilis [133], that is part of the small yneA SOS response operon that regulates cell division in this organism [134], have been determined recently [135]. The corresponding X-ray crystal structure (PDB ID, 3BHP) was solved by Kuzin et al. [133] at 2.0 Å resolution. The unique two-helix monomeric structure of YnzC, with no disulfide bonds, makes it an attractive subject for testing our physics-based methodology for protein structure determination.
The goal of this application is two-fold. First, as a blind test, we attempted to determine whether it is possible to obtain an ensemble of conformations for which each individual conformer simultaneously satisfies the NOE-derived distance constraints and the 13Cα-derived torsional constraints for the YnzC protein in solution [136]. Although the solution NMR structure [135] of this protein had been solved at the time of this blind test, the only information provided was a full set of both the observed 13Cα chemical shifts and the NOE-derived distance constraints. In particular, no information about the coordinates of the solved structures of the YnzC protein [135] or the heteronuclear 15N-1H NOE data was provided at the moment of the test.
Our second goal was to carry out a cross-validation test of high-quality sets of conformations obtained for the YnzC protein in solution by using alternative determination methods, namely, the solution NMR set of conformations (PDB id 2JVD) obtained by using NOE-derived distance constraints, dihedral-angle constraints and hydrogen-bond constraints [135], and the 2.0-Å X-ray crystal structure (PDB id 3BHP; Kuzin et al. [133]). For this second goal, several validation scores were used [136], including: (i) Recall, Precision, F-measure (RPF) analysis [6]; (ii) several global quality score indicators provided by Verify3D [10], ProsaII [137], Procheck [8], and MolProbity [5]; (iii) the ca-rmsd and rmsd between observed 13Cα chemical shifts and those computed at the DFT level; and (iv) the backbone rmsd between these refined structures and the mathematical average coordinates of the ensemble of NMR structures of YnzC(1–48) deposited in the PDB.
By carrying out a blind test we demonstrated [136] that an accurate all α-helical set of protein structures can be determined by simply identifying conformations which simultaneously satisfy a set of constraints, including 13Cα-dynamically-derived torsional angle constraints for all amino acid residues in the sequence and a fixed set of 1022 NOE-derived distance constraints. The protein structure determination was carried out as follows: after generation of thousands of conformations using the VTF procedure (step 1), 10 of them, shown in Fig. 8b, were selected, i.e., those possessing a maximum NOE-derived distance violation lower than some fixed cutoff value; of these 10 conformations produced in step 1, only one was selected. The selected conformation was used as the starting one in a conformational search carried out with two types of constraints: the original fixed limited NOE-derived distance constraints and the set of ϕ, ψ, χ torsional angles derived from step 1. The resulting new set of 10 conformations is shown in Fig. 8c. Repetition of step 2, with a tighter tolerance range for the torsional angle constraints than in the previous iteration, enabled us to determine the final set of 10 conformations shown in Fig. 8d, i.e., the so-called Set-NOE-CS.
A comparative analysis of the rmsd between the computed and observed 13Cα chemical shift values for residues 1–46 is shown in Fig. 8a as a bar diagram for all three sets of conformations, viz., the Set-NOE-CS (shown in Fig. 8d), 2JVD (shown in Fig. 8e) and the three chains of the X-ray crystallography structure 3BHP (shown in Fig. 8f). The results shown in Fig. 8a reveal that the two NMR-derived ensembles of structures (2JVD and Set-NOE-CS) are a better representation of the observed 13Cα chemical shifts in solution in terms of the ca-rmsd (solid horizontal black and red lines in Fig. 8a) than any single conformer (red or yellow bars in Fig. 8a), or any single chain of the X-ray structure (black, cyan and green bars in Fig. 8a). This result is in line with previous calculations for 10 NMR-derived conformations (PDB id 1D3Z) and the X-ray structure (PDB id 1UBQ) of ubiquitin.
Since the ca-rmsd analysis might be biased by the fact that the 10 conformations of Set-NOE-CS were computed using a 13Cα-based method while the others were not, a cross-validation quality test was also carried out. These structures consistently show good values for the RPF and DP-scores as well as for global structure quality indicators. This analysis reveals that all three sets of structures analyzed here display very good agreement with the experimental NOE data, as well as dihedral angle distributions and atomic clash scores typical of good quality protein structures. Taken together, these results indicate that the 20 conformations from the 2JVD set, the DFT-computed 10 conformations from Set-NOE-CS, and each of the three chains of the X-ray structure are highly accurate sets of conformations which represent the YnzC protein in solution.
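The selection criterion used in step 1, a maximum NOE-derived distance violation below a fixed cutoff, can be sketched as follows. This is a simplified illustration that assumes each NOE constraint is an upper bound on an interproton distance; the data layout, atom labels, distances and cutoff are all hypothetical and are not taken from the actual protocol.

```python
def max_noe_violation(distances, upper_bounds):
    """Largest amount (in Angstrom) by which any distance exceeds its upper bound."""
    return max(max(0.0, distances[pair] - bound)
               for pair, bound in upper_bounds.items())

def select_conformations(conformations, upper_bounds, cutoff):
    """Names of conformations whose maximum violation is below the cutoff."""
    return [name for name, distances in conformations.items()
            if max_noe_violation(distances, upper_bounds) < cutoff]

# Hypothetical NOE upper bounds (Angstrom) for two proton pairs.
upper_bounds = {("HA_5", "HN_6"): 5.0, ("HA_7", "HB_30"): 3.5}

# Hypothetical interproton distances measured in two generated conformations.
conformations = {
    "conf_a": {("HA_5", "HN_6"): 5.2, ("HA_7", "HB_30"): 3.0},
    "conf_b": {("HA_5", "HN_6"): 6.0, ("HA_7", "HB_30"): 3.4},
}

print(select_conformations(conformations, upper_bounds, cutoff=0.5))
```

With these inputs only `conf_a` passes, because its largest violation (about 0.2 Å) is below the cutoff while that of `conf_b` (1.0 Å) is not.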
4.3 Protein Structure Validation
The PDB is the most important archive of experimental protein structures solved by X-ray crystallography and NMR spectroscopy. The large number of structures deposited in the PDB constitutes an extraordinary source of information that has been, and continues to be, used for a wide range of applications in structural drug design, molecular modeling, force-field parameterization, molecular biology, etc. Some deposited protein structures, showing a few, or a large number of, flaws, are formally withdrawn from the database and, hence, considered obsolete, even though their coordinates remain available in the PDB. In most cases, a successor (or superseding) structure replaces the old obsolete one. The large number of obsolete structures indicates that development of accurate validation protocols remains an important task.
4.3.1 A Chemical-Shift-Based Server
An ideal validation method should meet two requirements. First, it should be strong rather than weak. A validation method is considered ‘strong’ if it is able to assess how well a structure, or an ensemble of structures, predicts experimental data not used in the structure-determination process; otherwise it should be considered ‘weak’, since it is limited to reproducing the observed experimental data used in the determination of the protein models [138]. Second, it should be able to detect fast and accurately, at residue level, the existence of structural flaws. With these goals in mind a new server (CheShift) has been developed recently to predict 13Cα chemical shifts of protein structures. It is based on a database of chemical shifts computed for 696,916 conformations as a function of the ϕ, ψ, ω, χ1 and χ2 torsional angles for all 20 naturally occurring amino acids. The 13Cα chemical shifts were computed at the DFT level of theory using the methodology described in Sect. 2.1. Because of the large number of conformations, the computed shielding values were obtained using a small basis set (6-31G/3-21G) and later extrapolated to a large basis set [6-311 + G(2d,p)/3-21G], as described in Methods section.
An analysis of the accuracy and sensitivity of the CheShift predictions, in terms of the correlation coefficient R between the observed and predicted 13Cα chemical shifts, was carried out on 36 X-ray-derived protein structures solved at 2.3 Å, or better, resolution. The results indicate that, for all the proteins, the R values obtained using the CheShift, SHIFTX [24], SPARTA [25], SHIFTS [38, 39], and PROSHIFT [23] servers were comparable, although the CheShift values were systematically the lowest. This raises the following question: do these servers provide a more sensitive validation than CheShift? To answer this question we chose protein 1RGE, solved at 1.15 Å resolution [139]. The crystal structure of this protein contains two chemically identical but crystallographically independent molecules in the asymmetric unit, named here A and B [139]. The main structural difference between molecules A and B (with an all-heavy-atom rmsd of 1.1 Å) is due to differences in side-chain conformations, especially those occupying different rotameric states. For this test, which does not require a comparison with the observed 13Cα chemical shifts, we computed the correlation coefficient R between the 13Cα chemical-shift predictions obtained for molecules A and B, respectively, by using the five servers listed above. The results of this test give the following R values: 0.96, 1.00, 1.00, 0.98, and 1.00 for CheShift, SHIFTX, SPARTA, SHIFTS, and PROSHIFT, respectively. Except for CheShift (0.96) and SHIFTS (0.98), none of the servers is able to discriminate, beyond doubt, between molecules A and B. From a statistical point of view, the R values obtained from the SHIFTX (1.00), SPARTA (1.00) and PROSHIFT (1.00) servers indicate that molecules A and B are practically indistinguishable protein models.
Therefore, a lower R value between the predicted and observed 13Cα chemical shifts does not necessarily mean poorer accuracy; it could instead mean higher sensitivity to subtle structural differences. This conclusion can be confirmed by a similar analysis carried out at a higher level of accuracy, for example, by using a larger basis set and the actual geometry of chains A and B, i.e., without the need for any torsional-angle interpolation as with the CheShift server. In this case, the R value (0.93) computed with the larger basis set was significantly lower than the R value obtained with CheShift (0.96), or with any other server, namely, 1.00, 1.00, 0.98, and 1.00 for SHIFTX, SPARTA, SHIFTS, and PROSHIFT, respectively.
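The A-versus-B test described above reduces to computing a correlation coefficient between two lists of predicted shifts for the same residues. The following is a minimal sketch, assuming that R is the ordinary Pearson correlation coefficient; the function name `pearson_r` and the shift values are hypothetical and serve only as an illustration, they are not data for 1RGE.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical predicted 13C-alpha shifts (ppm) for the same five residues
# in two models of one protein (illustrative numbers only).
shifts_a = [58.1, 54.3, 62.0, 56.7, 60.2]
shifts_b = [57.6, 54.9, 61.1, 57.5, 59.8]
r_ab = pearson_r(shifts_a, shifts_b)
print(f"R(A, B) = {r_ab:.2f}")
```

A predictor that is insensitive to side-chain conformation would return nearly identical lists for the two models and hence an R close to 1.00, whereas a more sensitive predictor would give a lower value.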
So far, we have shown that the QM basis of the CheShift server enables us to predict the 13Cα chemical shifts with reasonable accuracy in seconds. Our results suggest that CheShift can provide a standard with which to evaluate the quality of protein structures solved by either X-ray crystallography or NMR spectroscopy, provided that the experimentally observed 13Cα chemical shifts are available.
4.3.2 CheShift-2: A Picture Is Worth a Thousand Words
Differences between the observed and CheShift-predicted 13Cα chemical shifts can be used as a sensitive probe with which to detect possible local flaws in NMR-determined protein structures; hence, a graphical user interface has been added to the CheShift-2 server [49] to render such flaws easily visible. CheShift was originally developed to return a list of predicted 13Cα chemical-shift values, one for each amino acid in the sequence of a protein, except for the first and last residues [28, 33]. The validation process, i.e., the comparison between the predicted and the observed 13Cα chemical-shift values, was left to the user of the server, who could use the provided information to determine the quality of the NMR structure as a whole, e.g., by computing the ca-rmsd [33]. However, a highly desirable goal of any accurate validation method [11, 34] is to identify local flaws along the sequence rather than only the global quality. Therefore, we added a graphical user interface (GUI) to the CheShift server. As a result, the validation process is facilitated by displaying the differences between the observed and computed 13Cα chemical shifts with a three-color code mapped onto a 3D protein model. This graphic validation method, far from being only an aesthetic improvement, enables users of CheShift-2 to detect local flaws in proteins on a per-residue basis quickly and accurately, without the need to carry out the extensive DFT calculations on which the server is based.
The CheShift-2 server [49] makes use of the following sequential steps: (i) for each amino acid residue i, the average difference between the observed and predicted 13Cα chemical shifts, Δi, is computed by using Eq. (2); (ii) the Δi value is smoothed by averaging it over the values of the two nearest-neighbor residues, giving <Δi>; (iii) the resulting nearest-neighbor-averaged value, <Δi>, is discretized, i.e., it is assigned an integer value of 1, 0 or −1, depending on its magnitude; and (iv) these discrete values are mapped onto the 3D protein model and color coded as blue, white and red, respectively. This color-code assignment is based on the assumption that <Δi> values within ~1.7 ppm (blue) are considered small; those between ~1.7 and ~3.4 ppm (white), medium; and those beyond ~3.4 ppm (red), large. Differences corresponding to the blue and white colors are considered acceptable, while the red color indicates possible flaws in the structure. In addition, yellow was adopted to indicate the absence of observed or computed 13Cα chemical shifts [49].
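The four steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the CheShift-2 implementation: it assumes that Δi is taken as an absolute difference, that the smoothing window covers residues i−1, i and i+1 (using only those neighbors for which data exist), and that the thresholds are exactly 1.7 and 3.4 ppm. All function names and the example shifts are hypothetical.

```python
SMALL_PPM = 1.7   # assumed upper bound for "small" differences (blue)
MEDIUM_PPM = 3.4  # assumed upper bound for "medium" differences (white)

def residue_deltas(observed, predicted):
    """Step (i): |observed - predicted| per residue; None if either is missing."""
    return [None if o is None or p is None else abs(o - p)
            for o, p in zip(observed, predicted)]

def smooth(deltas):
    """Step (ii): average each value with its available nearest neighbors."""
    smoothed = []
    for i, d in enumerate(deltas):
        if d is None:
            smoothed.append(None)
            continue
        window = [v for v in deltas[max(i - 1, 0):i + 2] if v is not None]
        smoothed.append(sum(window) / len(window))
    return smoothed

def color_code(smoothed):
    """Steps (iii) and (iv): discretize and translate into a color name."""
    colors = []
    for d in smoothed:
        if d is None:
            colors.append("yellow")  # no observed or computed shift
        elif d <= SMALL_PPM:
            colors.append("blue")    # discrete value 1
        elif d <= MEDIUM_PPM:
            colors.append("white")   # discrete value 0
        else:
            colors.append("red")     # discrete value -1, possible flaw
    return colors

# Hypothetical shifts (ppm) for a five-residue fragment.
observed = [56.0, 58.2, None, 61.0, 55.0]
predicted = [55.5, 57.0, 60.0, 53.0, 54.0]
colors = color_code(smooth(residue_deltas(observed, predicted)))
print(colors)
```

In this example the last residue has a small individual difference (1.0 ppm) but is still flagged, because the smoothing step averages it with the large difference of its neighbor; this is the intended effect of step (ii), which highlights flawed regions rather than isolated residues.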
When more than one protein model exists, the averaged Δi values are computed considering all the deposited conformations, although the colored representation is displayed by using only the first model. This situation is illustrated in Fig. 9 for the 20 NMR-determined conformations (see Fig. 9a) of a membrane-associated protein from Bacillus cereus, PDB id 2K5Q. The large dispersion of conformations in the loops and at the N- and C-termini shown in Fig. 9a, rather than being a poor representation of the protein, reflects the flexibility of these segments of the molecule in solution, as is clearly shown by the CheShift-2 validation of 2K5Q (see Fig. 9b).
4.3.3 Global Versus Local Validation of Proteins
The NMR-determined ensembles of the dynein light chain 2A protein, PDB ids 1TGQ and 2B95, show different folds: 1TGQ (now obsolete) has a wrong fold, while 2B95 (which replaced the obsolete 1TGQ in the PDB) has the correct fold. This difference is a result of the oligomeric state assumed during the protein-structure determination, namely a monomer for 1TGQ and a homodimer for 2B95, as pointed out by Nabuurs et al. [11].
Validation of both protein ensembles, as a whole, shows that 2B95 is a slightly better representation of the observed 13Cα chemical shifts, in terms of the ca-rmsd [34], than 1TGQ, viz., ca-rmsd = 2.08 and 2.35 ppm for 2B95 and 1TGQ, respectively. However, the ca-rmsd difference between these two ensembles (~0.30 ppm) is not large enough to assure, unambiguously, that the 1TGQ ensemble needs further refinement. In fact, a similar difference in terms of rmsd, i.e., within a range of ~0.30 ppm, was found among 5 new models of the protein ubiquitin (see grey bars in Fig. 6), all of which fit the X-ray diffraction data with R and Rfree factors similar to those for the deposited X-ray structure, PDB id 1UBQ, solved at 1.8 Å resolution [41]. Certainly, these 5 new models can be considered to be of comparable structural quality. Consequently, variations in ca-rmsd of ~0.30 ppm cannot be used as a universal criterion to determine unequivocally whether a protein, such as 1TGQ, needs further refinement.
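To make the global score concrete, the sketch below computes a conformationally averaged rmsd for an ensemble. It assumes that the ca-rmsd is the root-mean-square deviation between the observed shifts and the predicted shifts averaged over all models of the ensemble; the function name `ca_rmsd` and the numbers are hypothetical, and the precise definition is the one given in [34].

```python
import math

def ca_rmsd(observed, ensemble_predicted):
    """rmsd (ppm) between observed shifts and ensemble-averaged predictions.

    observed: one observed 13C-alpha shift per residue.
    ensemble_predicted: one list of predicted shifts per model, each with
    the same residue order and length as `observed`.
    """
    n_models = len(ensemble_predicted)
    n_residues = len(observed)
    if n_models == 0 or n_residues == 0:
        raise ValueError("need at least one model and one residue")
    total = 0.0
    for i in range(n_residues):
        mean_pred = sum(model[i] for model in ensemble_predicted) / n_models
        total += (observed[i] - mean_pred) ** 2
    return math.sqrt(total / n_residues)

# Hypothetical two-residue, two-model example (ppm).
observed = [56.0, 60.0]
ensemble = [[55.0, 61.0], [57.0, 63.0]]
score = ca_rmsd(observed, ensemble)
print(f"ca-rmsd = {score:.2f} ppm")
```

Because a single number of this kind averages over all residues, two ensembles with very different local errors can receive similar scores, which is why a per-residue analysis is needed to separate them.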
Analysis of the dynein light chain 2A protein illustrates that validation of a protein as a whole (global validation), e.g., with the ca-rmsd, may not enable us to determine unambiguously whether one protein model is of better quality than another model of the same protein, while validation on a per-residue basis (local validation), e.g., with the CheShift-2 server, does (see Fig. 10). To further test the ability of the CheShift-2 server to detect small differences between protein models, a small set of 15 obsolete/successor pairs of proteins was also considered (see Supplementary Data of [49]). The results indicate that the CheShift-2 server constitutes a fast and accurate validation tool with which to determine, on a per-residue basis, the existence of local flaws in protein models, even for conformations that differ in small details, as for the obsolete and successor models of Membrane-bound Lytic Murein Transglycosylase D (fragment LysM Domain) (see Fig. 11).
In general, pairs of obsolete and successor proteins present in the PDB can be used as a benchmark set with which to test validation methods. These obsolete/successor pairs of proteins are very appealing because their members possess different topologies and numbers of residues, and complete sets of 13Cα chemical shifts are available for a large number of them from the Biological Magnetic Resonance Data Bank (BMRB) [117].
5 Conclusions and Future Directions
In this chapter we have illustrated how the information encoded in the 13C chemical shifts can be used for a variety of applications, ranging from protein structure prediction to accurate detection of structural flaws, at the residue level, in NMR-determined protein models.
The ability to detect and accurately characterize the mobility of surface side chains by computing 13Cα chemical shifts constitutes one of the strengths of the current methodology. Hence, we are planning to focus our research on the development of new physics-based algorithms for fast and accurate determination and validation of side-chain conformations, with the goal of improving the quality of NMR-determined protein models. Since NMR spectroscopy provides chemical shifts for several other nuclei besides 13Cα, the feasibility of their DFT computation and the benefits of including the information encoded in these data in structure-determination protocols are currently under investigation in our group. In general, new developments in the field of NMR spectroscopy are needed in order to develop protocols for high-throughput NMR determination of high-quality protein structures in solution.
References
Bhattacharya, A., Tejero, R., Montelione, G.T.: Evaluating protein structures determined by structural genomics consortia. Proteins 66, 778–795 (2007)
Billeter, M., Wagner, G., Wüthrich, K.: Solution NMR structure determination of proteins revisited. J. Biomol. NMR 42, 155–158 (2008)
Williamson, M.P., Craven, C.J.: Automated protein structure calculation from NMR data. J. Biomol. NMR 43, 131–143 (2009)
Williamson, M.P., Kikuchi, J., Asakura, T.: Application of 1H-NMR chemical-shifts to measure the quality of protein structures. J. Mol. Biol. 247, 541–546 (1995)
Davis, I.W., Leaver-Fay, A., Chen, V.B., Block, J.N., Kapral, G.J., Wang, X., Murray, L.W., Arendall III, W.B., Snoeyink, J., Richardson, J.S., Richardson, D.C.: MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 35, W375–W383 (2007)
Huang, Y.J., Powers, R., Montelione, G.T.: Protein NMR Recall, Precision, and F-measure scores (RPF scores): Structure quality assessment measures based on information retrieval statistics. J. Am. Chem. Soc. 127, 1665–1674 (2005)
Huang, Y.J., Tejero, R., Powers, R., Montelione, G.T.: A topology-constrained distance network algorithm for protein structure determination from NOESY data. Proteins 62, 587–603 (2006)
Laskowski, R.A., MacArthur, M.W., Moss, D.S., Thornton, J.: PROCHECK—a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283–291 (1993)
Lovell, S.C., Davis, I.W., Arendall III, W.B., de Bakker, P.I.W., Word, J.M., Prisant, M.G., Richardson, J.S., Richardson, D.C.: Structure validation by Cα geometry: ϕ, ψ, and Cβ deviation. Proteins 50, 437–450 (2003)
Lüthy, R., Bowie, J.U., Eisenberg, D.: Assessment of protein models with three-dimensional profiles. Nature 356, 83–85 (1992)
Nabuurs, S.B., Spronk, C.A.E.M., Vuister, G.W., Vriend, G.: Traditional biomolecular structure determination by NMR spectroscopy allows for major errors. PLoS Comput. Biol. 2, 71–79 (2006)
Vriend, G.: WHAT IF: a molecular modeling and drug design program. J. Mol. Graph. 8, 52–56 (1990)
Berjanskii, M., Wishart, D.S.: A simple method to predict protein flexibility using secondary chemical shifts. J. Am. Chem. Soc. 127, 14970–14971 (2005)
Berjanskii, M., Wishart, D.S.: The RCI server: rapid and accurate calculation of protein flexibility using chemical shifts. Nucleic Acids Res. 35, W531–W537 (2007)
Cornilescu, G., Delaglio, F., Bax, A.: Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR 13, 289–302 (1999)
de Dios, A.C., Pearson, J.G., Oldfield, E.: Chemical shifts in proteins: An ab initio study of carbon-13 nuclear magnetic resonance chemical shielding in glycine, alanine, and valine residues. J. Am. Chem. Soc. 115, 9768–9773 (1993)
de Dios, A.C., Pearson, J.G., Oldfield, E.: Secondary and tertiary structural effects on protein NMR chemical shifts: An ab initio approach. Science 260, 1491–1496 (1993)
Frank, A., Möller, H.M., Exner, T.E.: Toward the quantum chemical calculation of NMR chemical shifts of proteins. 2. Level of theory, basis set, and solvent model dependence. J. Chem. Theory Comput. 8, 1480–1492 (2012)
Havlin, R.H., Le, H., Laws, D.D., de Dios, A.C., Oldfield, E.: An ab initio quantum chemical investigation of carbon–13 NMR shielding tensors in glycine, alanine, valine, isoleucine, serine, and threonine: Comparisons between helical and sheet tensors, and effects of χ1 on shielding. J. Am. Chem. Soc. 119, 11951–11958 (1997)
Iwadate, M., Asakura, T., Williamson, M.P.: Cα and Cβ carbon-13 chemical shifts in proteins from an empirical database. J. Biomol. NMR 13, 199–211 (1999)
Kuszewski, J., Qin, J., Gronenborn, A.M., Clore, M.: The impact of direct refinement against 13Cα and 13Cβ chemical shifts on protein structure determination by NMR. J. Magn. Reson. Ser. B 106, 92–96 (1995)
Luginbühl, P., Szyperski, T., Wüthrich, K.: Statistical basis for the use of 13Cα chemical shift in protein structure determination. J. Magn. Reson. 109, 229–233 (1995)
Meiler, J.: PROSHIFT: protein chemical shift prediction using artificial neural networks. J. Biomol. NMR 26, 25–37 (2003)
Neal, S., Nip, A.M., Zhang, H., Wishart, D.S.: Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts. J. Biomol. NMR 26, 215–240 (2003)
Shen, Y., Bax, A.: Protein backbone chemical shifts predicted from searching a database for torsional angle and sequence homology. J. Biomol. NMR 38, 289–302 (2007)
Shen, Y., Lange, O., Delaglio, F., Rossi, P., Aramini, J.M., Liu, G., Eletsky, A., Wu, Y., Singarapu, K.K., Lemak, A., et al.: Consistent blind protein structure generation from NMR chemical shift data. Proc. Natl. Acad. Sci. U. S. A. 105, 4685–4690 (2008)
Spera, S., Bax, A.: Empirical correlation between protein backbone conformation and Cα and Cβ 13C nuclear magnetic resonance chemical shifts. J. Am. Chem. Soc. 113, 5490–5492 (1991)
Vila, J.A., Arnautova, Y.A., Martin, O.A., Scheraga, H.A.: Quantum-mechanics-derived 13Cα chemical shift server (CheShift) for protein structure validation. Proc. Natl. Acad. Sci. U. S. A. 106, 16972–16977 (2009)
Vila, J.A., Arnautova, Y.A., Scheraga, H.A.: Use of 13Cα chemical shifts for accurate determination of β-sheet structures in solution. Proc. Natl. Acad. Sci. U. S. A. 105, 1891–1896 (2008)
Vila, J.A., Aramini, J.M., Rossi, P., Kuzin, A., Su, M., Seetharaman, J., Xiao, R., Tong, L., Montelione, G.T., Scheraga, H.A.: Quantum chemical 13Cα chemical shift calculations for protein NMR structure determination, refinement, and validation. Proc. Natl. Acad. Sci. U. S. A. 105, 14389–14394 (2008)
Vila, J.A., Baldoni, H.A., Ripoll, D.R., Ghosh, A., Scheraga, H.A.: Polyproline II helix conformation in a proline-rich environment: a theoretical study. Biophys. J. 86, 731–742 (2004)
Vila, J.A., Baldoni, H.A., Ripoll, D.R., Scheraga, H.A.: Unblocked statistical-coil tetrapeptides in aqueous solution: quantum-chemical computation of the carbon-13 NMR chemical shifts. J. Biomol. NMR 26, 113–130 (2003)
Vila, J.A., Villegas, M.E., Baldoni, H.A., Scheraga, H.A.: Predicting 13Cα chemical shifts for validation of protein structures. J. Biomol. NMR 38, 221–235 (2007)
Vila, J.A., Scheraga, H.A.: Assessing the accuracy of protein structures by quantum mechanical computations of 13Cα chemical shifts. Acc. Chem. Res. 42, 1545–1553 (2009)
Villegas, M.E., Vila, J.A., Scheraga, H.A.: Effects of side-chain orientation on the 13C chemical shifts of antiparallel β-sheet model peptides. J. Biomol. NMR 37, 137–146 (2007)
Wishart, D., Bigam, C.G., Yao, J., Abildgaard, F., Dyson, H., Oldfield, E., Markley, J., Sykes, B.: 1H, 13C and 15N chemical shift referencing in biomolecular NMR. J. Biomol. NMR 6, 135–140 (1995)
Wishart, D., Bigam, C.G., Holm, A., Hodges, R.S., Sykes, B.D.: 1H, 13C and 15N random coil NMR chemical shifts of the common amino acids. I. Investigation of nearest-neighbor effects. J. Biomol. NMR 5, 67–81 (1995)
Xu, X.-P., Case, D.A.: Probing multiple effects on 15N, 13Cα, 13Cβ and 13C′ chemical shifts in peptides using density functional theory. Biopolymers 65, 408–423 (2002)
Xu, X.-P., Case, D.A.: Automated prediction of 15N, 13Cα, 13Cβ and 13C′ chemical shifts in proteins using a density functional database. J. Biomol. NMR 21, 321–333 (2001)
Parr, R.G., Yang, W.: Density functional theory of atoms and molecules. Oxford University Press, New York (1989)
Arnautova, Y.A., Vila, J.A., Martin, O.A., Scheraga, H.A.: What can we learn by computing 13Cα chemical shifts for X-ray protein models? Acta Crystallogr. D 65, 697–703 (2009)
Martin, O.A., Villegas, M.E., Vila, J.A., Scheraga, H.A.: Analysis of 13Cα and 13Cβ chemical shifts of cysteine and cystine residues in proteins: A quantum chemical approach. J. Biomol. NMR 46, 217–225 (2010)
Vila, J.A., Arnautova, Y.A., Vorobjev, Y., Scheraga, H.A.: Assessing the fractions of tautomeric forms of the imidazole ring of histidine in proteins as a function of pH. Proc. Natl. Acad. Sci. U. S. A. 108, 5602–5607 (2011)
Vila, J.A., Ripoll, D.R., Scheraga, H.A.: Use of 13Cα chemical shifts in protein structure determination. J. Phys. Chem. B 111, 6577–6585 (2007)
Vila, J.A., Scheraga, H.A.: Factors affecting the use of 13Cα chemical shifts to determine, refine, and validate protein structures. Proteins: Struct. Funct. Bioinformatics 71, 641–654 (2008)
Wüthrich, K.: NMR of Proteins and Nucleic Acids. Wiley, New York, NY, U. S. A. (1986)
Sun, H., Sanders, L.K., Oldfield, E.: Carbon-13 NMR shielding in the twenty common amino acids: comparisons with experimental results in proteins. J. Am. Chem. Soc. 124, 5486–5495 (2002)
Vila, J.A., Serrano, P., Wüthrich, K., Scheraga, H.A.: Sequential nearest-neighbor effects on computed 13Cα chemical shifts. J. Biomol. NMR 48, 23–30 (2010)
Martin, O.A., Vila, J.A., Scheraga, H.A.: CheShift-2: graphic validation of protein structures. Bioinformatics 28, 1538–1539 (2012)
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E.: The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000)
Brünger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N.S., Read, R.J., Rice, L.M., Simonson, T., Warren, G.L.: Crystallography and NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr. D 54, 905–921 (1998)
Brünger, A.T.: Version 1.2 of the Crystallography and NMR system. Nat. Protoc. 2, 2728–2733 (2007)
Cavalli, A., Salvatella, X., Dobson, C.M., Vendruscolo, M.: Protein structure determination from NMR chemical shifts. Proc. Natl. Acad. Sci. U. S. A. 104, 9615–9620 (2007)
Cornilescu, G., Marquardt, J.L., Ottiger, M., Bax, A.: Validation of protein structure from anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase. J. Am. Chem. Soc. 120, 6836–6837 (1998)
Frank, A., Onila, I., Möller, H.M., Exner, T.E.: Toward the quantum chemical calculation of nuclear magnetic resonance chemical shifts of proteins. Proteins 79, 2189–2202 (2011)
Guerry, P., Herrmann, T.: Advances in automated NMR protein structure determination. Q. Rev. Biophys. 44, 257–309 (2011)
Güntert, P.: Structure calculation of biological macromolecules from NMR data. Q. Rev. Biophys. 31, 145–237 (1998)
Güntert, P.: Automated structure determination from NMR spectra. Eur. Biophys. J. 38, 129–143 (2009)
Güntert, P., Braun, W., Wüthrich, K.: Efficient computation of three-dimensional protein structures in solution from nuclear magnetic resonance data using the program DIANA and the supporting programs CALIBA, HABAS and GLOMSA. J. Mol. Biol. 217, 517–530 (1991)
Rosato, A., Aramini, J.M., Arrowsmith, C., Bagaria, A., Baker, D., Cavalli, A., Doreleijers, J.F., Eletsky, A., Giachetti, A., Guerry, P., et al.: Blind testing of routine, fully automated determination of protein structures from NMR data. Structure 20, 227–236 (2012)
Rosato, A., Bagaria, A., Baker, D., Bardiaux, B., Cavalli, A., Doreleijers, J.F., Giachetti, A., Guerry, P., Güntert, P., Herrmann, T., et al.: CASD-NMR: critical assessment of automated structure determination by NMR. Nat. Methods 6, 625–626 (2009)
Némethy, G., Gibson, K.D., Palmer, K.A., Yoon, C.N., Paterlini, G., Zagari, A., Rumsey, S., Scheraga, H.A.: Energy parameters in polypeptides. 10. Improved geometrical parameters and nonbonded interactions for use in the ECEPP/3 algorithm, with application to proline-containing peptides. J. Phys. Chem. 96, 6472–6484 (1992)
Frisch, M.J., Trucks, G.W., Schlegel, H.B., Scuseria, G.E., Robb, M.A., Cheeseman, J.R., Zakrzewski, V.G., Montgomery Jr., J.A., Stratmann, R.E., Burant, J.C., et al.: Gaussian 03, Revision E.01. Gaussian, Inc., Wallingford CT (2003)
Chesnut, D.B., Moore, K.D.: Locally dense basis-sets for chemical-shift calculations. J. Comp. Chem. 10, 648–659 (1989)
Jameson, A.K., Jameson, C.J.: Gas-phase 13C chemical shifts in the zero-pressure limit: Refinements to the absolute shielding scale for 13C. Chem. Phys. Lett. 134, 461–466 (1987)
Vásquez, M., Scheraga, H.A.: Variable-target-function and buildup procedures for the calculation of protein conformation—application to bovine pancreatic trypsin-inhibitor using limited simulated nuclear magnetic-resonance data. J. Biomol. Struct. Dyn. 5, 757–784 (1988)
Kruskal Jr., J.B.: On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Am. Math. Soc. 7, 48–50 (1956)
Li, Z., Scheraga, H.A.: Monte Carlo minimization approach to the multiple minima problem in protein folding. Proc. Natl. Acad. Sci. U. S. A. 84, 6611–6615 (1987)
Li, Z., Scheraga, H.A.: Structure and free energy of complex thermodynamic systems. J. Molec. Str. (Theochem) 179, 333–352 (1988)
Arnautova, Y.A., Jagielska, A., Scheraga, H.A.: A new force field (ECEPP05) for peptides, proteins, and organic molecules. J. Phys. Chem. B 110, 5025–5044 (2006)
Vila, J., Williams, R.L., Vásquez, M., Scheraga, H.A.: Empirical solvation models can be used to differentiate native from near-native conformations of bovine pancreatic trypsin inhibitor. Proteins: Struct. Funct. Genet. 10, 199–218 (1991)
Ripoll, D.R., Ni, F.: Refinement of the thrombin-bound structure of a hirudin peptide by a restrained electrostatically driven monte-carlo method. Biopolymers 32, 359–365 (1992)
Vorobjev, Y.N., Scheraga, H.A.: A fast adaptive multigrid boundary element method for macromolecule electrostatic computations in solvent. J. Comp. Chem. 18, 569–583 (1997)
Vorobjev, Y.N., Vila, J.A., Scheraga, H.A.: FAMBE-pH: a fast and accurate method to compute the total solvation free energies of proteins. J. Phys. Chem. B 112, 11122–11136 (2008)
Ripoll, D.R., Vorobjev, Y.N., Liwo, A., Vila, J.A., Scheraga, H.A.: Coupling between folding and ionization equilibria: Effects of pH on the conformational preferences of polypeptides. J. Mol. Biol. 264, 770–783 (1996)
Vila, J.A., Ripoll, D.R., Arnautova, Y.A., Vorobjev, Y.N., Scheraga, H.A.: Coupling between conformation and proton binding in proteins. Proteins 61, 56–68 (2005)
Sitkoff, D., Sharp, K.A., Honig, B.: Accurate calculation of hydration free energies using macroscopic solvent models. J. Phys. Chem. 98, 1978–1988 (1994)
Barth, P., Alber, T., Harbury, P.B.: Accurate, conformation-dependent predictions of solvent effects on protein ionization constants. Proc. Natl. Acad. Sci. U. S. A. 104, 4898–4903 (2007)
Hass, M.A.S., Hansen, D.F., Christensen, H.E.M., Led, J.J., Kay, L.E.: Characterization of conformational exchange of a histidine side chain: protonation, rotamerization, and tautomerization of His61 plastocyanin from Anabaena variabilis. J. Am. Chem. Soc. 130, 8460–8470 (2008)
Serrano, P., Johnson, M.A., Chatterjee, A., Neuman, B., Joseph, J.S., Buchmeier, M.J., Kuhn, P., Wüthrich, K.: NMR structure of the nucleic acid-binding domain of the SARS coronavirus nonstructural protein 3. J. Virol. 83, 12998–13008 (2009)
Schwarzinger, S., Kroon, G.J.A., Foss, T.R., Chung, J., Wright, P.E., Dyson, H.J.: Sequence-dependent correction of random coil NMR chemical shifts. J. Am. Chem. Soc. 123, 2970–2978 (2001)
Wang, Y., Jardetzky, O.: Investigation of the neighboring residue effects on protein chemical shifts. J. Am. Chem. Soc. 124, 14075–14084 (2002)
Vijay-Kumar, S., Bugg, C.E., Cook, W.J.: Structure of ubiquitin refined at 1.8 Å resolution. J. Mol. Biol. 194, 531–544 (1987)
Quirt, A.R., Lyerla Jr., J.R., Peat, I.R., Cohen, J.S., Reynolds, W.F., Freedman, M.H.: Carbon-13 nuclear magnetic resonance titration shifts in amino acids. J. Am. Chem. Soc. 96, 570–574 (1974)
Rabenstein, D.L., Sayer, T.L.: Carbon-13 shifts parameters for amines, carboxylic acids and amino acids. J. Magn. Res. 24, 27–39 (1976)
Sayer, T.L., Rabenstein, D.L.: Nuclear magnetic resonance studies of the acid-base chemistry of amino acids and peptides. III. Determination of the microscopic and macroscopic acid dissociation constants of α,ω-diaminocarboxylic acids. Can. J. Chem. 54, 3392–3400 (1976)
Surprenant, H.L., Sarneski, J.E., Key, R.R., Byrd, J.T., Reilley, C.N.: Carbon-13 studies of amino acids: chemical shifts, protonation shifts, microscopic protonation behavior. J. Magn. Res. 40, 231–243 (1980)
Lindorff-Larsen, K., Best, R.B., Depristo, M.A., Dobson, C.M., Vendruscolo, M.: Simultaneous determination of protein structure and dynamics. Nature 433, 128–132 (2005)
Chakrabarti, P., Pal, D.: Main-chain conformational features at different conformations of the side-chains in proteins. Protein Eng. 11, 631–647 (1998)
Dunbrack Jr., R.L., Karplus, M.: Conformational analysis of the backbone-dependent rotamer preferences of protein sidechains. J. Mol. Biol. 230, 543–574 (1993)
Chothia, C., Levitt, M., Richardson, D.: Structure of proteins: packing of α-helices and β-sheets. Proc. Natl. Acad. Sci. U. S. A. 74, 4130–4134 (1977)
Chou, K.-C., Pottle, M., Némethy, G., Ueda, Y., Scheraga, H.A.: Structure of β sheets. Origin of the right handed twist and of the increased stability of antiparallel over parallel sheets. J. Mol. Biol. 162, 89–112 (1982)
Chou, K.-C., Scheraga, H.A.: Origin of the right handed twist of β sheets of poly(L Val) chains. Proc. Natl. Acad. Sci. U. S. A. 79, 7047–7051 (1982)
Creighton, T.E.: Proteins: Structure and Molecular Properties, pp. 186, 223. W.H. Freeman and Company, New York (1984)
Karplus, M.: Contact electron-spin coupling of nuclear magnetic moments. J. Chem. Phys. 30, 11–15 (1959)
Mandel, M.: Proton magnetic resonance spectra of some proteins: I. Ribonuclease, oxidized ribonuclease, lysozyme, and cytochrome c. J. Biol. Chem. 240, 1586–1592 (1965)
Bradbury, J.H., Scheraga, H.A.: Structural studies of ribonuclease. XXIV. The application of nuclear magnetic resonance spectroscopy to distinguish between the histidine residues of ribonuclease. J. Am. Chem. Soc. 88, 4240–4246 (1966)
Bachovchin, W.W.: 15N NMR spectroscopy of hydrogen-bonding interactions in the active site of serine proteases: evidence for a moving histidine mechanism. Biochemistry 25, 7751–7759 (1986)
Cheng, F., Sun, H., Zhang, Y., Mukkamala, D., Oldfield, E.: A solid state 13C NMR, crystallographic, and quantum chemical investigation of chemical shifts and hydrogen bonding in histidine dipeptides. J. Am. Chem. Soc. 127, 12544–12554 (2005)
Farr-Jones, S., Wong, W.Y.L., Gutheil, W.G., Bachovchin, W.W.: Direct observation of the tautomeric forms of histidine in 15N NMR spectra at low temperatures. Comments on intramolecular hydrogen bonding on tautomeric equilibrium. J. Am. Chem. Soc. 115, 6813–6819 (1993)
Harbison, G., Herzfeld, J., Griffin, R.G.: Nitrogen-15 chemical shift tensors in L-histidine hydrochloride monohydrate. J. Am. Chem. Soc. 103, 4752–4754 (1981)
Hass, M.A.S., Yilmaz, A., Christensen, H.E.M., Led, J.J.: Histidine side-chain dynamics and protonation monitored by 13C CPMG NMR relaxation dispersion. J. Biomol. NMR 44, 225–233 (2009)
Hu, F., Luo, W., Hong, M.: Mechanism of proton conduction and gating in influenza M2 proton channels from solid-state NMR. Science 330, 505–508 (2010)
Jensen, M.R., Hass, M.A.S., Hansen, D.F., Led, J.J.: Investigating metal-binding in proteins by nuclear magnetic resonance. Cell. Mol. Life Sci. 64, 1085–1104 (2007)
Markley, J.L.: Observation of histidine residues in proteins by means of nuclear magnetic resonance spectroscopy. Acc. Chem. Res. 8, 70–80 (1974)
Meadows, D.H., Jardetzky, O., Epand, R.M., Ruterjans, H.H., Scheraga, H.A.: Proc. Natl. Acad. Sci. U.S.A. 60, 766–772 (1968)
Pelton, J.G., Torchia, D.A., Meadow, N.D., Roseman, S.: Tautomeric states of the active-site histidine of phosphorylated and unphosphorylated IIIGlc, a signal-transducing protein from Escherichia coli, using two-dimensional heteronuclear NMR techniques. Protein Sci. 2, 543–558 (1993)
Reynolds, W.F., Peat, I.R., Freedman, M.H., Lyerla Jr., J.R.: Determination of the tautomeric form of the imidazole ring of L-histidine in basic solution by carbon-13 magnetic resonance spectroscopy. J. Am. Chem. Soc. 95, 328–331 (1973)
Schuster, I.I., Roberts, J.D.: Nitrogen-15 nuclear magnetic resonance spectroscopy. Effects of hydrogen bonding and protonation on nitrogen chemical shifts in imidazoles. J. Org. Chem. 44, 3864–3867 (1979)
Shimba, N., Serber, Z., Ledwidge, R., Miller, S.M., Craik, C.S., Dötsch, V.: Quantitative identification of the protonation state of histidine in vitro and in vivo. Biochemistry 42, 9227–9234 (2003)
Shimba, N., Takahashi, H., Sakakura, M., Fuji, I., Shimada, I.: Determination of protonation and deprotonation forms and tautomeric states of histidine residues in large proteins using nitrogen-carbon J couplings in imidazole ring. J. Am. Chem. Soc. 120, 10988–10989 (1998)
Steiner, T.: L-Histidyl-L-alanine dihydrate. Acta Cryst. C 52, 2554–2556 (1996)
Steiner, T., Koellner, G.: Coexistence of both histidine tautomers in the solid state and stabilization of the unfavorable Nδ-H form by intramolecular hydrogen bonding: crystalline L-His-Gly hemihydrate. Chem. Commun. 13, 1207–1208 (1997)
Strohmeier, M., Stueber, D., Grant, D.M.: Accurate 13C and 15N chemical shift and 14N quadrupolar coupling constant calculations in amino acid crystals: Zwitterionic, hydrogen-bonded systems. J. Phys. Chem. A 107, 7629–7642 (2003)
Sudmeier, J.L., Bradshaw, E.M., Coffman Haddad, K.E., Day, R.M., Thalhauser, C.J., Bullock, P.A., Bachovchin, W.W.: Identification of histidine tautomers in proteins by 2D 1H/13Cδ2 one-bond correlated NMR. J. Am. Chem. Soc. 125, 8430–8431 (2003)
Wüthrich, K.: NMR in Biological Research: Peptides and Proteins. North-Holland, Amsterdam (1976)
Ulrich, E.L., Akutsu, H., Doreleijers, J.F., Harano, Y., Ioannidis, Y.E., Lin, J., Livny, M., Mading, S., Maziuk, D., Miller, Z., Nakatani, E., Schulte, C.F., Tolmie, D.E., Wenger, R.K., Yao, H., Markley, J.L.: BioMagResBank. Nucleic Acids Res. 36, D402–D408 (2008)
Demchuk, E., Wade, R.C.: Improving the continuum dielectric approach to calculating pKas of ionizeable groups in proteins. J. Phys. Chem. 100, 17373–17387 (1996)
DePristo, M.A., de Bakker, P.I.W., Blundell, T.L.: Heterogeneity and inaccuracy in protein structures solved by X-ray crystallography. Structure 12, 831–838 (2004)
Ringe, D., Petsko, G.A.: Study of protein dynamics by X-ray diffraction. Methods in Enzymology 131, 389–433 (1986)
Furnham, N., Blundell, T.L., DePristo, M.A., Terwilliger, T.C.: Is one solution good enough? Nature Struct. Mol. Biol. 13, 184–185 (2006)
Wang, Y., Jardetzky, O.: Probability-based protein secondary structure identification using combined NMR chemical-shift data. Prot. Sci. 11, 852–861 (2002)
Höfinger, S., Almeida, B., Hansmann, U.H.E.: Parallel tempering molecular dynamics folding simulation of a signal peptide in explicit water. Proteins 68, 662–669 (2007)
Jang, S., Kim, E., Pak, Y.: Free energy surfaces of miniproteins with a beta beta alpha motif: replica exchange molecular dynamics simulation with an implicit solvation model. Proteins 62, 663–671 (2006)
Mohanty, S., Hansmann, U.H.E.: Folding of proteins with diverse folds. Biophys. J. 91, 3573–3578 (2006)
Zhou, R.: Free energy landscape of protein folding in water: Explicit versus implicit solvent. Proteins 53, 148–161 (2003)
Santiveri, C.M., Santoro, J., Rico, M., Jiménez, M.A.: Factors involved in the stability of isolated beta-sheets: turn sequence, beta-sheet twisting, and hydrophobic surface burial. Prot. Sci. 13, 1134–1147 (2004)
Zhao, D., Jardetzky, O.: An assessment of the precision and accuracy of protein structures determined by NMR–dependence on distance errors. J. Mol. Biol. 239, 601–607 (1994)
Korzhnev, D.M., Orekhov, V.Y., Arseniev, A.S.: Model-free approach beyond the borders of its applicability. J. Mag. Res. 127, 184–191 (1997)
Palmer III, A.G.: NMR characterization of the dynamics of biomacromolecules. Chem. Rev. 104, 3623–3640 (2004)
Case, D.A., Darden, T.A., Cheatham III, T.E., Simmerling, C.L., Wang, J., Duke, R.E., Luo, R., Merz, K.M., Wang, B., Pearlman, D.A., et al.: AMBER 8. University of California, San Francisco (2004)
Zhou, Y., Vitkup, D., Karplus, M.: Native proteins are surface-molten solids: Application of the Lindemann criterion for the solid versus liquid state. J. Mol. Biol. 285, 1371–1375 (1999)
Kuzin, A.P., Su, M., Seetharaman, J., Janjua, H., Cunningham, K., Maglaqui, M., Owens, L.A., Zhao, L., Xiao, R., Baran, M.C., Acton, T.B., Rost, B., Montelione, G.T., Hunt, J.F., Tong, L.: Crystal structure of UPF0291 protein ynzC from Bacillus subtilis at resolution 2.0 Å. Northeast Structural Genomics Consortium target SR384. https://doi.org/10.2210/pdb3bhp/pdb (2008)
Kawai, Y., Moriya, S., Ogasawara, N.: Identification of a protein YneA, responsible for cell division suppression during the SOS response in Bacillus subtilis. Mol. Microbiol. 47, 1113–1122 (2003)
Aramini, J.M., Sharma, S., Huang, Y.J., Swapna, G.V.T., Ho, C.K., Shetty, K., Cunningham, K., Ma, L.-C., Zhao, L., Owens, L.A., Jiang, M., Xiao, R., Liu, J., Baran, M.C., Acton, T.B., Rost, B., Montelione, G.T.: Solution NMR structure of the SOS response protein YnzC from Bacillus subtilis. Proteins: Struct. Funct. Bioinformatics 72, 526–530 (2008)
Vila, J.A., Baldoni, H.A., Scheraga, H.A.: Performance of density functional models to reproduce observed 13Cα chemical shifts of proteins in solution. J. Comp. Chem. 38, 884–892 (2008b)
Sippl, M.J.: Recognition of errors in three-dimensional structures of proteins. Proteins 17, 355–362 (1993)
Kleywegt, G.J.: On vital aid: the why, what and how of validation. Acta Cryst. D 65, 134–139 (2009)
Sevcik, J., Dauter, Z., Lamzin, V.S., Wilson, K.S.: Ribonuclease from Streptomyces aureofaciens at atomic resolution. Acta Cryst. D 52, 327–344 (1996)
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Vila, J.A., Arnautova, Y.A. (2019). 13C Chemical Shifts in Proteins: A Rich Source of Encoded Structural Information. In: Liwo, A. (eds) Computational Methods to Study the Structure and Dynamics of Biomolecules and Biomolecular Processes. Springer Series on Bio- and Neurosystems, vol 8. Springer, Cham. https://doi.org/10.1007/978-3-319-95843-9_20
DOI: https://doi.org/10.1007/978-3-319-95843-9_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95842-2
Online ISBN: 978-3-319-95843-9
eBook Packages: Intelligent Technologies and Robotics (R0)