13C Chemical Shifts in Proteins: A Rich Source of Encoded Structural Information

Vila, Jorge A.; Arnautova, Yelena A.

doi:10.1007/978-3-319-95843-9_20


Part of the book series: Springer Series on Bio- and Neurosystems (SSBN, volume 8)


Abstract

Despite the formidable progress in Nuclear Magnetic Resonance (NMR) spectroscopy, quality assessment of NMR-derived structures remains an important problem. Validation of protein structures is therefore essential for spectroscopists, since it enables them to detect structural flaws and can guide their efforts in further refinement. Moreover, accurate and efficient validation tools would help molecular biologists and computational chemists to evaluate the quality of available experimental structures and to select the protein model most suitable for a given scientific problem. The ¹³C^α nuclei are ubiquitous in proteins, and their shieldings are easily obtainable from NMR experiments and represent a rich source of encoded structural information; this makes ¹³C^α chemical shifts an attractive candidate for use in computational methods aimed at the determination and validation of protein structures. In this chapter, we describe the basis of a novel methodology for computing, at the quantum chemical level of theory, the ¹³C^α shielding of amino acid residues in proteins. We also identify and examine the main factors affecting the ¹³C^α-shielding computation. Finally, we illustrate how the information encoded in the ¹³C chemical shifts can be used in a number of applications, ranging from protein structure prediction of both α-helical and β-sheet conformations, to determination of the fraction of the tautomeric forms of the imidazole ring of histidine in proteins as a function of pH, to accurate detection of structural flaws, at the residue level, in NMR-determined protein models.


1 Introduction

Before a protein structure can be analyzed in light of its biological function, it must be validated, i.e., its reliability must be clearly understood both for the overall structure and for its details at the per-residue level. However, accurate and fast validation of protein structures is a long-standing problem in Nuclear Magnetic Resonance (NMR) spectroscopy [1–4]. For this reason, a plethora of methods to determine the accuracy and reliability of protein structures have been proposed in recent years [5–12]. Despite this progress, there is a growing need for more sophisticated, physics-based, and fast structure-validation methods [1, 2, 6, 7, 11].

The ¹³C^α chemical shifts provide important information about the conformations of peptides and proteins in solution [13–39] and can therefore be used as an exquisitely sensitive probe with which to assess the quality of protein models. We recently developed a new, physics-based methodology [34] that makes use of observed ¹³C^α chemical shifts and ¹³C^α chemical shifts computed at the density functional theory (DFT) level [40] for an accurate validation of protein structures in solution and in crystals [41]. The first step in developing this methodology was to determine the factors that affect ¹³C^α shielding calculations, such as the protonation/deprotonation state of distant ionizable groups, sequential nearest-neighbor effects, covalent-geometry effects (i.e., those due to variations in the bond lengths and bond angles of residues), and the sensitivity of the shielding/deshielding of ¹³C^α nuclei to changes in side-chain conformation. Once all these factors have been properly identified and considered, a very important test is to determine the accuracy and speed of the ¹³C^α-shielding computation as a function of the size of the basis set and of the DFT model adopted. These tests are important because DFT-based quantum mechanical (QM) calculations are very CPU-demanding, despite the ever-increasing computational power available.

The new DFT-based method has been applied to a number of problems, such as unblocked statistical-coil tetrapeptides in aqueous solution [32], the polyproline II helix conformation in a proline-rich environment [31], the ¹³C^α and ¹³C^β chemical shifts of disulfide-bonded cysteines [42], and the determination of the fraction of the tautomeric forms of histidine in proteins as a function of pH [43]. This strategy also provides a unified, self-consistent method to determine high-quality protein structures without relying on knowledge-based information [44]. Thus, a β-sheet or an all-α-helical protein structure can be accurately determined simply by identifying a set of conformations that simultaneously satisfy two types of constraints, namely ¹³C^α-dynamically-derived torsional angle constraints and Nuclear Overhauser Effect (NOE)-derived distance constraints [29, 44].

The currently used ¹³C^α chemical shift-based validation and determination protocol [29, 33, 44, 45, 34] exploits the following features: (a) the assignment of chemical shifts is a fundamental step in protein structure determination by NMR spectroscopy [46], so no extra experimental work is needed; (b) in addition to the impact of the covalent structure, ¹³C^α chemical shifts are modulated mainly by the intraresidue backbone and side-chain dihedral angles [16, 17, 19, 20–22, 27, 47, 35, 39], with no significant influence of the amino acid sequence [48]; (c) ¹³C^α is ubiquitous in proteins; and (d) ¹³C^α chemical shifts can be computed with high accuracy at the QM level of theory.

This chapter is intended as an overview of the authors’ contributions to the field of protein structure determination and validation using, mainly, information decoded from ¹³C^α chemical shifts. It is organized as follows: first, the methods used to compute the ¹³C^α chemical shifts and to analyze the results are briefly described; second, the main factors affecting the computation of ¹³C^α chemical shifts are enumerated and discussed; third, the capabilities of the computed ¹³C^α chemical shifts, as a rich source of encoded structural information, are illustrated by a series of applications that include, but are not limited to, the determination of protein structures; and finally, a new protein-structure validation server, CheShift-2 [49], with which NMR spectroscopists can assess the quality of their protein models before depositing them in the Protein Data Bank (PDB) [50], is presented. The theory and details behind alternative protein structure determination and validation methods are not discussed here; the reader is instead referred to an extensive collection of such methods [1, 5–12, 26, 51–61].

2 Methods

2.1 Calculation of ¹³C^α Chemical Shifts

All the experimentally determined conformations, unless noted otherwise, were regularized, i.e., all residues were replaced by the standard Empirical Conformational Energy Program for Peptides (ECEPP) [62] residues in which bond lengths and bond angles are fixed (rigid-body geometry approximation) at the standard values [62] and hydrogen atoms were added, if necessary.

Computations of the ¹³C^α chemical shifts involve a series of approximations. For each amino acid residue X in the protein sequence: (a) the ¹³C^α shielding depends mainly on its own backbone [21, 27] and side-chain [19, 20, 35] conformations, with no significant influence of either the amino acid sequence or the position of the given residue in the sequence, except for residues preceding proline [48]; (b) residue X can be treated as a terminally-blocked tripeptide with the sequence Ac-GXG-NMe, with X in the conformation it adopts in the protein structure; (c) the ¹³C^α isotropic shielding value (σ) of residue X can be computed at the OB98/6-311+G(2d,p) level of theory [28] with the Gaussian 03 package [63], while the remaining residues in each tripeptide are treated at the OB98/3-21G level of theory, i.e., by using the locally-dense basis set approach [64]; (d) all ionizable residues can be considered neutral during the QM calculations [45], unless noted otherwise; and (e) no geometry optimization is necessary, because optimization by ab initio (HF) or DFT methods has only a small effect on the computed chemical shifts [19].
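Approximations (a) to (c) turn one protein into a list of independent per-residue calculations. The sketch below is a hypothetical job builder, not the authors' code: the `TripeptideJob` record, the `build_jobs` function, and the choice of simply substituting Pro for the following Gly when X precedes proline are illustrative assumptions. Geometry handling (copying the ϕ, ψ, χ angles of X from the regularized protein structure) and the actual Gaussian input are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TripeptideJob:
    """One independent shielding calculation for the central residue X (hypothetical record)."""
    index: int          # 0-based position of X in the protein sequence
    sequence: str       # model peptide, e.g. "Ac-G-V-G-NMe"
    dense_basis: str    # basis set used for the central residue X
    light_basis: str    # basis set used for the flanking residues and blocking groups


def build_jobs(sequence, dense_basis="6-311+G(2d,p)", light_basis="3-21G",
               keep_next_pro=True):
    """Map every residue of a one-letter sequence onto an Ac-G-X-G-NMe model peptide.

    Assumption: when keep_next_pro is True, a residue that precedes proline gets Pro
    instead of the following Gly, so the preceding-proline effect is not lost.
    """
    jobs = []
    for i, residue in enumerate(sequence):
        following = "G"
        if keep_next_pro and i + 1 < len(sequence) and sequence[i + 1] == "P":
            following = "P"
        peptide = f"Ac-G-{residue}-{following}-NMe"
        jobs.append(TripeptideJob(i, peptide, dense_basis, light_basis))
    return jobs
```

Because each job depends only on one residue's own conformation, the list can be dispatched to separate processes, which is what makes the protein-wide calculation tractable.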

The computed ¹³C^α shieldings (σ_{subst,th}) are converted to ¹³C^α chemical shifts (δ) by using the equation δ_th = σ_ref − σ_{subst,th}, where the subscripts denote a theoretical computation (th), the reference substance (ref), and the substance of interest (subst), i.e., the ¹³C^α shielding of a given amino acid residue X. The observed shielding value of tetramethylsilane (TMS) in the gas phase [65], namely 188.1 ppm, was adopted as an initial reference value (see below). All ¹³C^α shielding values (σ_{subst,th}) are computed using the Gauge-Invariant Atomic Orbital (GIAO) method at the DFT level of theory, as implemented in the GAUSSIAN 03/09 suite of programs (Frisch et al., 2003). Throughout this chapter, we have used only one exchange-correlation functional, OB98, because it was shown [30] to be one of the most accurate and fastest functionals with which to reproduce the observed ¹³C^α chemical shifts of proteins in solution (see Sect. 3.2).
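The conversion δ_th = σ_ref − σ_{subst,th} is a single subtraction per residue. As a small illustration (the function name is ours, not from the chapter), it can be written as follows, with the gas-phase TMS value quoted above as the default reference:

```python
GAS_PHASE_TMS = 188.1  # ppm, observed gas-phase TMS shielding quoted in the text


def shieldings_to_shifts(shieldings, sigma_ref=GAS_PHASE_TMS):
    """Apply delta_th = sigma_ref - sigma_subst,th to each computed shielding (ppm)."""
    return [sigma_ref - sigma for sigma in shieldings]
```

For example, a computed shielding of 130.0 ppm gives a shift of about 58.1 ppm against 188.1 ppm, but 54.5 ppm against a reference of 184.5 ppm. This shows why the choice of reference discussed next shifts every computed value by the same constant.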

2.2 Determination of an Effective TMS Shielding Value

Determining a proper TMS shielding value for each functional is crucial for an accurate computation of the ¹³C^α chemical shifts, because it minimizes the systematic errors that might otherwise bias the chemical-shift-based analysis. From this point of view, an effective TMS value provides the most accurate solution to the problem, because no further adjustments are required. Consequently, computation of an effective TMS value is central to our calculations.

By adopting the observed TMS value of 188.1 ppm (Jameson and Jameson, 1987) as a reference, it is possible to find, for any functional, the characteristic mean (x_o) and standard deviation (σ) of the Normal (Gaussian) fit to the frequency distribution of the errors. For all the functionals tested in our work, the characteristic mean value (x_o) is displaced from its ideal value of 0.0 by a positive or negative amount; e.g., for OB98, x_o = +3.6 ppm was found. Further analysis [30] indicates that, for any of the 10 functionals tested, a straightforward use of the observed TMS shielding value (188.1 ppm) is not appropriate unless further corrections are introduced. Hence, for each functional and basis set, one can find an ‘effective’ TMS shielding value for which the Normal (Gaussian) fit shows zero displacement, i.e., an effective TMS value that gives x_o = 0.0. For example, use of OB98 with the large [6-311+G(2d,p)/3-21G] basis set leads to an effective TMS value of 184.5 ppm, obtained by subtracting 3.6 ppm from 188.1 ppm [30], which gives x_o = 0.0 ppm. Likewise, use of the small (6-31G/3-21G) basis set leads to an effective TMS value of 195.4 ppm.
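The re-referencing step can be sketched as follows, assuming errors are defined as computed minus observed shifts and that x_o is estimated by the arithmetic mean of the errors. This mean is what a maximum-likelihood Normal fit returns; a least-squares fit to a binned histogram, which the chapter may have used, could differ slightly. The function name `effective_tms` is hypothetical.

```python
from statistics import fmean


def effective_tms(computed_shieldings, observed_shifts, sigma_ref=188.1):
    """Return (effective reference, x_o) such that the mean error becomes zero.

    errors = (sigma_ref - sigma_computed) - delta_observed, in ppm.
    Assumption: x_o is the sample mean of these errors.
    """
    errors = [(sigma_ref - sigma) - obs
              for sigma, obs in zip(computed_shieldings, observed_shifts)]
    x_o = fmean(errors)
    # Lowering the reference by x_o lowers every computed shift by x_o,
    # which moves the mean error to zero.
    return sigma_ref - x_o, x_o
```

With every computed shift 3.6 ppm too high, as reported for OB98 with the large basis set, this returns an effective reference of 184.5 ppm.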

2.3 Computation of the ca-rmsd Model

The observed chemical shift of each residue i, ¹³C^α_{observed,i}, represents contributions from an ensemble of rapidly interconverting conformers that coexist in solution. Therefore, an accurate comparison between the observed and computed ¹³C^α chemical shifts requires consideration of an ensemble of NMR-derived conformers, rather than of a single conformation [41, 33]. Consequently, for each residue i in the sequence, the average of the chemical shifts computed for that residue over the ensemble of Ω conformers representing the NMR structure, < ¹³C^α >_i, is computed as:

$$ < {^{13}{\text{C}}^{\alpha }} >_{i} = \left( {1/\varOmega } \right)\sum\limits_{k = 1}^{\varOmega } {^{13} {\text{C}}^{\alpha }_{i, \, k} ,} $$

(1)

where ¹³C^α_{i,k} is the computed chemical shift for residue i in conformer k, with 1 ≤ i ≤ N, where N is the number of residues in the sequence. Eq. (1) rests on the following approximation. For each residue i, the quantity to be computed should, in principle, be $ < {}^{13}\text{C}^{\alpha} >_{i} = \sum\nolimits_{k=1}^{\varOmega} \lambda_{k} \, {}^{13}\text{C}^{\alpha}_{i,k} $, where λ_k is the Boltzmann factor for conformer k, with $ \sum\nolimits_{k=1}^{\varOmega} \lambda_{k} \equiv 1 $. However, computing the Boltzmann factors at the QM level of theory is not feasible with existing computational facilities, because it would require computing the total energy at the QM level for each conformer in the ensemble used to represent the NMR structure. Therefore, the approximation λ_k = 1/Ω was used [48]; in other words, each conformer contributes equally to the average chemical shift obtained by fast conformational averaging. Whether a Boltzmann average, rather than the arithmetic average, would lead to a more accurate representation of the ¹³C^α chemical shifts needs further investigation.
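The difference between Eq. (1) and its Boltzmann-weighted generalization can be made concrete with a small sketch. The uniform default reproduces λ_k = 1/Ω. The `boltzmann_weights` helper is hypothetical and only illustrates the general form: it takes conformer energies as input, and the chapter notes that such energies are not obtainable at the QM level for NMR ensembles.

```python
import math


def ensemble_average(shifts, weights=None):
    """Average one residue's computed shift over the Omega conformers.

    weights=None applies the lambda_k = 1/Omega approximation of Eq. (1);
    any supplied weights are normalized so that they sum to 1.
    """
    if weights is None:
        weights = [1.0] * len(shifts)
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, shifts)) / total


def boltzmann_weights(energies, kT):
    """Hypothetical helper: exp(-E_k/kT), shifted by min(E) for numerical stability."""
    e_min = min(energies)
    return [math.exp(-(e - e_min) / kT) for e in energies]
```

With equal energies the two forms coincide; with a strongly favored conformer the weighted average collapses onto that conformer's shift.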

The < ¹³C^α >_i value obtained from Eq. (1) is used to compute the conformational-average difference Δ_i between the observed and computed ¹³C^α chemical shifts for each amino acid residue i,

$$ \Delta_{i} = \left( {{}^{13}C_{observed,i}^{\alpha } - < {}^{13}C_{{}}^{\alpha } >_{i} } \right) $$

(2)

Finally, the conformational-average root-mean-square deviation (rmsd) parameter, ca-rmsd [48], is obtained as:

$$ \text{ca-rmsd} = \left[ \left( 1/N \right) \sum\limits_{i = 1}^{N} \Delta_{i}^{2} \right]^{1/2} , $$

(3)

which is a global property of the protein NMR structure, given by the root-mean-square of the differences Δ_i between the experimental ¹³C^α chemical shifts and the < ¹³C^α >_i values over all the residues in the protein.
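Eqs. (1) to (3) chain together directly. The following is a minimal sketch, assuming that the observed shifts are a list indexed by residue and that the computed shifts are a list of per-conformer lists. This layout and the function name are our own choices.

```python
import math


def ca_rmsd(observed, computed_ensemble):
    """Return (ca-rmsd, deltas) following Eqs. (1)-(3).

    observed[i]             observed 13C-alpha shift of residue i (ppm)
    computed_ensemble[k][i] computed shift of residue i in conformer k (ppm)
    """
    omega = len(computed_ensemble)
    n = len(observed)
    # Eq. (1): uniform conformational average for each residue
    averages = [sum(conf[i] for conf in computed_ensemble) / omega for i in range(n)]
    # Eq. (2): observed minus averaged computed shift
    deltas = [observed[i] - averages[i] for i in range(n)]
    # Eq. (3): root-mean-square over all N residues
    return math.sqrt(sum(d * d for d in deltas) / n), deltas
```

Keeping the per-residue Δ_i alongside the global value is useful because large individual deviations point to the residues responsible for a poor ca-rmsd.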

2.4 ¹³C^α-Based Protein Structure Determination Method

The ¹³C^α-based procedure used for determination of protein structures consists of three steps. The flow chart of this protocol [44] is shown in Fig. 1 and a brief description of each step follows.

Step 1: The Variable-Target-Function (VTF) approach with a simplified soft-sphere potential function [66] is used to generate, at random, an ensemble of conformations that simultaneously satisfy a set of long-range distance constraints derived from the experimental NOEs and (φ, ψ) torsional constraints derived from the observed ¹³C^α and ¹³C^β conformational shifts [27]. The derived torsional constraints apply only to those amino acid residues in the sequence that belong to a regular structure, i.e., an α-helix or a β-sheet. Consequently, these (φ,ψ)_α,β torsional constraints (shown in Fig. 1) are limited, on average, to ~50% of the amino acid residues in proteins, because the remaining ones populate non-regular structures.

Then, a clustering procedure, e.g., the Minimal Spanning Tree method [67], is used to select a small subset of the VTF-derived conformations, namely those whose maximum NOE-derived distance violation is lower than some arbitrary fixed value. For each of these conformations, the ¹³C^α chemical shifts are computed as described in Sect. 2.1. Examination of the chemical shifts of all the amino acid residues in this ensemble enables us to identify, for each position in the sequence, the residue conformation whose computed chemical shift most closely matches the observed one. This identified set of individual residue conformations corresponds to only one conformation of the whole chain: the ‘theoretical minimal-rmsd model’ [33]. In this model, the ¹³C^α chemical shift of each residue individually best matches the experimental one, thereby providing a new set of ϕ, ψ, and χ torsional angle constraints for all the amino acid residues in the sequence, i.e., not just for those in regular structures. Because different combinations of the ϕ, ψ, and χ torsional angles can yield the same chemical shift (i.e., the torsional angles are a multivalued function of the chemical shifts), the set of torsional angles derived from the ‘theoretical minimal-rmsd model’ does not necessarily represent a unique solution for a given set of observed ¹³C^α chemical shift values.
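The assembly of the ‘theoretical minimal-rmsd model’ is a per-residue selection. This can be sketched as follows, assuming a hypothetical data layout in which `torsions[k][i]` holds the (ϕ, ψ, χ…) tuple of residue i in conformer k. The VTF generation and clustering steps are not reproduced.

```python
def minimal_rmsd_model(observed, computed_ensemble, torsions):
    """For each residue i, pick the conformer k whose computed shift is closest to observed[i].

    computed_ensemble[k][i]: computed 13C-alpha shift of residue i in conformer k (ppm)
    torsions[k][i]:          torsion tuple of residue i in conformer k (hypothetical layout)
    Returns the chosen conformer index per residue and the assembled torsion constraints.
    """
    chosen, constraints = [], []
    for i, obs in enumerate(observed):
        best_k = min(range(len(computed_ensemble)),
                     key=lambda k: abs(computed_ensemble[k][i] - obs))
        chosen.append(best_k)
        constraints.append(torsions[best_k][i])
    return chosen, constraints
```

Different residues may be taken from different conformers, which is why the resulting torsion set is used as constraints for a new search rather than as a final structure.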

Step 2: Only one conformation among all those produced in Step 1 is selected, e.g., the conformation with the lowest rmsd between the computed and observed ¹³C^α chemical shifts. The selected conformation is used as the starting point for a new conformational search with the Monte Carlo with Minimization (MCM) method [68, 69]. The MCM search is carried out with two types of constraints: the original set of NOE-derived distance constraints and the new set of ϕ, ψ, χ torsional angle constraints derived in Step 1. This time, the conformational search is carried out using a complete force field, including the internal potential energy described by ECEPP/05 [70], the solvation free energy calculated with a solvent-accessible surface area model [71], and additional energy terms that penalize violations of the distance and torsional angle constraints [72]. Convergence of the determination protocol is monitored using the ca-rmsd between the computed and observed ¹³C^α chemical shifts.

Step 3: If the computed ca-rmsd is lower than a certain, arbitrarily chosen, cutoff value (ξ), the procedure ends. Otherwise, Step 2 is repeated using a new set of (ϕ,ψ,χ) constraints derived from the minimal-rmsd model of the previous step.
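Steps 2 and 3 form a simple convergence loop, sketched below. The callables `run_mcm` and `score` are hypothetical stand-ins for the MCM search and the ca-rmsd evaluation. Reusing the same starting conformation in every round is an assumption, because the text only specifies that the torsional constraints are renewed. The `max_rounds` cap is an added safeguard that is not part of the described protocol.

```python
def iterate_protocol(start, torsions, run_mcm, score, xi, max_rounds=10):
    """Repeat Step 2 until the ca-rmsd falls below the cutoff xi (Step 3).

    run_mcm(start, torsions) -> (ensemble, new_torsions)   hypothetical MCM search;
        new_torsions come from the minimal-rmsd model of that ensemble.
    score(ensemble) -> ca-rmsd in ppm, as in Eq. (3).
    Returns (ensemble, ca-rmsd, rounds used, converged flag).
    """
    ensemble, value = None, float("inf")
    for round_no in range(1, max_rounds + 1):
        ensemble, torsions = run_mcm(start, torsions)  # Step 2
        value = score(ensemble)
        if value < xi:                                 # Step 3: converged
            return ensemble, value, round_no, True
    return ensemble, value, max_rounds, False
```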

It is worth noting that, after our physics-based protocol was published [44], an alternative knowledge-based method that uses ¹H, ¹³C^α, ¹³C^β, and ¹⁵N chemical shifts as restraints was successfully applied to the structure determination of several proteins [53]. A blind test of computational methods aimed at fully automated determination of protein structures, including several methods that also use chemical shifts as restraints, has been carried out recently [60].

2.5 Computation of the ¹³C^α Chemical Shifts as a Function of pH

For a given residue i of a protein in conformation k, the average charge distribution, <ρ_{i,k}>, could be determined by solving the Poisson equation for each of the 2^ξ ionization states, with ξ being the number of ionizable groups in the molecule. Note that ξ can be large, because on average ~30% of all residues in a protein sequence are ionizable; hence, an accurate solution requires a fast algorithm. Consequently, in all the applications mentioned in this chapter, we used the Multiple Boundary Element (MBE) method [73, 74], in which the free energy associated with the ionization state of the ionizable groups at a fixed pH value, namely 6.5, is calculated with the general multi-site titration formalism [75, 76]. The charges and atomic radii from the PARSE (Parameters for Solvation Energy) parameter set [77] were used for the solvation free energy calculations with the MBE method, and internal (ε_int) and solvent (ε_solv) dielectric constants of 2 and 80, respectively [76], were adopted for the calculations of <ρ_{i,k}>. The value ε_int = 2 is consistent with the use of PARSE charges [78] and is also commonly assumed to be an adequate representation of the protein interior. Following these approximations, for a given conformation k, the average degree of ionization of the ith ionizable group is computed as:

$$ < \rho_{i,k} > = Z^{-1} \sum\limits_{n = 1}^{2^{\xi}} \rho_{i,k}^{n} \exp\left[ -\Delta G(P_{k}, x_{k}^{n}) / k_{B} T \right] $$

(4)

where Z is the partition function, $ Z = \sum\nolimits_{n=1}^{2^{\xi}} \exp\left[ -\Delta G(P_{k}, x_{k}^{n}) / k_{B} T \right] $, k_B is the Boltzmann constant, T is the absolute temperature, and $ x_{k}^{n} = (\rho_{1,k}^{n}, \ldots, \rho_{i,k}^{n}, \ldots, \rho_{\xi,k}^{n}) $, with $ \rho_{i,k}^{n} $ = 1 or 0, is the nth protonation microstate of the protein in conformation k, P_k. $ \Delta G(P_{k}, x_{k}^{n}) $ is the free energy of ionization of the nth microstate of P_k [75].

It should be noted that, for any ionizable residue i of a single conformation k, Eq. (4) can lead to a non-integer average degree of charge, even though a non-integer charge on a single group has no physical meaning. Because Eq. (4) is a Boltzmann average, a fractional charge should be interpreted as follows: for a given conformation k, there are many identical replicas of that conformation in solution; hence, a fractional charge computed by Eq. (4), e.g., 0.75, means that in 75% of these replicas the ionizable group i carries an integral charge (protonated or deprotonated, depending on whether the group is basic or acidic), while in the remaining 25% it is in the other, neutral, state.
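For a handful of sites, Eq. (4) can be evaluated literally by enumerating all 2^ξ microstates. The sketch below does this for an arbitrary free-energy function given in units of k_BT. The `toy_delta_g` model (an intrinsic ln(10)·(pH − pKa) term per protonated site plus optional pairwise couplings) is a hypothetical illustration and is not the MBE method used in the chapter. Here ρ = 1 means protonated, which corresponds to the charged state only for basic groups.

```python
import itertools
import math


def average_ionization(delta_g, n_sites):
    """Eq. (4) by brute-force enumeration of all 2**n_sites microstates.

    delta_g(state) -> free energy of microstate `state` (tuple of 0/1) in units of k_B*T.
    Returns <rho_i> for every site i.
    """
    states = list(itertools.product((0, 1), repeat=n_sites))
    energies = [delta_g(s) for s in states]
    e_min = min(energies)  # constant shift; cancels between numerator and Z
    weights = [math.exp(-(e - e_min)) for e in energies]
    z = sum(weights)       # partition function
    return [sum(w * s[i] for w, s in zip(weights, states)) / z for i in range(n_sites)]


def toy_delta_g(state, pka, ph, coupling):
    """Hypothetical toy model in k_B*T units; state[i] = 1 means site i is protonated."""
    g = sum(math.log(10) * (ph - p) for s, p in zip(state, pka) if s)
    g += sum(c for (i, j), c in coupling.items() if state[i] and state[j])
    return g
```

Without couplings, each site reduces to the Henderson-Hasselbalch fraction 1/(1 + 10^(pH − pKa)). A repulsive coupling lowers the protonation of interacting sites. The exponential cost of the enumeration is exactly why a fast multi-site method is needed for real proteins.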

Assuming that the protonation/deprotonation reactions are instantaneous on the NMR time scale, i.e., microseconds to milliseconds [79], the theoretical ¹³C^α chemical shift, $ \delta_{i}^{computed}(pH) $, of a given residue i in the sequence (except for histidine, which possesses two tautomers) is computed as a function of pH using the following equation:

$$ \delta_{i}^{computed}(pH) = (1/\Omega) \sum\limits_{k = 1}^{\Omega} \left\{ < \rho_{i,k} > \delta^{+,i,k} + \left( 1 - < \rho_{i,k} > \right) \delta^{0,i,k} \right\} $$

(5)

where δ^{+,i,k} and δ^{0,i,k} are the computed ¹³C^α chemical shifts of amino acid residue i in conformation k with fully charged and neutral side chains, respectively, Ω is the number of conformers in the protein ensemble, and <ρ_{i,k}> is the average degree of charge given by Eq. (4).
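Once the per-conformer degrees of charge are available, Eq. (5) is a charge-weighted mix of the two computed shifts, averaged uniformly over the ensemble. The following is a minimal sketch for one residue, with a hypothetical function name and per-conformer lists as inputs:

```python
def shift_vs_ph(rho_avg, delta_charged, delta_neutral):
    """Eq. (5) for one residue i.

    rho_avg[k]       <rho_{i,k}> from Eq. (4) at the chosen pH
    delta_charged[k] shift with a fully charged side chain in conformer k (ppm)
    delta_neutral[k] shift with a neutral side chain in conformer k (ppm)
    """
    omega = len(rho_avg)
    return sum(r * dp + (1.0 - r) * d0
               for r, dp, d0 in zip(rho_avg, delta_charged, delta_neutral)) / omega
```

Repeating the evaluation of Eq. (4) at several pH values and feeding the results through this function traces the computed titration curve of the residue's ¹³C^α shift.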

3 Factors Affecting the Calculation of ¹³C^α Chemical Shifts

3.1 Transferability of the Results

The current methodology [33, 34] relies on a crucial observation: once residue conformations are established by their interactions with the rest of the protein, the ¹³C^α shielding of each residue depends mainly on its backbone and side-chain conformations, with no significant influence from the nature of the nearest-neighbor amino acids, except for residues immediately preceding proline [48].

This observation allows us to parallelize the ¹³C^α shielding calculations in proteins and, hence, to make them computationally feasible. Moreover, a set of accurately determined amino acid residue conformations that represents the accessible conformational space of all 20 naturally occurring amino acids and shows a good distribution of side-chain conformations constitutes a reasonable ensemble with which to test the current methodology, and the results of these tests should be transferable to proteins of any class or size. Consequently, we used structures of three proteins solved by NMR and X-ray crystallography, namely PDB ids 1D3Z, 2JVD, and 1NS1, to evaluate the performance of different DFT functionals and basis sets, as explained below.

3.2 Performance of Different DFT Functionals to Reproduce Observed ¹³C^α Chemical Shifts

DFT has become a method of choice for QM calculations of the electronic structure and properties of many molecular and solid-state systems. Because the exact exchange-correlation functional is unknown, a large number of approximate functionals have been proposed in the literature, and the search for more accurate and reliable ones continues; which functional performs best, however, depends on the application. Selecting the most appropriate density functional for a particular application is therefore one of the main problems of the DFT method. For this reason, we decided [28] to test several density functionals, namely B3LYP, OLYP, PBE1PBE, OPBE, O3LYP, OPW91, OB98, BPW91, BPBE, and B971. The benchmark was intended to find not only the most accurate functional with which to reproduce the observed ¹³C^α chemical shifts in solution but also the fastest one, in terms of CPU time, because the speed of DFT calculations could severely limit their applicability to proteins. The test was applied to 10 NMR-derived conformations of the 76-residue α/β protein ubiquitin (PDB id 1D3Z).

Comparison of the observed and computed ¹³C^α chemical shifts shows that five functionals, namely OPW91, OB98, OPBE, OLYP, and O3LYP, are among the fastest and, even more importantly, behave very similarly in their ability to reproduce the observed ¹³C^α chemical shifts accurately. In particular, OB98 appears to be slightly better than the other four in terms of both the correlation coefficient R (Pearson coefficient) between the observed and the conformationally averaged ¹³C^α chemical shifts and the standard deviation of the computed conformationally averaged ¹³C^α chemical shifts from a linear regression. Consequently, we chose OB98 for all the applications [30].
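The two figures of merit used to rank the functionals, Pearson R and the standard deviation about a linear regression, can be computed as in the sketch below. Assumptions: the regression is of computed on observed shifts, and the residual standard deviation uses an n − 2 denominator. The original analysis may have used a different convention, which would change the standard deviation slightly but not R.

```python
import math


def regression_stats(observed, computed):
    """Return (Pearson R, residual standard deviation, (intercept, slope)).

    Fits computed = a + b * observed by least squares; the residual
    standard deviation uses n - 2 degrees of freedom (an assumption).
    """
    n = len(observed)
    mx = sum(observed) / n
    my = sum(computed) / n
    sxx = sum((x - mx) ** 2 for x in observed)
    syy = sum((y - my) ** 2 for y in computed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, computed))
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx
    a = my - b * mx
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(observed, computed))
    sd = math.sqrt(ssr / (n - 2))
    return r, sd, (a, b)
```

The same function applies to the functional-versus-functional and basis-versus-basis comparisons below, by passing one set of computed shifts as `observed`.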

We also compared the results obtained with OB98 with those obtained with B3LYP, a very popular functional that has been used extensively in our group and elsewhere. The correlation between the averaged ¹³C^α chemical shift values obtained for the 10 conformations of 1D3Z with the OB98 and B3LYP functionals is excellent [30], with a correlation coefficient R = 0.998 and a standard deviation of 0.300 ppm. This test provides solid evidence that the results and conclusions obtained using B3LYP do not need to be revised if the OB98 functional is adopted [30].

3.3 Performance of Different Basis Sets to Reproduce Observed ¹³C^α Chemical Shifts

To study how the accuracy and speed of DFT calculations of the ¹³C^α chemical shifts in proteins depend on the size of the basis set, six basis sets, viz., the 6-31G/3-21G, 6-31G(d)/3-21G, 6-311G(d,p)/3-21G, 6-311+G(d,p)/3-21G, and 6-311+G(2d,p)/3-21G locally-dense basis-set approximations, and the uniform 3-21G/3-21G set, were initially applied [28] to 10 NMR-derived conformations of ubiquitin [54]. For each of these six basis sets, combined with the OB98 functional, the ¹³C^α shielding was computed for 760 amino acid residues (10 conformations × 76 residues) by treating each amino acid X in the sequence as a terminally-blocked tripeptide with the sequence Ac-GXG-NMe in the conformation of the regularized experimental protein structure. Analysis of the results [28], in terms of the agreement between the computed and observed ¹³C^α chemical shifts, shows that the accuracy with which the observed ¹³C^α chemical shifts are reproduced with the small basis set (6-31G/3-21G) is very similar to that obtained with the larger basis set [6-311+G(2d,p)/3-21G], although use of the small basis set leads to a significant decrease in computational time.

The results also indicate that the ¹³C^α chemical shifts computed with the large [6-311+G(2d,p)/3-21G] basis set can be reproduced accurately (within an average error of ~0.4 ppm) and faster (by ~9 times) by using the small (6-31G/3-21G) basis set and then applying the linear extrapolation $ {}^{13}C^{\alpha} = -1.597 + 1.040 \times {}^{13}C_{\mu}^{\alpha} $, where ¹³C^α_μ denotes the chemical shift computed with the small basis set. Indeed, the correlation between the averaged ¹³C^α chemical shift values computed with these two basis sets for the 32 conformations of 1NS1 is excellent [28], with a correlation coefficient R = 0.999 and a standard deviation of 0.284 ppm. Even more important, an analysis of the magnitude and distribution of the errors for Val and Arg hypersurfaces, constructed by calculating grids of 6864 and 6794 points, respectively, corresponding to different combinations of the ϕ, ψ, χ1, and χ2 (only for Arg) torsional angles, indicates that for ~70% of the points the errors are within ~0.6 ppm and that the most populated regions of the Ramachandran map are not affected by errors larger than ~1.0 ppm [28].
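The extrapolation is a fixed linear map. The sketch below applies the coefficients quoted above, assuming that the input is a ¹³C^α shift in ppm computed with the 6-31G/3-21G basis set, which is our reading of ¹³C^α_μ. The function name is hypothetical.

```python
def extrapolate_small_basis(shift_small):
    """Map a 6-31G/3-21G 13C-alpha shift (ppm) onto the 6-311+G(2d,p)/3-21G scale.

    Uses the fitted line quoted in the text: 13C = -1.597 + 1.040 * 13C_mu.
    """
    return -1.597 + 1.040 * shift_small
```

For example, a small-basis value of 50.0 ppm maps to about 50.403 ppm. Because the slope is close to 1, the correction is mostly a near-constant offset across the typical ¹³C^α range.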

In conclusion, this analysis enabled us to select the smaller basis set (6-31G/3-21G), which reproduces the chemical shifts with an accuracy similar to that of the ‘basis-set limit’ [6-311+G(2d,p)/3-21G] but at a significantly lower computational cost [28].

3.4 Effect of Sequential Nearest-Neighbors on the ¹³C^α Chemical Shifts Calculations

The ¹³C^α chemical shifts of a residue X in the model peptide Ac-G-X-G-NMe have always been computed [44, 34] with all the torsional angles of residue X set exactly to those of the residue in the protein conformation, while the surrounding Gly residues and the end-blocking groups are free to rotate. Implicit in this approach is the assumption that the ¹³C^α chemical shifts of residue X do not depend on the identity of the nearest-neighbor residues. This assumption needs to be tested.

The structure of the Nucleic Acid Binding (NAB) protein of the SARS coronavirus [80], a 116-residue α/β protein containing nine prolines (Pro) and with 50% of its residues in loops and turns, was chosen to further evaluate the origin of the differences between computed and observed ¹³C^α chemical shifts, and to study the influence of the nearest-neighbor residues on the computed ¹³C^α chemical shifts.

The results [48] indicate that the computed ¹³C^α chemical shift of a given residue in the sequence of the NAB protein is not significantly influenced (i.e., changes are within ~0.5 ppm) by the nature of the nearest-neighbor amino acids, except for residues immediately preceding proline (see Fig. 2a). For such residues, Pro must be included in the computation of the ¹³C^α chemical shifts; otherwise, the computed ¹³C^α chemical shifts are overestimated by about +1.7 ppm. This finding is in good agreement with both the experimental evidence [36, 81, 82] and empirical observations [37, 81]. It is equally important to emphasize the physical nature of this effect: “…an imide bond formed by an Xxx–Pro pairing is generally thought to be much less electron-withdrawing than an amide bond…” [37].

Overall, except for the Pro effects, use of the Ac-G-X-G-NMe model peptide for the computation of the ¹³C^α chemical shifts of residue X is a good approximation, because the computed values are accurate within ±0.5 ppm for all residue types even when neither the preceding nor the following residue-type effects are taken into account (see Fig. 2).

3.5 Rigid-Geometry Approximation and Accuracy of the Calculations of ¹³C^α Chemical Shifts

Experimental protein structures are often solved using force fields that allow variation of bond lengths and bond angles. However, QM calculations are known to be very sensitive to bond lengths and bond angles [16]. Therefore, we explored the dependence of the computed ¹³C^α chemical shifts on the bond lengths and bond angles to establish whether a rigid- or a non-rigid-geometry approximation provides the more accurate representation with which to compute the chemical shifts.

For this test, the structure of ubiquitin deposited in the PDB (PDB id 1UBQ) was chosen because it possesses non-regularized geometry and was solved by X-ray diffraction at 1.8 Å resolution [83]. We also examined the corresponding structure with regularized geometry, i.e., the one with all residues replaced by the standard ECEPP residue geometry [62], named here 1UBQ^regular. Analysis of the differences between the computed and observed ¹³C^α chemical shifts for the 1UBQ and 1UBQ^regular structures leads to rmsd values of 3.28 ppm and 2.38 ppm, respectively. The better agreement obtained with 1UBQ^regular is consistent with the long-standing recognition that the bond lengths and bond angles of both X-ray and NMR-derived structures are not defined as accurately as those from studies of small molecules [16], with which the ECEPP geometry [62] was parameterized. Further analysis of the agreement of the two ubiquitin structures with the deposited electron density data of 1UBQ [83], in terms of the R-factor, gives 19.2% and 23.1% for 1UBQ and 1UBQ^regular, respectively, while the all-heavy-atom rmsd between these two structures is 0.142 Å [34].

Overall, the use of regularized geometry, i.e., ECEPP geometry, is an accurate approximation with which to compute the ¹³C^α chemical shifts in proteins and is, hence, used in most of the applications discussed in this chapter.

3.6 ¹³C^α Chemical Shifts as a Function of the Charge Distribution

Among the factors affecting ¹³C^α shielding that are important for an accurate computation of chemical shifts is the sensitivity of the ¹³C^α nuclei to the shielding/deshielding induced by protonation/deprotonation of distant ionizable groups [84–87]. However, this effect has not been taken into account explicitly in current QM-level computations of ¹³C^α chemical shifts in proteins because the calculations are usually carried out in the gas phase, with the ionizable residues treated as neutral groups.

The question of whether neutral, rather than charged, side chains give a more accurate computation of the ¹³C^α chemical shifts of ubiquitin at a given fixed pH was investigated as follows [45]. First, for a given ionizable residue i in a conformation k, the average charge distribution, < ρ_i,k >, was computed with Eq. (4), i.e., by explicit consideration of the 2^ξ ionization states of every conformation [75], with ξ being the number of ionizable groups in the molecule, namely 22. Second, the ¹³C^α chemical shifts as a function of pH, $\delta_{i}(\mathrm{pH})$, were computed with Eq. (5). This analysis was applied to 139 conformations of ubiquitin: 138 NMR-derived conformations (10 from PDB id 1D3Z plus 128 from PDB id 1XQQ) [54, 88] and one X-ray structure (PDB id 1UBQ) solved at 1.8 Å resolution [83].
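Eq. (4) and Eq. (5) are not reproduced in this section, so the following is only a minimal sketch, under stated assumptions, of what explicit consideration of the 2^ξ ionization states involves. It enumerates every protonation microstate of ξ groups, weights each one by a Boltzmann factor built from hypothetical intrinsic pKa values and hypothetical pairwise interaction energies (in kT units), and returns the average degree of protonation of each group. The function names and the energy model are our assumptions, not the authors' Eq. (4); likewise, the population-weighted (fast-exchange) shift is a stand-in for δ_i(pH) and is not necessarily Eq. (5).

```python
import itertools
import math

LN10 = math.log(10.0)


def average_protonation(pka, is_acid, pH, w=None):
    """Average degree of protonation <rho_i> of each ionizable group.

    Enumerates all 2**n protonation microstates (hypothetical energy model):
      pka     -- intrinsic pKa of each group (assumed inputs)
      is_acid -- True: charge -1 when deprotonated; False: charge +1 when protonated
      w       -- symmetric n x n pairwise interaction energies, in kT units
    """
    n = len(pka)
    if w is None:
        w = [[0.0] * n for _ in range(n)]
    z = 0.0
    acc = [0.0] * n
    for state in itertools.product((0, 1), repeat=n):  # 1 = protonated
        # Formal charge of each group in this microstate
        q = [s - 1 if acid else s for s, acid in zip(state, is_acid)]
        # Free energy in kT units (any additive constant cancels in the average)
        g = sum(s * LN10 * (pH - pk) for s, pk in zip(state, pka))
        g += sum(w[i][j] * q[i] * q[j] for i in range(n) for j in range(i + 1, n))
        weight = math.exp(-g)
        z += weight
        for i, s in enumerate(state):
            acc[i] += s * weight
    return [a / z for a in acc]


def fast_exchange_shift(rho, delta_protonated, delta_deprotonated):
    """Population-weighted shift, assuming fast exchange (not necessarily Eq. (5))."""
    return rho * delta_protonated + (1.0 - rho) * delta_deprotonated


# Hypothetical example: one acidic and one basic group, no interactions
print(average_protonation([4.0, 10.5], [True, False], pH=6.5))
```

For ubiquitin, ξ = 22 gives 2^22 = 4,194,304 microstates per conformation, which this pure-Python loop can enumerate, but slowly; exhaustive enumeration is kept here only because it mirrors the 2^ξ description in the text.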

Additionally, an extra set of 50 randomly generated conformations was obtained for each terminally blocked tripeptide Ac-GXG-NMe, with X being lysine (Lys), ornithine (Orn), diaminobutyric acid (Dab), glutamic acid (Glu) or aspartic acid (Asp). This set was used to determine: (i) the range of shielding/deshielding of the ¹³C^α nucleus of free acidic/basic amino acid residues in solution, in their fully charged and neutral forms; (ii) how these ranges compare with those derived from the 3058 ionizable groups of the 139 conformations of ubiquitin; and (iii) how the computed ranges are influenced by the distance between the charged side-chain group and the ¹³C^α nucleus (for example, the deprotonated carboxyl group is separated from the ¹³C^α nucleus by two chemical bonds in Asp but by three in Glu). To examine an analogous effect for a basic side-chain group such as Lys, the non-natural amino acids Orn and Dab were used because, in these amino acids, the protonated amino group is separated from the ¹³C^α nucleus by four and three chemical bonds, respectively, rather than by five as in Lys.

The results of this study [45], based on the analysis of 139 conformations of ubiquitin at pH 6.5, indicate that using neutral, rather than charged, amino acids is a significantly better approximation to the observed ¹³C^α chemical shifts in solution for the acidic groups, and a slightly better, though computationally much less expensive, representation for the basic groups (see Fig. 3).

Additionally, our analysis of the model peptides revealed a significantly greater deshielding of the ¹³C^α nucleus due to deprotonation of the acidic groups (Glu, Asp) than shielding due to protonation of the basic groups (Lys, Orn, Dab). The origin of this difference lies in the distance between the ionizable group and the ¹³C^α nucleus, which is shorter for the acidic than for the basic groups.

3.7 ¹³C^α Chemical Shifts as a Function of Side-Chain Flexibility

To what extent are the chemical shifts of the amino acid residues in a protein affected by side-chain orientation? The question arises because the three torsion angles ϕ, ψ and χ1 are not independent of each other over the whole range, since they all involve the common N-C^α bond [89, 90]. To answer it, the dependence of the ¹³C chemical shifts on side-chain orientation was investigated [35] at the DFT level of theory for a two-stranded antiparallel β-sheet model peptide with the sequence Ac-A₃-X-A₁₂-NH₂, where X represents any of the 17 naturally occurring amino acids considered here, i.e., excluding alanine, glycine and proline. Because most β-sheets are twisted rather than planar, with a right-handed twist in the approximately ±30° range for the backbone dihedral angles [91–94], the conformational parameters of β-sheets may deviate from those of planar pleated sheets and are, hence, difficult to model with canonical values. The fact that β-sheets in proteins appear as parallel or antiparallel strands, or a combination of both, only exacerbates the modeling problem. For these reasons, the backbone dihedral angles were taken from, and kept fixed at, those of an experimental antiparallel β-sheet structure, specifically the 16-residue segment (G41-G56) of the B3 binding domain of protein G (PDB id 1P7E).

For the 17 naturally occurring amino acids considered, the analysis indicates: (a) good agreement between the computed and observed ¹³C^α and ¹³C^β chemical shifts, with correlation coefficients, R, of 0.95 and 0.99, respectively; (b) significant variability of the computed ¹³C^α and ¹³C^β chemical shifts as a function of χ1 for all 17 residues except Ser; and (c) for 11 of the 17 residues, a dependence of the computed ¹³C^α chemical shifts on χ^ξ (with ξ ≥ 2) that, although significant, is smaller than that on χ1.

The above results obtained by Villegas et al. [35] for an antiparallel β-sheet (16-residue segment) were later validated on a 76-residue α/β protein by exploring the effects of side-chain conformation on the computed ¹³C^α chemical shifts [45]. This validation involved an exhaustive conformational search, starting from an arbitrarily selected conformation of the NMR-determined ubiquitin structure (PDB id 1D3Z), in which only the side-chain torsional angles were allowed to vary, i.e., all backbone dihedral angles (ϕ, ψ, ω) were fixed at their observed values. Furthermore, the correlation coefficient, R, between the vicinal coupling constants ³J_N-Cγ and ³J_C′-Cγ computed with the Karplus equation [95] and those observed for 17 valine, threonine and isoleucine residues was used to check the accuracy of the side-chain conformational search.
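The Karplus relation referred to above has the generic form ³J(θ) = A cos²θ + B cos θ + C. The sketch below evaluates it for a side-chain dihedral and computes the Pearson correlation coefficient R used to compare computed and observed couplings. The coefficients A, B, C and the phase offset between χ1 and the coupling dihedral are placeholders, not the parameterization used in [95] or [45]; real ³J_N-Cγ and ³J_C′-Cγ values require the published coefficients.

```python
import math


def karplus(dihedral_deg, a, b, c, phase_deg=0.0):
    """Generic Karplus relation 3J = A cos^2(theta) + B cos(theta) + C.

    theta = dihedral + phase; A, B, C and phase are placeholders here.
    """
    theta = math.radians(dihedral_deg + phase_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


# Placeholder coefficients (NOT published values) for three hypothetical chi1 rotamers
chi1_values = [-60.0, 180.0, 60.0]
j_calc = [karplus(chi, 2.0, -0.5, 0.5) for chi in chi1_values]
# With measured couplings j_obs for the same residues: R = pearson_r(j_calc, j_obs)
```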

The results obtained for the antiparallel β-sheet segment and for ubiquitin enabled us to determine the role and impact of a proper side-chain conformation on an accurate computation of the observed ¹³C^α chemical shifts in solution.

4 Use of the Structural Information Decoded from ¹³C Chemical Shifts

We have chosen three examples to illustrate how the structural information decoded from the observed ¹³C chemical shifts can be used in practice: (1) to determine the fractions of the tautomeric forms of the imidazole ring of histidine (His) in proteins as a function of pH, provided that the observed ¹³C^γ and ¹³C^δ2 chemical shifts and either the protein structure or the fraction of the H⁺ form are known; (2) to determine either all-α-helical or all-β-sheet protein structures in solution; and (3) to assess the reliability of NMR-determined protein models before they are published or deposited in the PDB. Each of these applications is described in the following subsections.

4.1 The Importance of Being His

In 1965 Mandel [96], in a pioneering NMR experiment, detected the imidazole (C2) protons of the histidine (His) residues of ribonuclease A, and in 1966 Bradbury and Scheraga [97] were able to distinguish between the histidine residues of ribonuclease A, i.e., they resolved the NMR peaks of three of the four histidines of this enzyme. Since then, NMR spectroscopy, X-ray crystallography and theoretical studies based on QM calculations have continuously improved in their ability to determine the properties of histidine residues in solution and in the solid state [43, 79, 98–116]. This persistent interest in His stems from the fact that this residue is unique among the 20 naturally occurring amino acids: ~50% of all enzymes use His in their active sites [117]. This is mainly because of the versatility of the His imidazole ring, which can adopt two neutral, chemically distinct forms, referred to as the N^δ1-H and N^ε2-H tautomers, and a protonated, charged H⁺ form, with one form favored over the other two depending on the protein environment and pH. In addition, His, with a pK° of 6.6 [118], is the only ionizable residue that titrates around neutral pH, allowing the non-protonated nitrogen of its imidazole ring to serve as an effective ligand for metal binding [79] or to play a crucial role in proton transfer [103].

Determining the fractions of the tautomeric forms of the imidazole ring of His in proteins in solution is important for several reasons. At a given fixed pH, proteins in solution exist as an ensemble of conformations and, hence, the form of each His residue may vary significantly among protein conformers because the tautomeric equilibrium is determined by the environment [43]. Moreover, because exchange between the different protonation states is assumed to occur in the fast-exchange regime [79], the NMR resonances of a given nucleus arising from rotation, protonation and tautomerization merge into a single averaged signal. Decoding the information from these exchange processes offers the possibility of determining the extent to which His residues in proteins behave as free His, for which the N^ε2-H tautomer is favored over the N^δ1-H tautomer in a ratio of 4:1 [108].

To solve this long-standing problem in the biophysical chemistry of proteins, each form of His was first treated as a terminally blocked model tripeptide with the sequence Ac-GH^ξG-NMe, with H^ξ in the N^δ1-H tautomeric form, the N^ε2-H tautomeric form or the protonated H⁺ form. For each form, a set of ~35,000 conformations was generated, representing a uniform sampling of the whole Ramachandran map as a function of the ϕ, ψ, ω, χ1 and χ2 torsional angles. Next, the gas-phase isotropic shielding was computed with the method described in Sect. 2.1. Finally, the distribution of the computed shieldings of the imidazole ring was analyzed for all of its ¹³C nuclei, namely ¹³C^γ, ¹³C^δ2 and ¹³C^ε1 (see Fig. 4). Specifically, the histogram of each shielding distribution (over all ~35,000 conformations) was fit by a Gaussian function with mean σ_o (shown as bars in Fig. 4) and standard deviation sd (data not shown). Visual inspection of the histogram shown in Fig. 4 revealed that the mean σ_o shielding of the ¹³C^ε1 nucleus is not sensitive to changes in the form of the imidazole ring; we therefore confine our interest to the nuclei that are sensitive to such changes, namely ¹³C^δ2 and ¹³C^γ.

Using first-order shielding differences for a pair of selected nuclei, ¹³C^δ2 and ¹³C^γ, rather than chemical shifts is a very convenient approach because experimental referencing may be a source of errors [99], and the reference shielding cancels in the difference. Consequently, we define the first-order shielding difference as $\Delta^{\xi} = |\sigma_{o}^{\delta 2} - \sigma_{o}^{\gamma}|^{\xi}$, where ξ denotes the form of the imidazole ring and $\sigma_{o}^{\delta 2}$ and $\sigma_{o}^{\gamma}$ are the computed mean values of the shielding distributions of the ¹³C^δ2 and ¹³C^γ nuclei, respectively. The following convention is adopted: ξ = δ, ε or + designates the N^δ1-H, N^ε2-H or H⁺ form, respectively.
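As a sketch of the analysis above, the code below estimates a Gaussian's mean σ_o and standard deviation for each set of computed shieldings and then forms Δ^ξ. Note the substitution: it uses the maximum-likelihood (moment) estimates from the raw samples rather than the least-squares fit to the histogram described in the text; the two agree closely for well-sampled, nearly Gaussian data. The function names are ours, and the synthetic input values are arbitrary and carry no physical meaning.

```python
import random
import statistics


def gaussian_fit(shieldings):
    """Maximum-likelihood Gaussian parameters (mean, population sd) of the samples."""
    return statistics.fmean(shieldings), statistics.pstdev(shieldings)


def first_order_difference(sigma_delta2, sigma_gamma):
    """Delta = |sigma_o(C-delta2) - sigma_o(C-gamma)| from two shielding samples."""
    mean_d2, _ = gaussian_fit(sigma_delta2)
    mean_g, _ = gaussian_fit(sigma_gamma)
    return abs(mean_d2 - mean_g)


# Synthetic shieldings (arbitrary values) for one form of the imidazole ring
rng = random.Random(0)
sigma_d2 = [rng.gauss(100.0, 2.0) for _ in range(35000)]
sigma_g = [rng.gauss(70.0, 2.0) for _ in range(35000)]
print(first_order_difference(sigma_d2, sigma_g))  # close to 30 for these inputs
```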

Analysis of the first-order shielding differences indicates that Δ^ε > Δ⁺ > Δ^δ, with Δ^δ ~ 0. Therefore, once the fraction of the protonated H⁺ form, f⁺ = < ρ >, computed with Eq. (4), and Δ^obs = |¹³C^δ2 – ¹³C^γ|, with ¹³C^δ2 and ¹³C^γ the observed chemical shifts in solution at a given pH, are known, the fraction of the N^ε2-H tautomer, f^ε, can be obtained by assuming: (a) that all forms are in fast exchange on the NMR chemical-shift time scale [79], i.e., that Δ^obs = f^ε Δ^ε + f⁺ Δ⁺ + f^δ Δ^δ; and (b) that Δ^δ ≡ 0.

These assumptions, together with some physical constraints, enable us to find an analytical expression for f^ε, namely $f^{\varepsilon} = \frac{\Delta^{obs}(1 - \langle \rho \rangle)}{\Delta^{\varepsilon}}$, where Δ^ε is the single-valued first-order shielding difference computed for the N^ε2-H tautomer (Δ^ε ~ 31 ppm). The fraction of the N^δ1-H tautomer then follows straightforwardly as $f^{\delta} = 1 - \langle \rho \rangle - f^{\varepsilon}$.
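The two closed-form expressions for f^ε and f^δ transcribe directly into code. The sketch below takes Δ^obs (in ppm), the average degree of protonation < ρ > (assumed to have been computed already, e.g., with Eq. (4)) and Δ^ε ≈ 31 ppm as the default; the function name is ours and the example inputs are hypothetical. It does not enforce the physical constraints mentioned in the text, so inputs with Δ^obs(1 - < ρ >) > Δ^ε would give a negative f^δ.

```python
def tautomer_fractions(delta_obs, rho, delta_eps=31.0):
    """Return (f_plus, f_eps, f_delta) from the closed-form expressions.

    delta_obs -- observed |13C-delta2 - 13C-gamma| in ppm
    rho       -- average degree of protonation <rho>, i.e. f_plus
    delta_eps -- first-order shielding difference of the N-eps2-H tautomer (~31 ppm)
    No physical constraints are enforced (fractions may fall outside [0, 1]).
    """
    f_plus = rho
    f_eps = delta_obs * (1.0 - rho) / delta_eps
    f_delta = 1.0 - rho - f_eps
    return f_plus, f_eps, f_delta


print(tautomer_fractions(15.5, 0.5))  # (0.5, 0.25, 0.25)
```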

This formulation was used to determine the tautomeric forms of His in each of 8 selected proteins for which both the structure and the ¹³C^δ2 and ¹³C^γ chemical shifts of the His imidazole ring are available. In each case, the average degree of protonation, < ρ >, of all ionizable residues was computed with Eq. (4), and the tautomeric fractions of His were determined with the expressions for f^δ and f^ε given above [43]. Likewise, using the observed values of Δ^obs obtained by solid-state NMR for unblocked dipeptides with the sequences His-Leu, His-Met, Gly-His, Leu-His, His-Ala, His-Glu, Ala-His and His-Asp [99], we also determined the tautomeric fractions of the His imidazole ring for each of these 8 compounds.

Results for the 8 proteins indicate that the protonated form is the most populated one, whereas the distribution of the tautomeric forms varies significantly among different histidine residues of the same protein (see Fig. 5a). For example, His226 and His250 show a comparable degree of protonation, < ρ >, but very different tautomeric distributions (see Fig. 5a), which shows the importance of the environment of the histidines in determining their tautomeric forms. The origin of this observation can be explained as follows. On the one hand, the N^δ1 atom of H250 is located only 2.9 Å from the backbone carbonyl oxygen of S248 (see Fig. 5b), presumably forming a hydrogen bond (green dots in Fig. 5b), while its N^ε2 atom is exposed to the solvent but the imidazole ring is surrounded by the fully protonated R264 and R266 (data not shown), which lowers the probability that a proton binds to N^ε2; this agrees well with the computed tautomeric distribution for H250 in Fig. 5a. On the other hand, the N^ε2 atom of the imidazole ring of H226 is 3.3 Å from the backbone carbonyl oxygen of W246 (see Fig. 5c), while its N^δ1 atom is 3.1 Å from the backbone amide N-H group of H226 (see Fig. 5c). As a result, a preference for the N^ε2-H over the N^δ1-H tautomer is expected for H226, in agreement with the computed tautomeric fractions for this residue in Fig. 5a.

In addition, for ~70% of the neutral histidine-containing dipeptides, the method yields fairly good agreement between the calculated and experimental tautomeric forms. The co-existence of different tautomeric forms in the same crystal structure may explain the disagreement obtained for the remaining 30% of the dipeptides.

4.2 Protein Structure Determination

In this section we illustrate, with two examples, how the structural information encoded in the ¹³C^α chemical shifts can be used to determine an ensemble of conformations, provided that a set of NOE-derived distance constraints is available. However, because chemical shifts are sensitive to protein dynamics on the microsecond time scale [88], the question of whether a single conformation or an ensemble of conformations better represents NMR observables such as the chemical shifts must be investigated first.

4.2.1 The Crystallographer Dilemma: A Single Structure or an Ensemble of Conformations?

In protein crystallography it is conventional to represent the conformation of a protein by a single structure, even though proteins are very flexible in solution; hence, the question of whether a single structure or an ensemble of conformations is the more accurate representation of the observed ¹³C^α chemical shifts in solution deserves investigation.

Proteins in solution are flexible molecules that exhibit anisotropic motion and exist as a dynamic ensemble of conformations. Although protein flexibility in the crystalline state is reduced relative to solution by crystal packing, some dynamics and heterogeneity remain [119, 120] because of the high solvent content of most protein crystals [104]. Despite this, protein structures solved by X-ray diffraction are traditionally represented by a single conformation. Crystallographic temperature (B) factors, which contain information about atomic displacements arising from the combined effects of dynamic, static and lattice disorder within the crystal, provide an important indication of protein motions in the crystalline state.

Consequently, an ensemble of protein conformations generated with B-factor values as a guide may improve the agreement between NMR- and X-ray-derived protein models in terms of NMR observables such as the ¹³C^α chemical shifts. To explore this possibility we selected ubiquitin, a 76-residue α/β protein whose structure has been solved by both X-ray (PDB id 1UBQ [83]) and NMR (PDB id 1D3Z [54]) methods, the latter providing the available ¹³C^α chemical shifts.

Because the deposited 1UBQ structure was solved and refined with software and force-field parameters different from those used in our method, a new set of conformations was generated with MCM and rigid geometry, starting from the corresponding regularized X-ray structure (1UBQ^regular). During the MCM search, the (ϕ, ψ, χ) torsional angles of all residues in the sequence were allowed to vary. The reported B-factors for 1UBQ were used to estimate the upper limit adopted for the torsional-angle variation (±10°). The generated conformations were then subjected to several rounds of refinement with a standard X-ray crystallography program, the Crystallography and NMR System (CNS) [51, 52]. As a result, 5 conformations were selected.
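The torsional bound used in the MCM search can be illustrated with a minimal sketch of the perturbation step only: every (ϕ, ψ, χ) angle is displaced by a uniform random amount within ±10° and wrapped back into [-180°, 180°). The uniform distribution, the wrapping convention and the dictionary of hypothetical torsion names are our assumptions; a real MCM step would also minimize the energy of the perturbed conformation and accept or reject it with a Metropolis criterion, which is not shown.

```python
import random


def perturb_torsions(torsions, max_step=10.0, rng=None):
    """Displace every torsion (degrees) uniformly within +/- max_step.

    Results are wrapped into [-180, 180). Only the perturbation step of an
    MCM move is shown; minimization and Metropolis acceptance are omitted.
    """
    if rng is None:
        rng = random.Random()
    perturbed = {}
    for name, angle in torsions.items():
        new_angle = angle + rng.uniform(-max_step, max_step)
        perturbed[name] = (new_angle + 180.0) % 360.0 - 180.0
    return perturbed


# Hypothetical torsion names for one residue
start = {"phi5": -60.0, "psi5": 175.0, "chi1_5": -178.0}
print(perturb_torsions(start, rng=random.Random(42)))
```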

The 5 generated models differ considerably from one another and from the starting structure, with all-atom rmsd values of 0.36–1.13 Å. Moreover, none of the 5 models has residues in disallowed regions of the Ramachandran plot [8], and all unfavorable contacts occur between atoms of the last five residues of the sequence, which were not visible in the electron-density map. In addition, the R and R_free factors of the 5 models are equal to or better than those obtained for a Simulated-Annealing-Refined (SAR) version of the PDB 1UBQ structure. This refinement of the deposited 1UBQ structure (the SAR structure) is necessary for a consistent comparison between the chemical shifts of the 5 generated models and the PDB structure, because computed ¹³C chemical shifts are very sensitive to small differences in bond lengths and bond angles [16].

Figure 6 shows the rmsd between the observed and computed ¹³C^α chemical shifts for each of the 5 new models (light-grey bars) and for the SAR structure (black bar). The ca-rmsd computed from the ensemble of 5 new models is shown as a horizontal solid line in Fig. 6. The ca-rmsd (2.36 ppm) is lower than the value for the SAR structure (2.74 ppm) and for any of the new models. These results for ubiquitin demonstrate that an ensemble of 5 conformations derived from the regularized X-ray structure (1UBQ^regular) agrees better with the observed ¹³C^α chemical shifts than does a single conformation (the SAR structure).
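The ca-rmsd is not defined in this section; assuming it is the rmsd between the observed shifts and the per-residue average of the computed shifts over all conformers, the comparison in Fig. 6 can be sketched as follows. By the triangle inequality, the ca-rmsd can never exceed the mean of the individual rmsd values, and it can fall below every individual value when the errors of different conformers partly cancel, as in the toy example (toy numbers, not real shifts).

```python
import math


def rmsd(observed, computed):
    """Root-mean-square deviation between two equal-length lists of shifts (ppm)."""
    return math.sqrt(sum((o - c) ** 2 for o, c in zip(observed, computed)) / len(observed))


def ca_rmsd(observed, ensemble):
    """rmsd between observed shifts and the per-residue ensemble-averaged computed shifts.

    ensemble -- list of conformers, each a list of computed shifts per residue
    """
    n_models = len(ensemble)
    averaged = [sum(model[i] for model in ensemble) / n_models for i in range(len(observed))]
    return rmsd(observed, averaged)


# Toy example: two conformers with opposite errors
observed = [0.0, 0.0]
ensemble = [[1.0, -1.0], [-1.0, 1.0]]
print([rmsd(observed, m) for m in ensemble], ca_rmsd(observed, ensemble))  # [1.0, 1.0] 0.0
```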

This conclusion is in line with the crystallographers' suggestion that “…a more suitable representation of a macromolecular crystal structure would be an ensemble of models...” [121]. Analysis of NMR-determined ensembles of conformations also led to a similar conclusion, i.e., the ca-rmsd showed closer agreement with the observed ¹³C^α chemical shifts in solution than either the individual or the mean rmsd [33]. In other words, proteins in solution are conformationally labile, as indicated by both the ca-rmsd and the theoretical minimal-rmsd model analyses, and this must be taken into account to predict the ¹³C^α chemical shifts most accurately.

4.2.2 Determination of β-Sheet Structures

Evidence from the probability-based secondary-structure identification method of Wang and Jardetzky [122] suggests that, for the heavy nuclei, the reliability of distinguishing an α-helix from a statistical coil on the basis of chemical-shift information follows the ranking ¹³C^α > ¹³C′ > ¹³C^β > ¹⁵N, whereas a different trend (¹³C^β > ¹³C^α ~ ¹³C′ ~ ¹⁵N) was found for distinguishing a β-strand conformation from a statistical coil. This trend raises the question of whether a mainly ¹³C^α-driven methodology can be used to predict predominantly β-sheet structures and, if so, how accurate the corresponding ¹³C^β chemical-shift predictions would be.

To answer this question, our recently introduced physics-based protocol (see Fig. 1) was applied to determine the structure of a 20-residue peptide capable of forming a three-stranded antiparallel β-sheet in aqueous solution, the BS2 peptide, with the sequence TWIQN_DPGTKWYQN_DPGTKIYT, for which both a complete set of ¹³C^α chemical shifts and a reduced number of NOEs have been reported. Experimental structure determination of small proteins and peptides that fold as monomers and contain no disulfide bonds is very valuable, because such determinations can provide important information for force-field development and for the evaluation or improvement of search algorithms aimed at efficient exploration of conformational space [123–126].

The results indicate that an accurate all-β-sheet structure can be determined simply by identifying a set of conformations that simultaneously satisfy a set of constraints, including ¹³C^α-dynamically-derived torsional-angle constraints for all amino acid residues in the sequence and a fixed set of NOE-derived distance constraints [29]. Of the thousands of conformations generated by the VTF approach, i.e., during step 1 of the protein structure determination protocol shown in Fig. 1, 25 (see Fig. 7a) were selected with a clustering procedure. This small set of conformations was used to determine the theoretical minimal-rmsd model, which provides a set of ϕ, ψ and χ torsional-angle constraints for all residues in the sequence, not just for those in α-helix or β-sheet regions. Using these torsional-angle constraints (ϕ, ψ, χ), combined with different numbers of NOE-derived constraints, 2 sets of conformations of the BS2 peptide were determined in step 2 of the protocol. One set of 20 conformations (shown in Fig. 7b) was obtained with 118 NOE-derived distance constraints, and the other set of 10 conformations (shown in Fig. 7c) with 130 NOE-derived distance constraints. Regardless of the number of NOE-derived distance constraints used, adding the ¹³C^α-derived torsional constraints led to noticeably lower ca-rmsd values (2.2 and 3.5 ppm for the sets of 20 and 10 conformations, respectively) than for the 20 models obtained by Santiveri et al. [127], who used the full set of 130 NOE-derived distance constraints but no ¹³C^α chemical-shift information (4.6 ppm). In line with this finding, graphical inspection of Fig. 7b–c also indicates that the use of ¹³C^α-derived torsional constraints led to sets of conformations with less spread in the side-chain torsional angles, as can be seen by comparing Fig. 7b and c with Fig. 7d, the latter obtained by Santiveri et al. [127]. In addition, the correlation coefficient, R, between the observed and computed ¹³C^β chemical shifts was somewhat better for the two sets obtained with the ¹³C^α-based determination protocol (shown in Fig. 1): R is 0.99 and 0.98 for the 20- and 10-conformation sets, respectively, versus 0.97 for the set of conformations derived by Santiveri et al. [127].

Overall, analysis of the ca-rmsd, the NOE-derived distance violations, the ¹³C^β chemical shifts and several stereochemical quality factors for these sets, as measures of how closely the calculations reproduce the structure in solution, indicates that our self-consistent physics-based method produces a more accurate set of conformations (shown in Fig. 7b and c) than that obtained with traditional methods [127] (shown in Fig. 7d). Our results also suggest that, for a flexible molecule in solution such as BS2, it may not be possible to determine a single structure that satisfies all the constraints simultaneously. This is a consequence of the well-known fact that NMR parameters, such as the observed NOE-derived distances and the ¹³C^α chemical shifts, correspond to a dynamic ensemble of conformations and, therefore, may not be reproduced exactly by a limited set of static structures [44, 128].

Characterization of the structural flexibility of molecules in solution is of fundamental importance for the study of biological function, stability and folding [129, 130]. Therefore, the per-residue average ¹³C^α conformational shifts were also analyzed; the results indicate that the third, C-terminal strand of the BS2 β-sheet is the most flexible strand, although less flexible than the turns. In addition, 20-ns molecular dynamics (MD) simulations were performed with the AMBER 8.0 package [131]. The MD runs yielded a plausible atomic description of the motion of the BS2 peptide in solution, as revealed by both the pattern of hydrogen bonds and the generalized Lindemann parameter [132]. The MD results were in line with the analysis of the per-residue average ¹³C^α conformational shifts, providing additional evidence of the greater flexibility of the C-terminal strand.
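The generalized Lindemann parameter mentioned above is commonly written as Δ_L = sqrt(Σ_i ⟨(r_i - ⟨r_i⟩)²⟩ / N) / a, where the sum runs over N atoms, the averages are taken over MD frames, and a is a characteristic interatomic distance. The following is a minimal sketch, assuming the frames have already been superimposed on a common reference (no fitting is done here) and leaving a as an explicit input rather than assuming a value; the function name is ours.

```python
import math


def lindemann(frames, a):
    """Generalized Lindemann parameter from MD frames.

    frames -- list of frames; each frame is a list of (x, y, z) atom positions,
              assumed already superimposed on a common reference
    a      -- characteristic interatomic distance (same units as the coordinates)
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    total_msf = 0.0
    for i in range(n_atoms):
        # Time-averaged position of atom i
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean-square fluctuation of atom i about its average position
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        total_msf += msf
    return math.sqrt(total_msf / n_atoms) / a
```

A per-strand value, such as one for the C-terminal strand of BS2, could be obtained by passing only the atoms of that region in each frame.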

The fact that the observed ¹³C^α chemical shifts, supplemented only by NOE-derived distance constraints, provide accurate information for the validation and refinement of protein structures, as well as site-specific information about the flexibility of a molecule in solution, may be very useful to NMR spectroscopists and theoreticians interested in protein stability and folding mechanisms.

4.2.3 A Blind Test to Determine an α-Helical Structure

The solution NMR structures of both the full-length (residues 1–77) and truncated (residues 1–46) forms of the YnzC protein (PDB id 2JVD) from Bacillus subtilis [133], which is part of the small yneA SOS-response operon that regulates cell division in this organism [134], have been determined recently [135]. The corresponding X-ray crystal structure (PDB id 3BHP) was solved by Kuzin et al. [133] at 2.0 Å resolution. The unique two-helix monomeric structure of YnzC, with no disulfide bonds, makes it an attractive subject for testing our physics-based methodology for protein structure determination.

The goal of this application is twofold. First, as a blind test, we attempted to determine whether it is possible to obtain an ensemble of conformations in which each conformer simultaneously satisfies the NOE-derived distance constraints and the ¹³C^α-derived torsional constraints for the YnzC protein in solution [136]. Although the solution NMR structure [135] of this protein had already been solved at the time of this blind test, the only information provided was a full set of the observed ¹³C^α chemical shifts and of the NOE-derived distance constraints. In particular, neither the coordinates of the solved structures of YnzC [135] nor the heteronuclear ¹⁵N-¹H NOE data were provided at the time of the test.

Our second goal was to cross-validate high-quality sets of conformations obtained for the YnzC protein in solution by alternative determination methods, namely the solution NMR set of conformations (PDB id 2JVD), obtained with NOE-derived distance constraints, dihedral-angle constraints and hydrogen-bond constraints [135], and the 2.0-Å X-ray crystal structure (PDB id 3BHP) of Kuzin et al. [133]. For this second goal, several validation scores were used [136], including: (i) Recall, Precision, F-measure (RPF) analysis [6]; (ii) several global quality-score indicators provided by Verify3D [10], ProsaII [137], Procheck [8] and MolProbity [5]; (iii) the ca-rmsd and rmsd between the observed ¹³C^α chemical shifts and those computed at the DFT level; and (iv) the backbone rmsd between these refined structures and the mathematical average coordinates of the ensemble of NMR structures of YnzC(1–48) deposited in the PDB.
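The idea behind the RPF scores can be illustrated with a deliberately simplified, set-based sketch: recall is the fraction of observed NOE proton pairs that are close in the model, precision is the fraction of close proton pairs in the model that are observed as NOEs, and F is their harmonic mean. The published RPF method [6] additionally handles ambiguous peak assignments and chemical-shift matching tolerances, and the DP-score normalizes F; none of that is reproduced here, and the function names, distance cutoff and atom names are hypothetical.

```python
import math


def close_pairs(coords, cutoff):
    """Proton pairs closer than cutoff in a model; coords maps atom name -> (x, y, z)."""
    names = sorted(coords)
    return {
        frozenset((p, q))
        for i, p in enumerate(names)
        for q in names[i + 1:]
        if math.dist(coords[p], coords[q]) <= cutoff
    }


def rpf(observed_pairs, model_pairs):
    """Simplified recall, precision and F-measure over unordered proton pairs."""
    obs = {frozenset(pair) for pair in observed_pairs}
    mod = {frozenset(pair) for pair in model_pairs}
    hits = len(obs & mod)
    recall = hits / len(obs) if obs else 0.0
    precision = hits / len(mod) if mod else 0.0
    f_measure = 2 * recall * precision / (recall + precision) if hits else 0.0
    return recall, precision, f_measure
```

Typical use would be `rpf(noe_pairs, close_pairs(model_coords, cutoff))`, with the cutoff chosen to match the NOE detection limit.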

In this blind test we demonstrated [136] that an accurate all-α-helical set of protein structures can be determined simply by identifying conformations that simultaneously satisfy a set of constraints, including ¹³C^α-dynamically-derived torsional-angle constraints for all amino acid residues in the sequence and a fixed set of 1022 NOE-derived distance constraints. The structure determination was carried out as follows. After thousands of conformations were generated with the VTF procedure (step 1), the 10 shown in Fig. 8b were selected, i.e., those with a maximum NOE-derived distance violation below a fixed cutoff value. Only one of these 10 conformations was then used as the starting conformation for a conformational search (step 2) carried out with two types of constraints: the original fixed set of NOE-derived distance constraints and the set of ϕ, ψ, χ torsional angles derived from step 1. The resulting new set of 10 conformations is shown in Fig. 8c. Repeating step 2 with a tighter tolerance range for the torsional-angle constraints than in the previous iteration yielded the final set of 10 conformations shown in Fig. 8d, the so-called Set-NOE-CS.
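The selection criteria described above amount to checking that a conformation respects NOE upper bounds and stays within a tolerance window around target torsion angles. The sketch below shows such a check; the data layout, the cutoff and tolerance values, and the function names are hypothetical. It only filters conformations, whereas step 2 of the protocol uses these constraints during a conformational search. Angular differences are wrapped so that, for example, -179° and 179° are treated as 2° apart.

```python
import math


def noe_violations(coords, noe_bounds):
    """Upper-bound violation (>= 0) for each (atom_a, atom_b, upper_bound) restraint."""
    return [max(0.0, math.dist(coords[a], coords[b]) - ub) for a, b, ub in noe_bounds]


def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def satisfies(coords, torsions, noe_bounds, torsion_targets, max_violation, tolerance):
    """True if the largest NOE violation and every torsion deviation are within limits."""
    if max(noe_violations(coords, noe_bounds), default=0.0) > max_violation:
        return False
    return all(
        abs(angle_diff(torsions[name], target)) <= tolerance
        for name, target in torsion_targets.items()
    )


# Hypothetical use: keep only conformations that pass both checks
# selected = [c for c in conformations
#             if satisfies(c.coords, c.torsions, noes, targets, 0.5, 20.0)]
```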

Figure 8a compares, as a bar diagram, the rmsd between the computed and observed ¹³C^α chemical shifts for residues 1–46 for all three sets of conformations, viz., Set-NOE-CS (shown in Fig. 8d), 2JVD (shown in Fig. 8e) and the three chains of the X-ray crystal structure 3BHP (shown in Fig. 8f). The results in Fig. 8a reveal that the two NMR-derived ensembles (2JVD and Set-NOE-CS) represent the observed ¹³C^α chemical shifts in solution better, in terms of the ca-rmsd (solid horizontal black and red lines in Fig. 8a), than any single conformer (red or yellow bars in Fig. 8a) or any single chain of the X-ray structure (black, cyan and green bars in Fig. 8a). This result is in line with previous calculations for the 10 NMR-derived conformations (PDB id 1D3Z) and the X-ray structure (PDB id 1UBQ) of ubiquitin.

Because the ca-rmsd analysis might be biased by the fact that the 10 conformations of Set-NOE-CS were computed with a ¹³C^α-based method while the others were not, a cross-validation quality test was also carried out. All three sets consistently show good RPF and DP scores as well as good global structure-quality indicators. This analysis reveals that all three sets of structures display very good agreement with the experimental NOE data, as well as dihedral-angle distributions and atomic clash scores typical of good-quality protein structures. Taken together, these results indicate that the 20 conformations of the 2JVD set, the 10 DFT-computed conformations of Set-NOE-CS, and each of the three chains of the X-ray structure are highly accurate representations of the YnzC protein in solution.

4.3 Protein Structure Validation

The PDB is the most important archive of experimental protein structures solved by X-ray crystallography and NMR spectroscopy. The large number of structures deposited in the PDB constitutes an extraordinary source of information that has been, and continues to be, used for a wide range of applications, such as structure-based drug design, molecular modeling, force-field parameterization and molecular biology. Some deposited protein structures, whether they show few or many flaws, are formally withdrawn from the database and hence considered obsolete, even though their coordinates remain available in the PDB. In most cases, a successor (superseding) structure replaces the obsolete one. The large number of obsolete structures indicates that the development of accurate validation protocols remains an important task.

4.3.1 A Chemical-Shift-Based Server

An ideal validation method should meet two requirements. First, it should be strong rather than weak. A validation method is considered ‘strong’ if it is able to assess how well a structure, or an ensemble of structures, predicts experimental data not used in the structure-determination process; otherwise it is considered ‘weak’, since it is limited to reproducing the observed experimental data used to determine the protein models [138]. Second, it should be able to detect structural flaws quickly and accurately at the residue level. With these goals in mind, a new server, CheShift, was recently developed to predict the ¹³C^α chemical shifts of protein structures. It is based on a database of chemical shifts computed for 696,916 conformations as a function of the ϕ, ψ, ω, χ1 and χ2 torsional angles for all 20 naturally occurring amino acids. The ¹³C^α chemical shifts were computed at the DFT level of theory using the methodology described in Sect. 2.1. Because of the large number of conformations, the shielding values were computed with a small basis set (6-31G/3-21G) and later extrapolated to a large basis set [6-311+G(2d,p)/3-21G], as described in the Methods section.
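
Because such a database is sampled at discrete torsional angles, predicting the shift of an arbitrary residue requires interpolating between grid points. The sketch below shows periodic bilinear interpolation over ϕ and ψ only; the 10° spacing, the table layout and the function name are hypothetical, and the scheme is not claimed to be the one implemented in CheShift, whose database also depends on ω, χ1 and χ2.

```python
import math
from typing import List

def interpolate_shift(table: List[List[float]], phi: float, psi: float,
                      step: float = 10.0) -> float:
    """Bilinear interpolation on a periodic (phi, psi) grid.

    table[i][j] is the precomputed shift (ppm) at phi = -180 + i*step and
    psi = -180 + j*step (hypothetical layout); indices wrap at +/-180 degrees.
    """
    n_phi, n_psi = len(table), len(table[0])
    x = ((phi + 180.0) % 360.0) / step   # fractional grid coordinate along phi
    y = ((psi + 180.0) % 360.0) / step   # fractional grid coordinate along psi
    i0, j0 = math.floor(x), math.floor(y)
    fx, fy = x - i0, y - j0
    i0, j0 = i0 % n_phi, j0 % n_psi      # guard against x == n_phi from rounding
    i1, j1 = (i0 + 1) % n_phi, (j0 + 1) % n_psi
    return ((1 - fx) * (1 - fy) * table[i0][j0] + fx * (1 - fy) * table[i1][j0]
            + (1 - fx) * fy * table[i0][j1] + fx * fy * table[i1][j1])

# Toy 36 x 36 table whose value depends only on the phi index (10-degree grid).
toy = [[float(i)] * 36 for i in range(36)]
print(interpolate_shift(toy, -175.0, -180.0))  # -> 0.5
print(interpolate_shift(toy, 175.0, 0.0))      # -> 17.5 (wraps between i=35 and i=0)
```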

The accuracy and sensitivity of the CheShift predictions, in terms of the correlation coefficient R between the observed and predicted ¹³C^α chemical shifts, were analyzed for 36 X-ray-derived protein structures solved at 2.3 Å resolution or better. For all the proteins, the R values obtained with the CheShift, SHIFTX [24], SPARTA [25], SHIFTS [38, 39] and PROSHIFT [23] servers were comparable, although the CheShift values were systematically the lowest. This raises the following question: do these servers provide a more sensitive validation than CheShift? To answer it, we chose protein 1RGE, solved at 1.15 Å resolution [139]. Its crystal structure contains two chemically identical but crystallographically independent molecules in the asymmetric unit, named here A and B [139]. The main structural difference between molecules A and B (all-heavy-atom rmsd of 1.1 Å) is due to differences in side-chain conformations, especially side chains occupying different rotameric states. For this test, which does not require a comparison with the observed ¹³C^α chemical shifts, we computed the correlation coefficient R between the ¹³C^α chemical-shift predictions obtained for molecules A and B with each of the five servers listed above. The resulting R values are 0.96, 1.00, 1.00, 0.98 and 1.00 for CheShift, SHIFTX, SPARTA, SHIFTS and PROSHIFT, respectively. Except for CheShift (0.96) and SHIFTS (0.98), none of the servers is able to discriminate, beyond doubt, between molecules A and B: from a statistical point of view, the R values obtained from SHIFTX (1.00), SPARTA (1.00) and PROSHIFT (1.00) indicate that molecules A and B are practically indistinguishable protein models.
Therefore, a lower R value between the predicted and observed ¹³C^α chemical shifts does not necessarily mean poorer accuracy; it could instead mean higher sensitivity to subtle structural differences. This conclusion is supported by a similar analysis carried out at a higher level of accuracy, i.e., using a larger basis set and the actual geometry of chains A and B, without the torsional-angle interpolation required by the CheShift server. In this case, the R value computed with the larger basis set (0.93) was lower than the value obtained with CheShift (0.96) or with any other server (1.00, 1.00, 0.98 and 1.00 for SHIFTX, SPARTA, SHIFTS and PROSHIFT, respectively).
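
The test above reduces to computing Pearson's correlation coefficient between two lists of predicted shifts. A self-contained Python sketch, with invented predictions for molecules A and B, is:

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient R between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("R is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictions (ppm) for the same residues in molecules A and B.
pred_a = [52.1, 56.3, 61.0, 58.4, 54.7]
pred_b = [53.1, 57.3, 62.0, 59.4, 55.7]   # every value shifted by +1.0 ppm
print(round(pearson_r(pred_a, pred_b), 2))  # -> 1.0
```

Note that R = 1.00 only says that the two prediction sets are linearly related: the constant 1 ppm offset above still gives R = 1 although no pair of values agrees. On Python 3.10 or later, `statistics.correlation(x, y)` computes the same coefficient.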

So far, we have shown that the QM basis of the CheShift server enables us to predict ¹³C^α chemical shifts with reasonable accuracy in seconds. Our results suggest that CheShift can provide a standard with which to evaluate the quality of protein structures solved by either X-ray crystallography or NMR spectroscopy, provided that the experimentally observed ¹³C^α chemical shifts are available.

4.3.2 CheShift-2: A Picture Is Worth a Thousand Words

Differences between the observed and CheShift-predicted ¹³C^α chemical shifts can be used as a sensitive probe to detect possible local flaws in NMR-determined protein structures. CheShift was originally developed to return a list of predicted ¹³C^α chemical-shift values, one for each amino acid in the protein sequence except the first and last residues [28, 33]. The validation process, i.e., the comparison between the predicted and observed ¹³C^α chemical-shift values, was left to the user, who could use this information to assess the quality of the NMR structure as a whole, e.g., by computing the ca-rmsd [33]. However, a highly desirable goal of any accurate validation method [11, 34] is to identify local flaws along the sequence rather than only the global quality. Therefore, a graphical user interface (GUI) was added to the CheShift-2 server [49] to make such flaws easily visible: it displays the differences between the observed and computed ¹³C^α chemical shifts using a three-color code mapped onto a 3D protein model. Far from being only an aesthetic improvement, this graphic validation enables users of CheShift-2 to detect local flaws in proteins quickly and accurately on a per-residue basis, without having to carry out the extensive DFT calculations on which the server is based.

The CheShift-2 server [49] performs the following sequential steps: (i) for each amino acid residue i, the average difference between the observed and predicted ¹³C^α chemical shifts, Δ_i, is computed using Eq. (2); (ii) Δ_i is smoothed by averaging it with the values of its two nearest-neighbor residues, giving ⟨Δ_i⟩; (iii) ⟨Δ_i⟩ is discretized, i.e., assigned an integer value of 1, 0 or −1 depending on its magnitude; and (iv) these discrete values are mapped onto the 3D protein model and color-coded blue, white and red, respectively. The color-code assignment assumes that ⟨Δ_i⟩ values within ~1.7 ppm (blue) are small, those within ~3.4 ppm (white) are medium, and those beyond 3.4 ppm (red) are large. Differences colored blue or white are considered acceptable, whereas red indicates a possible flaw in the structure. In addition, yellow is used to indicate the absence of observed or computed ¹³C^α chemical shifts [49].
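
Steps (ii) to (iv) can be sketched as follows, taking the Δ_i values of step (i) as given. The details are our assumptions, not a transcription of the server: the smoothing window is i−1, i, i+1 truncated at the chain ends and at missing values, the thresholds are applied to |⟨Δ_i⟩| with inclusive boundaries, and residues without a value are colored yellow.

```python
from typing import List, Optional

COLORS = {1: "blue", 0: "white", -1: "red"}

def smooth(deltas: List[Optional[float]]) -> List[Optional[float]]:
    """<Delta_i>: mean of Delta_{i-1}, Delta_i, Delta_{i+1}, using only the
    neighbours that exist and have a value (None = missing shift)."""
    smoothed: List[Optional[float]] = []
    for i, d in enumerate(deltas):
        if d is None:
            smoothed.append(None)
            continue
        window = [v for v in deltas[max(0, i - 1):i + 2] if v is not None]
        smoothed.append(sum(window) / len(window))
    return smoothed

def discretize(avg: Optional[float], small: float = 1.7, medium: float = 3.4) -> Optional[int]:
    """Map |<Delta_i>| (ppm) to 1 (small), 0 (medium) or -1 (large)."""
    if avg is None:
        return None
    magnitude = abs(avg)
    if magnitude <= small:
        return 1
    if magnitude <= medium:
        return 0
    return -1

def color_code(deltas: List[Optional[float]]) -> List[str]:
    """Per-residue colours; yellow marks residues without observed/computed shifts."""
    codes = [discretize(a) for a in smooth(deltas)]
    return ["yellow" if c is None else COLORS[c] for c in codes]

print(color_code([0.5, 1.0, 1.5, None, 5.0, 4.0]))
# -> ['blue', 'blue', 'blue', 'yellow', 'red', 'red']
print(color_code([2.0, 3.0, 4.0]))
# -> ['white', 'white', 'red']
```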

When an entry contains more than one protein model, the averaged Δ_i values are computed over all deposited conformations, although the colored representation is drawn on the first model only. This is illustrated in Fig. 9 for the 20 NMR-determined conformations (Fig. 9a) of a membrane-associated protein from Bacillus cereus, PDB id 2K5Q. The large conformational dispersion in the loops and at the N- and C-termini shown in Fig. 9a does not indicate a poor representation of the protein; rather, it reflects the flexibility of these segments of the molecule in solution, as clearly shown by the CheShift-2 validation of 2K5Q (Fig. 9b).
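
Averaging Δ_i over all models of an ensemble can be sketched as below. Since Eq. (2) is not reproduced here, the per-model deviation |δ_obs − δ_pred| is a hypothetical stand-in for it, as is the rule that a residue missing from the observed data or from any model yields no value.

```python
from typing import List, Optional, Sequence

def ensemble_deltas(observed: Sequence[Optional[float]],
                    predicted_per_model: Sequence[Sequence[Optional[float]]]
                    ) -> List[Optional[float]]:
    """Delta_i averaged over every model of an ensemble.

    ASSUMPTION: for one model, Delta_i = |observed_i - predicted_i|; this is a
    hypothetical stand-in for Eq. (2), not a transcription of it.
    Residues with a missing observed value, or missing in any model, give None.
    """
    deltas: List[Optional[float]] = []
    for i, obs in enumerate(observed):
        preds = [model[i] for model in predicted_per_model]
        if obs is None or any(p is None for p in preds):
            deltas.append(None)
        else:
            deltas.append(sum(abs(obs - p) for p in preds) / len(preds))
    return deltas

# Hypothetical three-residue, two-model ensemble (ppm).
observed = [55.0, None, 60.0]
models = [[54.0, 57.0, 63.0], [56.0, 57.5, 61.0]]
print(ensemble_deltas(observed, models))  # -> [1.0, None, 2.0]
```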

4.3.3 Global Versus Local Validation of Proteins

The NMR-determined ensembles of the dynein light chain 2A protein, PDB ids 1TGQ and 2B95, show different folds: 1TGQ (now obsolete) has a wrong fold, whereas 2B95, which replaced 1TGQ in the PDB, has the correct fold. As pointed out by Nabuurs et al. [11], this difference results from the oligomeric state assumed during structure determination, namely a monomer for 1TGQ and a homodimer for 2B95.

Validation of both ensembles as a whole shows that 2B95 represents the observed ¹³C^α chemical shifts slightly better than 1TGQ in terms of the ca-rmsd [34], viz., ca-rmsd = 2.08 and 2.35 ppm for 2B95 and 1TGQ, respectively. However, this difference (~0.30 ppm) is not large enough to establish unambiguously that the 1TGQ ensemble needs further refinement. In fact, a similar rmsd spread, within ~0.30 ppm, was found among 5 new models of ubiquitin (grey bars in Fig. 6), all of which fit the X-ray diffraction data with R and R_free factors similar to those of the deposited X-ray structure, PDB id 1UBQ, solved at 1.8 Å resolution [41]. These 5 new models can therefore be considered of comparable structural quality. Consequently, a ca-rmsd variation of ~0.30 ppm cannot be used as a universal criterion to determine unequivocally whether a protein model, such as 1TGQ, needs further refinement.

The analysis of dynein light chain 2A illustrates that validating a protein as a whole (global validation), e.g., with the ca-rmsd, may not determine unambiguously whether one model of a protein is of better quality than another model of the same protein, whereas validation on a per-residue basis (local validation), e.g., with the CheShift-2 server, does (Fig. 10). To further test the ability of the CheShift-2 server to detect small differences between protein models, a small set of 15 obsolete/successor pairs of proteins was also considered (see the Supplementary Data of [49]). The results indicate that CheShift-2 is a fast and accurate validation tool for detecting, on a per-residue basis, local flaws in protein models, even for conformations that differ only in small details, as for the obsolete and successor models of membrane-bound lytic murein transglycosylase D (LysM domain fragment) (Fig. 11).
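
A toy calculation illustrates why a global measure can miss a local flaw. The two nine-residue deviation profiles below are invented and do not correspond to 1TGQ, 2B95 or any other entry; for simplicity the nearest-neighbor smoothing is omitted. Both profiles have the same rmsd, yet only one contains a residue beyond the 3.4 ppm threshold.

```python
import math
from typing import Sequence

def rmsd(deviations: Sequence[float]) -> float:
    """Global root-mean-square of per-residue deviations (ppm)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

def n_flagged(deviations: Sequence[float], threshold: float = 3.4) -> int:
    """Number of residues whose |deviation| exceeds the 'red' threshold (ppm)."""
    return sum(1 for d in deviations if abs(d) > threshold)

# Two invented nine-residue profiles with the same global rmsd.
uniform = [2.0] * 9              # every residue moderately off
localized = [0.0] * 8 + [6.0]    # eight residues perfect, one badly off
print(rmsd(uniform), rmsd(localized))            # -> 2.0 2.0
print(n_flagged(uniform), n_flagged(localized))  # -> 0 1
```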

In general, pairs of obsolete and successor proteins in the PDB can be used as a benchmark set for testing validation methods. These obsolete/successor pairs are very appealing because they span different topologies and numbers of residues, and complete sets of ¹³C^α chemical shifts are available for many of them from the Biological Magnetic Resonance Data Bank (BMRB) [117].

5 Conclusions and Future Directions

In this chapter we have illustrated how the information encoded in ¹³C chemical shifts can be used for a variety of applications, from protein structure prediction to the accurate detection of structural flaws, at the residue level, in NMR-determined protein models.

The ability to detect and accurately characterize the mobility of surface side chains by computing ¹³C^α chemical shifts is one of the strengths of the current methodology. Hence, we plan to focus our research on developing new physics-based algorithms for the fast and accurate determination and validation of side-chain conformations, with the goal of improving the quality of NMR-determined protein models. Since NMR spectroscopy provides chemical shifts for several nuclei besides ¹³C^α, the feasibility of computing them with DFT, and the benefits of including the information they encode in structure-determination protocols, are currently under investigation in our group. More generally, new developments in NMR spectroscopy are needed to enable protocols for the high-throughput determination of high-quality protein structures in solution.

References

Bhattacharya, A., Tejero, R., Montelione, G.T.: Evaluating protein structures determined by structural genomics consortia. Proteins 66, 778–795 (2007)
Article Google Scholar
Billeter, M., Wagner, G., Wüthrich, K.: Solution NMR structure determination of proteins revisited. J. Biomol. NMR 42, 155–158 (2008)
Article Google Scholar
Williamson, M.P., Craven, C.J.: Automated protein structure calculation from NMR data. J. Biomol. NMR 43, 131–143 (2009)
Article Google Scholar
Williamson, M.P., Kikuchi, J., Asajura, T.: Application of 1H-NMR chemical-shifts to measure the quality of protein structures. J. Mol. Biol. 247, 541–546 (1995)
Google Scholar
Davis, I.W., Leaver-Fay, A., Chen, V.B., Block, J.N., Kapral, G.J., Wang, X., Murray, L.W., Arendall III, W.B., Snoeyink, J., Richardson, J.S., Richardson, D.C.: MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 35, W375–W383 (2007)
Article Google Scholar
Huang, Y.J., Powers, R., Montelione, G.T.: Protein NMR Recall, Precision, and F-measure scores (RPF scores): Structure quality assessment measures based on information retrieval statistics. J. Am. Chem. Soc. 127, 1665–1674 (2005)
Article Google Scholar
Huang, Y.J., Tejero, R., Powers, R., Montelione, G.T.: A topology-constrained distance network algorithm for protein structure determination from NOESY data. Proteins 62, 587–603 (2006)
Article Google Scholar
Laskowski, R.A., MacArthur, M.W., Moss, D.S., Thornton, J.: PROCHECK—a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283–291 (1993)
Article Google Scholar
Lovell, S.C., Davis, I.W., Arendall III, W.B., de Bakker, P.I.W., Word, J.M., Prisant, M.G., Richardson, J.S., Richardson, D.C.: Structure validation by Cα geometry: ϕ, ψ, and Cβ deviation. Proteins 50, 437–450 (2003)
Article Google Scholar
Lüthy, R., Bowie, J.U., Eisenberg, D.: Assessment of protein models with three-dimensional profiles. Nature 356, 83–85 (1992)
Article Google Scholar
Nabuurs, S.B., Spronk, C.A.E.M., Vuister, G.W., Vriend, G.: Tradional biomolecular structure determination by NMR spectroscopy allows for major errors PLOS. Comp. Biol. 2, 71–79 (2006)
Google Scholar
Vriend, G.: WHAT IF: a molecular modeling and drug design program. J. Mol. Graph. 8, 52–56 (1990)
Article Google Scholar
Berjanskii, M., Wishart, D.S.: A simple method to predict protein flexibility using secondary chemical shifts. J. Am. Chem. Soc. 127, 14970–14971 (2005)
Article Google Scholar
Berjanskii, M., Wishart, D.S.: The RCI server: rapid and accurate calculation of protein flexibility using chemical shifts. Nucleic Acids Res. 35, W531–W537 (2007)
Article Google Scholar
Cornilescu, G., Delaglio, F., Bax, A.: Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR 13, 289–302 (1999)
Article Google Scholar
de Dios, A.C., Pearson, J.G., Oldfield, E.: Chemical shifts in proteins: An ab initio study of carbon-13 nuclear magnetic resonance chemical shielding in glycine alanine and valine residues. J. Am. Chem. Soc. 115, 9768–9773 (1993)
Article Google Scholar
de Dios, A.C., Pearson, J.G., Oldfield, E.: Secondary and tertiary structural effects on protein NMR chemical shifts: An ab initio approach. Science 260, 1491–1496 (1993)
Article Google Scholar
Frank, A., Möller, H.M., Exner, T.H.: Toward the quantum chemical calculation of NMR chemical shifts of proteins. 2 Level of theory, basis set, and solvent model dependence. J. Chem. Theory Comput. 8, 1480–1492 (2012)
Article Google Scholar
Havlin, R.H., Le, H., Laws, D.D., de Dios, A.C., Oldfield, E.: An ab initio quantum chemical investigation of carbon–13 NMR shielding tensors in glycine, alanine, valine, isoleucine, serine, and threonine: Comparisons between helical and sheet tensors, and effects of χ1 on shielding. J. Am. Chem. Soc. 119, 11951–11958 (1997)
Article Google Scholar
Iwadate, M., Asakura, T., Williamson, M.P.: Cα and Cβ carbon-13 chemical shifts in proteins from an empirical database. J. Biomol. NMR 13, 199–211 (1999)
Article Google Scholar
Kuszewski, J., Qin, J., Gronenborn, A.M., Clore, M.: The impact of direct refinement against 13Cα and 13Cβ chemical shifts on protein structure determination by NMR. J. Magn. Reson. Ser. B 106, 92–96 (1995)
Article Google Scholar
Luginbühl, P., Szyperski, T., Wüthrich, K.: Statistical basis for the use of 13Cα chemical shift in protein structure determination. J. Magn. Reson. 109, 229–233 (1995)
Article Google Scholar
Meiler, J.: PROSHIFT: protein chemical shift prediction using artificial neural networks. J. Biomol. NMR 26, 25–37 (2003)
Article Google Scholar
Neal, S., Nip, A.M., Zhang, H., Wishart, D.S.: Rapid and accurate calculation of protein 1H, 13C and 15 N chemical shifts. J. Biomol. NMR 26, 215–240 (2003)
Article Google Scholar
Shen, Y., Bax. Ad.: Protein backbone chemical shifts predicted from searching a database for torsional angle and sequence homology. J. Biomol. NMR, 38, 289–302 (2007)
Article Google Scholar
Shen, Y., Lange, O., Delaglio, F., Rossi, P., Aramini, J.M., Liu, G., Eletsky, A., Wu, Y., Singarapu, K.K., Lemak, A., et al.: Consistent blind protein structure generation from NMR chemical shift data. Proc. Natl. Acad. Sci. U. S. A. 105, 4685–4690 (2008)
Article Google Scholar
Spera, S., Bax, A.: Empirical correlation between protein backbone conformation and Cα and Cβ 13C nuclear magnetic resonance chemical shifts. J. Am. Chem. Soc. 113, 5490–5492 (1991)
Article Google Scholar
Vila, J.A., Arnautova, Y.A., Martin, O.A., Scheraga, H.A.: Quantum-mechanics-derived 13Cα chemical shift server (CheShift) for Protein Structure validation. Proc. Natl. Acad. Sci. U. S. A 106, 16972–16977 (2009)
Article Google Scholar
Vila, J.A., Arnautova, Y.A., Scheraga, H.A.: Use of 13Cα chemical shifts for accurate determination of β-sheet structures in solution. Proc. Natl. Acad. Sci. U. S. A. 105, 1891–1896 (2008)
Article Google Scholar
Vila, J.A., Aramini, J.M., Rossi, P., Kuzin, A., Su, M., Seetharaman, J., Xiao, R., Tong, L., Montelione, G.T., Scheraga, H.A.: Quantum chemical 13Cα chemical shift calculations for protein NMR structure determination. refinement, and validation. Proc. Natl. Acad. Sci. U. S. A. 105, 14389–14394 (2008)
Article Google Scholar
Vila, J.A., Baldoni, H.A., Ripoll, D.R., Ghosh, A., Scheraga, H.A.: Polyproline II helix conformation in a proline-rich environment: a theoretical Study. Biophys. J. 86, 731–742 (2004)
Article Google Scholar
Vila, J.A., Baldoni, H.A., Ripoll, D.R., Scheraga, H.A.: Unblocked statistical-coil tetrapeptides in aqueous solution: quantum-chemical computation of the carbon-13 NMR chemical shifts. J. Biomol. NMR 26, 113–130 (2003)
Article Google Scholar
Vila, J.A., Villegas, M.E., Baldoni, H.A., Scheraga, H.A.: Predicting 13Cα chemical shifts for validation of protein structures. J. Biomol. NMR 38, 221–235 (2007)
Article Google Scholar
Vila, J.A., Scheraga, H.A.: Assessing the accuracy of protein structures by quantum mechanical computations of 13Cα chemical shifts. Acc. Chem. Res. 42, 1545–1553 (2009)
Article Google Scholar
Villegas, M.E., Vila, J.A., Scheraga, H.A.: Effects of side-chain orientation on the 13C chemical shifts of antiparallel β-sheet model peptides. J. Biomol. NMR 37, 137–146 (2007)
Article Google Scholar
Wishart, D., Bigam, C.G., Yao, J., Abildgaard, F., Dyson, H., Oldfield, E., Markley, J., Sykes, B.: 1H, 13C and 15 N chemical shift referencing in biomolecular NMR. J. Biomol. NMR 6, 135–140 (1995)
Article Google Scholar
Wishart, D., Bigam, C.G., Holm, A., Hodges, R.S., Sykes, B.D.: 1H, 13C and 15 N random coil NMR chemical shifts of the common amino acids. I Investigation of nearest-neigbor effects. J. Biomol. NMR 5, 67–81 (1995)
Article Google Scholar
Xu, X.-P., Case, D.A.: Probing multiple effects on 15 N, 13Cα, 13Cβ and 13C′ chemical shifts in peptides using density functional theory. Biopolymers 65, 408–423 (2002)
Article Google Scholar
Xu, X.-P., Case, D.A.: Automated prediction of 15 N, 13Cα, 13Cβ and 13C’ chemical shifts in proteins using a density functional database. J. Biomol. NMR 21, 321–333 (2001)
Article Google Scholar
Parr, R.G., Yang, W.: Density functional theory of atoms and molecules. Oxford University Press, New York (1989)
Google Scholar
Arnautova, Y.A., Vila, J.A., Martin, O.A., Scheraga, H.A.: What can we learn by computing 13Cα chemical shifts for X-ray protein models? Acta Crystallogr. D D65, 697–703 (2009)
Article Google Scholar
Martin, O.A., Villegas, M.E., Vila, J.A., Scheraga, H.A.: Analysis of 13Cα and 13Cβ chemical shifts of cysteine and cystine residues in proteins: A quantum chemical approach. J. Biomol. NMR 46, 217–225 (2010)
Article Google Scholar
Vila, J.A., Arnautova, Y.A.: Vorobjev and Scheraga HA. Assessing the fractions of tautomeric forms of the imidazole ring of histidine in proteins as a function of pH. Proc. Natl. Acad. Sci. U. S. A. 108, 5602–5607 (2011)
Article Google Scholar
Vila, J.A., Ripoll, D.R., Scheraga, H.A.: Use of 13Cα chemical shifts in protein structure determination. J. Phys. Chem. B 111, 6577–6585 (2007)
Article Google Scholar
Vila, J.A., Scheraga, H.A.: Factors affecting the use of 13Cα chemical shifts to determine, refine, and validate protein structures. Proteins: structure. Funct. Bioinformatics 71, 641–654 (2008)
Article Google Scholar
Wüthrich, K.: NMR of Proteins and Nucleic Acids. Wiley, New York, NY, U. S. A. (1986)
Google Scholar
Sun, H., Sanders, L.K., Oldfield, E.: Carbon-13 NMR shielding in the twenty common amino acids: comparisons with experimental results in proteins. J. Am. Chem. Soc. 124, 5486–5495 (2002)
Article Google Scholar
Vila, J.A., Serrano, P., Wüthrich, K., Scheraga, H.A.: Sequential nearest-neighbor effects on computed 13Cα chemical shifts. J. Biomol. NMR 48, 23–30 (2010)
Article Google Scholar
Martin, O.A., Vila, J.A., Scheraga, H.A.: CheShift-2: graphic validation of protein structures. Bioinformatics 28, 1538–1539 (2012)
Article Google Scholar
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E.: Protein Data Bank Nucleic Acids Res. 28, 235–242 (2000)
Article Google Scholar
Brünger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N.S., Read, R.J., Rice, L.M., Simonson, T., Warren, G.L.: Crystallography and NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr D 54, 905–921 (1998)
Article Google Scholar
Brünger, A.T.: Version 1.2 of the Crystallography and NMR system. Nat. Protoc. 2, 2728–2733 (2007)
Article Google Scholar
Cavalli, A., Salvatella, X., Dobson, C.M., Vendruscolo, M.: Protein structure determination from NMR chemical shifts. Proc. Natl. Acad. Sci. U.S.A. 104, 9615–9620 (2007)
Article Google Scholar
Cornilescu, G., Marquardt, J.L., Ottiger, M., Bax, A.: Validation of protein structure from anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase. J. Am. Chem. Soc. 120, 6836–6837 (1998)
Article Google Scholar
Frank, A., Onila, I., Moller, H.M., Exner, T.E.: Toward the quantum chemical calculation of nuclear magnetic resonance chemical shifts of proteins. Proteins 79(2189), 2202 (2011)
Google Scholar
Guerry, P., Herrmann, T.: Advances in automated NMR protein structure determination. Q. Rev. Biophys. 44, 257–309 (2011)
Article Google Scholar
Güntert, P.: Structure calculation of biological macromolecules from NMR data. Q. Rev. Biophys. 31, 145–237 (1998)
Article Google Scholar
Güntert, P.: Automated structure determination from NMR spectra. Eur. Biophys. J. 38, 129–143 (2009)
Article Google Scholar
Güntert, P., Braun, W., Wüthrich, K.: Efficient computation of threedimensional protein structures in solution from nuclear magnetic resonance data using the program DIANA and the supporting programs CALIBA, HABAS and GLOMSA. J. Mol. Biol. 217, 517–530 (1991)
Article Google Scholar
Rosato, A., Aramini, J.M., Arrowsmith, C., Bagaria, A., Baker, D., Cavalli, A., Doreleijers, J.F., Eletsky, A., Giachetti, A., Guerry, P., et al.: Blind testing of routine, fully automated determination of protein structures from NMR data. Structure 20, 227–236 (2012)
Article Google Scholar
Rosato, A., Bagaria, A., Baker, D., Bardiaux, B., Cavalli, A., Doreleijers, J.F., Giachetti, A., Guerry, P., Guntert, P., Herrmann, T., et al.: CASDNMR: critical assessment of automated structure determination by NMR. Nat. Methods 6, 625–626 (2009)
Article Google Scholar
Némethy, G., Gibson, K.D., Palmer, K.A., Yoon, C.N., Paterlini, G., Zagari, A., Rumsey, S., Scheraga, H.A.: Energy parameters in polypeptides. 10. Improved geometrical parameters and nonbonded interactions for use in the ECEPP/3 algorithm, with application to praline-containing peptides. J. Phys. Chem. 96, 6472–6484 (1992)
Article Google Scholar
Frisch, M.J., Trucks, G.W., Schlegel, H.B., Scuseria, G.E., Robb, M.A., Cheeseman, J.R., Zakrzewski, V.G., Montgomery, J.A., Jr Stratmann, R.E., Burant, J.C., et al.: Gaussian 03, Revision E.01, Gaussian, Inc., Wallingford CT (2003)
Google Scholar
Chesnut, D.B., Moore, K.D.: Locally dense basis-sets for chemical-shift calculations. J. Comp. Chem. 10, 648–659 (1989)
Article Google Scholar
Jameson, A.K., Jameson, C.J.: Gas-phase 13C chemical shifts in the zero-pressure limit: Refinements to the absolute shielding scale for 13C J. Chem. Phys. Lett. 134, 461–466 (1997)
Article Google Scholar
Vásquez, M., Scheraga, H.A.: Variable-target-function and buildup procedures for the calculation of protein conformation—application to bovine pancreatic trypsin-inhibitor using limited simulated nuclear magnetic-resonance data. J. Biomol. Struct. Dyn. 5, 757–784 (1988)
Article Google Scholar
Kruskal Jr., J.B.: On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Am. Math. Soc. 7, 48–50 (1956)
Article MathSciNet MATH Google Scholar
Li, Z., Scheraga, H.A.: Monte Carlo minimization approach to the multiple minima problem in protein folding. Proc. Natl. Acad. Sci. U. S. A. 84, 6611–6615 (1987)
Article MathSciNet Google Scholar
Li, Z., Scheraga, H.A.: Structure and free energy of complex thermodynamic systems. J. Molec. Str. (Theochem) 179, 333–352 (1998)
Article Google Scholar
Arnautova, Y.A., Jagielska, A., Scheraga, H.A.: A new force field (ECEPP05) for peptides proteins and organic molecules. J. Phys. Chem. B 110, 5025–5044 (2006)
Article Google Scholar
Vila, J., Williams, R.L., Vásquez, M., Scheraga, H.A.: Empirical solvation models can be used to differentiate native from near-native conformations of bovine pancreatic trypsin inhibitor Proteins: structure. Funct. Genet. 10, 199–218 (1991)
Article Google Scholar
Ripoll, D.R., Ni, F.: Refinement of the thrombin-bound structure of a hirudin peptide by a restrained electrostatically driven monte-carlo method. Biopolymers 32, 359–365 (1992)
Article Google Scholar
Vorobjev, Y.N., Scheraga, H.A.: A fast adaptive multigrid boundary element method for macromolecule electrostatic computations in solvent. J. Comp. Chem. 18, 569–583 (1997)
Article Google Scholar
Vorobjev, Y.N., Vila, J.A., Scheraga, H.A.: FAMBE-pH: a fast and accurate method to compute the total solvation free energies of proteins. J. Phys. Chem. B 112, 11122–11136 (2008)
Article Google Scholar
Ripoll, D.R., Vorobjev, Y.N., Liwo, A., Vila, J.A., Scheraga, H.A.: Coupling between folding and ionization equilibria: Effects of pH on the conformational preferences of polypeptides. J. Mol. Biol. 264, 770–783 (1996)
Article Google Scholar
Vila, J.A., Ripoll, D.R., Arnaturova, Y.A., Vorobjev, Y.N., Scheraga, H.A.: Coupling between conformation and proton binding in proteins. Proteins 61, 56–68 (2005)
Article Google Scholar
Sitkoff, D., Sharp, K.A., Honig, B.: Accurate calculation of hydration free energies using macroscopic solvent models. J. Phys. Chem. 98, 1978–1988 (1994)
Article Google Scholar
Barth, P., Alber, T., Harbury, P.B.: Accurate, conformation-dependent predictions of solvent effects on protein ionization constants. Proc. Natl. Acad. Sci. U. S.A. 104, 4898–4903 (2007)
Article Google Scholar
Hass, M.A.S., Hansen, D.F., Christensen, H.E.M., Led, J.J., Kay, L.E.: Characterization of conformational exchange of a histidine side chain: protonation, rotamerization, and tautomerization of His61 plastocyanin from Anabaena variabilis. J. Am. Chem. Soc. 130, 8460–8470 (2008)
Article Google Scholar
Serrano, P., Johnson, M.A., Chatterjee, A., Neuman, B., Joseph, J.S., Buchmeier, M.J., Kuhn, P., Wüthrich, K.: NMR structure of the nucleic acid-binding domain of the SARS coronavirus nonstructural protein 3. J. Virol. 83, 12998–13008 (2009)
Article Google Scholar
Schwarzinger, S., Kroon, G.J.A., Foss, T.R., Chung, J., Wright, P.E., Dyson, H.J.: Sequence-dependent correction of random coil NMR chemical shifts. J. Am. Chem. Soc. 123, 2970–2978 (2001)
Article Google Scholar
Wang, Y., Jardetzky, O.: Investigation of the neighboring residue effects on protein chemical shifts. J. Am. Chem. Soc. 12, 14075–14084 (2002)
Article Google Scholar
Vijay-Kumar, S., Bugg, C.E., Cook, W.J.: Structure of ubiquitin refined at 1.8 Å resolution. J. Mol. Biol. 194, 531–544 (1987)
Article Google Scholar
Quirt, A.R., Lyerla Jr., J.R., Peat, I.R., Cohen, J.S.: Reynolds WF and freedman MH Carbon-13 nuclear magnetic resonance titration shifts in amino acids. J. Am. Chem. Soc. 96, 570–574 (1974)
Article Google Scholar
Rabenstein, D.L., Sayer, T.L.: Carbon-13 shifts parameters for amines, carboxylic acids and amino acids. J. Magn. Res. 24, 27–39 (1976)
Google Scholar
Sayer, T.L., Rabenstein, D.L.: Nuclear magnetic resonance studies of the acid-base chemistry of amino acids and peptides. III Determination of the microscopic and macroscopic acid dissociation constants of α, ω-diaminocarboxylic acids Can. J. Chem. 54, 3392–3400 (1976)
Google Scholar
Surprenant, H.L., Sarneski, J.E., Key, R.R., Byrd, J.T., Reilley, C.N.: Carbon-13 studies of amino acids: chemical shifts, protonation shifts, microscopic protonation behavior. J. Magn. Res. 40, 231–243 (1980)
Article Google Scholar
Lindorff-Larsen, K., Best, R.B., Depristo, M.A., Dobson, C.M., Vendruscolo, M.: Simultaneous determination of protein structure and dynamics. Nature 433, 128–132 (2005)
Article Google Scholar
Chakrabarti, P., Pal, D.: Main-chain conformational features at different conformations of the side-chains in proteins. Protein Eng. 11, 631–647 (1998)
Article Google Scholar
Dumbrack Jr., R.L., Karplus, M.: Conformational analysis of the backbone-dependent rotamer preferences of protein sidechains. J. Mol. Biol. 230, 543–574 (1993)
Article Google Scholar
Chothia, C., Levitt, M., Richardson, D.: Structure of proteins: packing of α-helices and β-sheets. Proc. Natl. Acad. Sci. U. S. A. 74, 4130–4134 (1977)
Article Google Scholar
Chou, K.-C., Pottle, M., Némethy, G., Ueda, Y., Scheraga, H.A.: Structure of β sheets. Origin of the right handed twist and of the increased stability of antiparallel over parallel sheets. J. Mol. Biol. 162, 89–112 (1982)
Google Scholar
Chou, K.-C., Scheraga, H.A.: Origin of the right handed twist of β sheets of poly(L Val) chains. Proc. Natl. Acad. Sci. USA 79, 7047–7051 (1982)
Article Google Scholar
Creighton, T.E.: Proteins: Structure and Molecular Properties, pp. 186, 223. W.E. Freeman and Company, New York (1984)
Google Scholar
Karplus, M.: Contact electron-spin coupling of nuclear magnetic moments. J. Chem. Phys. 30, 11–15 (1959)
Mandel, M.: Proton magnetic resonance spectra of some proteins: I. Ribonuclease, oxidized ribonuclease, lysozyme, and cytochrome c. J. Biol. Chem. 240, 1586–1592 (1965)
Bradbury, J.H., Scheraga, H.A.: Structural studies of ribonuclease. XXIV. The application of nuclear magnetic resonance spectroscopy to distinguish between the histidine residues of ribonuclease. J. Am. Chem. Soc. 88, 4240–4246 (1966)
Bachovchin, W.W.: ¹⁵N NMR spectroscopy of hydrogen-bonding interactions in the active site of serine proteases: evidence for a moving histidine mechanism. Biochemistry 25, 7751–7759 (1986)
Cheng, F., Sun, H., Zhang, Y., Mukkamala, D., Oldfield, E.: A solid state ¹³C NMR, crystallographic, and quantum chemical investigation of chemical shifts and hydrogen bonding in histidine dipeptides. J. Am. Chem. Soc. 127, 12544–12554 (2005)
Farr-Jones, S., Wong, W.Y.L., Gutheil, W.G., Bachovchin, W.W.: Direct observation of the tautomeric forms of histidine in ¹⁵N NMR spectra at low temperatures. Comments on intramolecular hydrogen bonding on tautomeric equilibrium. J. Am. Chem. Soc. 115, 6813–6819 (1993)
Harbison, G., Herzfeld, J., Griffin, R.G.: Nitrogen-15 chemical shift tensors in L-histidine hydrochloride monohydrate. J. Am. Chem. Soc. 103, 4752–4754 (1981)
Hass, M.A.S., Yilmaz, A., Christensen, H.E.M., Led, J.J.: Histidine side-chain dynamics and protonation monitored by ¹³C CPMG NMR relaxation dispersion. J. Biomol. NMR 44, 225–233 (2009)
Hu, F., Luo, W., Hong, M.: Mechanism of proton conduction and gating in influenza M2 proton channels from solid-state NMR. Science 330, 505–508 (2010)
Jensen, M.R., Hass, M.A.S., Hansen, D.F., Led, J.J.: Investigating metal-binding in proteins by nuclear magnetic resonance. Cell. Mol. Life Sci. 64, 1085–1104 (2007)
Markley, J.L.: Observation of histidine residues in proteins by means of nuclear magnetic resonance spectroscopy. Acc. Chem. Res. 8, 70–80 (1974)
Meadows, D.H., Jardetzky, O., Epand, R.M., Ruterjans, H.H., Scheraga, H.A.: Proc. Natl. Acad. Sci. U.S.A. 60, 766–772 (1968)
Pelton, J.G., Torchia, D.A., Meadow, N.D., Roseman, S.: Tautomeric states of the active-site histidine of phosphorylated and unphosphorylated IIIGlc, a signal-transducing protein from Escherichia coli, using two-dimensional heteronuclear NMR techniques. Protein Sci. 2, 543–558 (1993)
Reynolds, W.F., Peat, I.R., Freedman, M.H., Lyerla Jr., J.R.: Determination of the tautomeric form of the imidazole ring of L-histidine in basic solution by carbon-13 magnetic resonance spectroscopy. J. Am. Chem. Soc. 95, 328–331 (1973)
Schuster, I.I., Roberts, J.D.: Nitrogen-15 nuclear magnetic resonance spectroscopy. Effects of hydrogen bonding and protonation on nitrogen chemical shifts in imidazoles. J. Org. Chem. 44, 3864–3867 (1979)
Shimba, N., Serber, Z., Ledwidge, R., Miller, S.M., Craik, C.S., Dötsch, V.: Quantitative identification of the protonation state of histidine in vitro and in vivo. Biochemistry 42, 9227–9234 (2003)
Shimba, N., Takahashi, H., Sakakura, M., Fuji, I., Shimada, I.: Determination of protonation and deprotonation forms and tautomeric states of histidine residues in large proteins using nitrogen-carbon J couplings in imidazole ring. J. Am. Chem. Soc. 120, 10988–10989 (1998)
Steiner, T.: L-Histidyl-L-alanine dihydrate. Acta Cryst. C 52, 2554–2556 (1996)
Steiner, T., Koellner, G.: Coexistence of both histidine tautomers in the solid state and stabilization of the unfavorable Nδ-H form by intramolecular hydrogen bonding: crystalline L-His-Gly hemihydrate. Chem. Commun. 13, 1207–1208 (1997)
Strohmeier, M., Stueber, D., Grant, D.M.: Accurate ¹³C and ¹⁵N chemical shift and ¹⁴N quadrupolar coupling constant calculations in amino acid crystals: Zwitterionic, hydrogen-bonded systems. J. Phys. Chem. A 107, 7629–7642 (2003)
Sudmeier, J.L., Bradshaw, E.M., Coffman Haddad, K.E., Day, R.M., Thalhauser, C.J., Bullock, P.A., Bachovchin, W.W.: Identification of histidine tautomers in proteins by 2D ¹H/¹³Cδ2 one-bond correlated NMR. J. Am. Chem. Soc. 125, 8430–8431 (2003)
Wüthrich, K.: NMR in Biological Research: Peptides and Proteins. North-Holland, Amsterdam (1976)
Ulrich, E.L., Akutsu, H., Doreleijers, J.F., Harano, Y., Ioannidis, Y.E., Lin, J., Livny, M., Mading, S., Maziuk, D., Miller, Z., Nakatani, E., Schulte, C.F., Tolmie, D.E., Wenger, R.K., Yao, H., Markley, J.L.: BioMagResBank. Nucleic Acids Res. 36, D402–D408 (2008)
Demchuk, E., Wade, R.C.: Improving the continuum dielectric approach to calculating pKas of ionizable groups in proteins. J. Phys. Chem. 100, 17373–17387 (1996)
DePristo, M.A., de Bakker, P.I.W., Blundell, T.L.: Heterogeneity and inaccuracy in protein structures solved by X-ray crystallography. Structure 12, 831–838 (2004)
Ringe, D., Petsko, G.A.: Study of protein dynamics by X-ray diffraction. Methods in Enzymology 131, 389–433 (1986)
Furnham, N., Blundell, T.L., DePristo, M.A., Terwilliger, T.C.: Is one solution good enough? Nat. Struct. Mol. Biol. 13, 184–185 (2006)
Wang, Y., Jardetzky, O.: Probability-based protein secondary structure identification using combined NMR chemical-shift data. Protein Sci. 11, 852–861 (2002)
Höfinger, S., Almeida, B., Hansmann, U.H.E.: Parallel tempering molecular dynamics folding simulation of a signal peptide in explicit water. Proteins 68, 662–669 (2007)
Jang, S., Kim, E., Pak, Y.: Free energy surfaces of miniproteins with a beta beta alpha motif: replica exchange molecular dynamics simulation with an implicit solvation model. Proteins 62, 663–671 (2006)
Mohanty, S., Hansmann, U.H.E.: Folding of proteins with diverse folds. Biophys. J. 91, 3573–3578 (2006)
Zhou, R.: Free energy landscape of protein folding in water: Explicit versus implicit solvent. Proteins 53, 148–161 (2003)
Santiveri, C.M., Santoro, J., Rico, M., Jiménez, M.A.: Factors involved in the stability of isolated beta-sheets: turn sequence, beta-sheet twisting, and hydrophobic surface burial. Protein Sci. 13, 1134–1147 (2004)
Zhao, D., Jardetzky, O.: An assessment of the precision and accuracy of protein structures determined by NMR: dependence on distance errors. J. Mol. Biol. 239, 601–607 (1994)
Korzhnev, D.M., Orekhov, V.Y., Arseniev, A.S.: Model-free approach beyond the borders of its applicability. J. Magn. Reson. 127, 184–191 (1997)
Palmer III, A.G.: NMR characterization of the dynamics of biomacromolecules. Chem. Rev. 104, 3623–3640 (2004)
Case, D.A., Darden, T.A., Cheatham III, T.E., Simmerling, C.L., Wang, J., Duke, R.E., Luo, R., Merz, K.M., Wang, B., Pearlman, D.A., et al.: AMBER 8. University of California, San Francisco (2004)
Zhou, Y., Vitkup, D., Karplus, M.: Native proteins are surface-molten solids: Application of the Lindemann criterion for the solid versus liquid state. J. Mol. Biol. 285, 1371–1375 (1999)
Kuzin, A.P., Su, M., Seetharaman, J., Janjua, H., Cunningham, K., Maglaqui, M., Owens, L.A., Zhao, L., Xiao, R., Baran, M.C., Acton, T.B., Rost, B., Montelione, G.T., Hunt, J.F., Tong, L.: Crystal structure of UPF0291 protein ynzC from Bacillus subtilis at resolution 2.0 Å. Northeast Structural Genomics Consortium target SR384 (2008). https://doi.org/10.2210/pdb3bhp/pdb
Kawai, Y., Moriya, S., Ogasawara, N.: Identification of a protein YneA, responsible for cell division suppression during the SOS response in Bacillus subtilis. Mol. Microbiol. 47, 1113–1122 (2003)
Aramini, J.M., Sharma, S., Huang, Y.J., Swapna, G.V.T., Ho, C.K., Shetty, K., Cunningham, K., Ma, L.-C., Zhao, L., Owens, L.A., Jiang, M., Xiao, R., Liu, J., Baran, M.C., Acton, T.B., Rost, B., Montelione, G.T.: Solution NMR structure of the SOS response protein YnzC from Bacillus subtilis. Proteins: Struct. Funct. Bioinform. 72, 526–530 (2008)
Vila, J.A., Baldoni, H.A., Scheraga, H.A.: Performance of density functional models to reproduce observed ¹³Cα chemical shifts of proteins in solution. J. Comput. Chem. 38, 884–892 (2008b)
Sippl, M.J.: Recognition of errors in three-dimensional structures of proteins. Proteins 17, 355–362 (1993)
Kleywegt, G.J.: On vital aid: the why, what and how of validation. Acta Cryst. D 65, 134–139 (2009)
Sevcik, J., Dauter, Z., Lamzin, V.S., Wilson, K.S.: Ribonuclease from Streptomyces aureofaciens at atomic resolution. Acta Cryst. D 52, 327–344 (1996)

Author information

Authors and Affiliations

IMASL-CONICET, Universidad Nacional de San Luis, Ejército de Los Andes, 950-5700, San Luis, Argentina
Jorge A. Vila
Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, 14853-1301, USA
Jorge A. Vila
Molsoft L.L.C, 11199 Sorrento Valley Road, S209, San Diego, CA, 92121, USA
Yelena A. Arnautova

Corresponding author

Correspondence to Jorge A. Vila.

Editor information

Editors and Affiliations

Faculty of Chemistry, University of Gdańsk, Gdańsk, Poland
Adam Liwo

About this chapter

Cite this chapter

Vila, J.A., Arnautova, Y.A. (2019). ¹³C Chemical Shifts in Proteins: A Rich Source of Encoded Structural Information. In: Liwo, A. (eds) Computational Methods to Study the Structure and Dynamics of Biomolecules and Biomolecular Processes. Springer Series on Bio- and Neurosystems, vol 8. Springer, Cham. https://doi.org/10.1007/978-3-319-95843-9_20

DOI: https://doi.org/10.1007/978-3-319-95843-9_20
Published: 29 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95842-2
Online ISBN: 978-3-319-95843-9
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
