On the Effect of Sphere-Overlap on Super Coarse-Grained Models of Protein Assemblies
Ion mobility mass spectrometry (IM/MS) can provide structural information on intact protein complexes. Such data, including the connectivity and collision cross sections (CCS) of an assembly's subunits, can in turn be used as a guide to produce representative super coarse-grained models. These models consist of ensembles of overlapping spheres, each representing a protein subunit. A model is considered plausible if the CCS and sphere-overlap levels of its subunits fall within predetermined confidence intervals. While the former is determined by experimental error, the latter is based on a statistical analysis of a range of protein dimers. Here, we first propose a new expression to describe the overlap between two spheres. We then analyze the effect of specific overlap cutoff choices on the precision and accuracy of super coarse-grained models. Finally, we propose a method to determine overlap cutoff levels on a case-by-case basis, based on collected CCS data, and show that it can be applied to characterize the assembly topology of symmetrical homo-multimers.
Keywords: Molecular modeling; Protein assembly; Native mass spectrometry; Ion mobility; Super coarse-grain
Most proteins assemble into complexes to achieve a specific biological function [1]. Atomic-level information about these complexes can provide valuable insights into their mode of action. However, obtaining such high-resolution information is often technically challenging. In this context, integrative modeling approaches can be used to combine low-resolution experimental data on the complex with high-resolution structural information on its subunits, building models that rationalize all observables.
Native ion mobility mass spectrometry (IM/MS) reports on the connectivity between protein subunits and allows the collision cross sections (CCS) of these subunits, as well as of their sub-complexes, to be derived. In recent years, efforts have been dedicated to exploiting these data within integrative modeling protocols [4, 5]. Unfortunately, atomic models are sometimes unavailable for all the subunits of a complex. In this case, super coarse-grained models may be adopted, whereby every molecular subunit is represented by one (or a few) large spheres [6, 7].
It should be noted that, in the absence of substantial conformational changes upon binding, it will always be possible to find an overlapping arrangement of two spheres so that their combined CCS matches that of the complex they form.
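The claim rests on continuity. Under the projection approximation, the CCS of two hard spheres is the orientationally averaged area of the union of their two projected discs, and this area grows continuously and monotonically with the distance between the sphere centres: from the CCS of the larger sphere (one sphere buried in the other) up to its value at contact. Any target CCS between these two bounds is therefore matched by some centre distance. The minimal Python sketch below illustrates this under stated assumptions: `lens_area`, `dimer_ccs` and `distance_for_target` are hypothetical helpers (not part of IMPACT), the radii R1 and R2 are assumed to already include the gas radius, and the orientational average is computed by midpoint quadrature rather than by IMPACT's numerical procedure.

```python
import math

def lens_area(a: float, b: float, s: float) -> float:
    """Area shared by two discs of radii a and b whose centres are s apart."""
    if s >= a + b:
        return 0.0  # discs do not intersect
    if s <= abs(a - b):
        return math.pi * min(a, b) ** 2  # smaller disc lies inside the larger one
    # Clamp to [-1, 1] to guard against rounding just inside the tangency limits
    cos_a = max(-1.0, min(1.0, (s * s + a * a - b * b) / (2 * s * a)))
    cos_b = max(-1.0, min(1.0, (s * s + b * b - a * a) / (2 * s * b)))
    kite = 0.5 * math.sqrt(max(0.0, (-s + a + b) * (s + a - b) * (s - a + b) * (s + a + b)))
    return a * a * math.acos(cos_a) + b * b * math.acos(cos_b) - kite

def dimer_ccs(R1: float, R2: float, d: float, n: int = 2000) -> float:
    """Projection-approximation CCS of two hard spheres (radii already include
    the gas radius) with centres d apart, averaged over random orientations."""
    shared = 0.0
    for k in range(n):
        u = (k + 0.5) / n               # cos(angle between dimer axis and viewing direction)
        s = d * math.sqrt(1.0 - u * u)  # centre distance of the two projected discs
        shared += lens_area(R1, R2, s)
    # Area of the union of two discs = sum of disc areas minus their shared lens
    return math.pi * (R1 ** 2 + R2 ** 2) - shared / n

def distance_for_target(R1: float, R2: float, target: float, tol: float = 1e-6):
    """Bisection for the centre distance whose sphere-dimer CCS equals target.
    Returns None if target lies outside [full-overlap CCS, contact CCS]."""
    lo, hi = abs(R1 - R2), R1 + R2  # smaller sphere buried ... spheres touching
    if not dimer_ccs(R1, R2, lo) <= target <= dimer_ccs(R1, R2, hi):
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dimer_ccs(R1, R2, mid) < target:
            lo = mid  # CCS grows with distance, so the root lies further out
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Recover the separation used to generate a target CCS for two 10 Å spheres
target = dimer_ccs(10.0, 10.0, 12.0)
print(round(distance_for_target(10.0, 10.0, target), 3))  # 12.0
```

Bisection is a natural choice here because the sphere-dimer CCS is monotonic in the centre distance, so a single bracketing interval always contains the solution when one exists.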
To assess the relationship between sphere overlap and the associated CCS, we selected an ensemble of 1988 protein pairs from the PiQSi database. Of these, 241 were crystallized as dimers, whereas the rest were pairs of proteins in contact within 526 crystal structures of larger assemblies. Using IMPACT, software that numerically estimates the CCS of molecular structures with the projection approximation method, we calculated the CCS of each dimer, as well as that of its constituent subunits. Then, for each pair, we placed a sphere with radius given by Eq. 1 (with rgas = 1 Å, representing helium) on the center of mass of each protein subunit, and calculated the resulting overlap, hereafter called Ostruct. A similar test has previously been performed on smaller datasets to identify an overlap interval representative of most protein pairs. This led to the proposal of a confidence interval of 15 to 45% for sphere overlap, usable to guide super coarse-grained integrative modeling protocols exploiting CCS data.

However, the average value of Ostruct may not be well suited to this context. Integrative modeling protocols typically use an optimization engine to find an arrangement of protein subunits that minimizes a scoring function, which usually includes terms for the physics of molecular interactions (e.g., van der Waals, electrostatics) and for the agreement of models with the available experimental data. Optimizers will therefore be naturally guided toward the overlap level, Obest, associated with the sphere arrangement whose CCS deviates least from the target dimer CCS. Accordingly, for each protein pair, we also tested a range of overlap levels (from 0 to 100%, in steps of 1%), assessed their CCS error with respect to the known dimer CCS, and identified the optimal overlap Obest. For this test, the CCS of each sphere dimer was calculated with IMPACT.
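The Obest search can be sketched as a scan over overlap levels. This Python sketch makes two assumptions not given in this excerpt. First, Eq. 1 is taken to be the standard hard-sphere relation CCS = π(r + rgas)², i.e. r = √(CCS/π) − rgas. Second, because the overlap expression of Eq. 2 is not reproduced here, a hypothetical stand-in metric O = (r1 + r2 − d)/(2·min(r1, r2)) is used, which is 0 at contact and 1 when the smaller sphere lies entirely inside the larger one. The sphere-dimer CCS is computed with an orientationally averaged projection approximation for two spheres, standing in for the IMPACT calculation used in the study; all function names are illustrative.

```python
import math

R_GAS = 1.0  # helium probe radius (Å), as in the text

def lens_area(a: float, b: float, s: float) -> float:
    """Area shared by two discs of radii a and b whose centres are s apart."""
    if s >= a + b:
        return 0.0
    if s <= abs(a - b):
        return math.pi * min(a, b) ** 2
    cos_a = max(-1.0, min(1.0, (s * s + a * a - b * b) / (2 * s * a)))
    cos_b = max(-1.0, min(1.0, (s * s + b * b - a * a) / (2 * s * b)))
    kite = 0.5 * math.sqrt(max(0.0, (-s + a + b) * (s + a - b) * (s - a + b) * (s + a + b)))
    return a * a * math.acos(cos_a) + b * b * math.acos(cos_b) - kite

def dimer_ccs(R1: float, R2: float, d: float, n: int = 2000) -> float:
    """Orientation-averaged projected area of two hard spheres d apart."""
    shared = sum(lens_area(R1, R2, d * math.sqrt(1.0 - ((k + 0.5) / n) ** 2))
                 for k in range(n))
    return math.pi * (R1 ** 2 + R2 ** 2) - shared / n

def radius_from_ccs(ccs: float) -> float:
    # ASSUMED form of Eq. 1: hard sphere with CCS = pi * (r + R_GAS)**2
    return math.sqrt(ccs / math.pi) - R_GAS

def distance_from_overlap(r1: float, r2: float, overlap: float) -> float:
    # HYPOTHETICAL stand-in for Eq. 2: overlap = (r1 + r2 - d) / (2 * min(r1, r2))
    return r1 + r2 - 2.0 * overlap * min(r1, r2)

def best_overlap(ccs1: float, ccs2: float, ccs_target: float):
    """Scan overlaps from 0 to 100% in 1% steps; return (Obest in %, relative CCS error)."""
    r1, r2 = radius_from_ccs(ccs1), radius_from_ccs(ccs2)
    best_pct, best_err = None, math.inf
    for pct in range(0, 101):
        d = distance_from_overlap(r1, r2, pct / 100)
        model = dimer_ccs(r1 + R_GAS, r2 + R_GAS, d)  # gas radius added back for the CCS
        err = abs(model - ccs_target) / ccs_target
        if err < best_err:
            best_pct, best_err = pct, err
    return best_pct, best_err

# Self-consistency check: a target built at 25% overlap is recovered exactly
r1, r2 = radius_from_ccs(1500.0), radius_from_ccs(2500.0)
target = dimer_ccs(r1 + R_GAS, r2 + R_GAS, distance_from_overlap(r1, r2, 0.25))
print(best_overlap(1500.0, 2500.0, target))  # (25, 0.0)
```

With measured data the target would be the dimer CCS computed from the crystal structure, and the returned error is the relative deviation of the best sphere arrangement from it.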
Both the Ostruct and Obest values were Gaussian distributed, centered at 25.4 ± 16.2% and 22.6 ± 15.6%, respectively (Fig. 1b). Analyzing separately the pairs crystallized as dimers and those extracted from larger complexes yielded similar results.
Any overlap confidence interval used to determine whether a sphere arrangement is suitable will be associated with a CCS error: the wider the interval, the broader the range of accepted CCS values. On the other hand, the wider this interval, the higher the likelihood of it including the most suitable overlap level. For instance, defining the acceptable overlap interval as lying within one standard deviation of the mean Obest value, i.e., anything between 7.0 and 38.2%, is associated with a CCS error of ± 7.4% and with a 73.7% likelihood of including Obest (Fig. 1c). In the context of a modeling framework, this observation indicates a non-negligible likelihood that a restraint based on CCS and one based on the statistical distribution of overlaps are inconsistent. It is therefore not advisable to use such an overlap restraint when CCS data are available.
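The constant-cutoff arithmetic is easy to reproduce: one standard deviation around the mean Obest gives 22.6 − 15.6 = 7.0% and 22.6 + 15.6 = 38.2%. The Python sketch below computes such an interval from a list of overlap values, and the fraction of values falling inside it; the function names are illustrative. For comparison, an ideal Gaussian places about 68.3% of its mass within one standard deviation, so the 73.7% reported above presumably reflects the empirical distribution of Obest rather than an idealized normal curve.

```python
from statistics import NormalDist, mean, stdev

def constant_cutoff(overlaps, k=1.0):
    """Interval mean ± k sample standard deviations (same units as the input)."""
    mu, sd = mean(overlaps), stdev(overlaps)
    return mu - k * sd, mu + k * sd

def coverage(overlaps, lo, hi):
    """Fraction of overlap values lying inside [lo, hi]."""
    return sum(lo <= o <= hi for o in overlaps) / len(overlaps)

def gaussian_coverage(k=1.0):
    """Probability mass of a normal distribution within ± k standard deviations."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Summary statistics of Obest reported in the text (percent)
mu, sd = 22.6, 15.6
print(f"interval: {mu - sd:.1f}-{mu + sd:.1f}%")        # interval: 7.0-38.2%
print(f"Gaussian coverage: {gaussian_coverage():.1%}")  # Gaussian coverage: 68.3%
```

Applied to a per-pair list of Obest values, `coverage(values, *constant_cutoff(values))` gives the empirical likelihood that the interval contains the optimal overlap.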
We calculated CCSratio and err(CCSratio) for each protein pair in our benchmark dataset, assuming a generous experimental error of 3% on each CCS measurement (larger than the typical experimental error [9, 14]). These values allowed us to define, for each protein pair, a custom overlap confidence interval, i.e., an overlap region consistent with data derived by ion mobility spectrometry. On average, the obtained intervals had a size (distance from minimum to maximum acceptable overlap) of 13.1%, i.e., less than half the size of the single, statistically determined interval typically adopted for all protein dimers. Furthermore, for all pairs, the predicted intervals included their specific Obest value. Within these intervals, the CCS values had an average standard deviation of 3.5%. In summary, our data-driven method to define overlap restraints, hereafter called the "adaptive cutoff," is both more precise and more accurate than the traditionally used constant cutoff (i.e., the same for every case) based on a statistical analysis of an ensemble of protein pairs.
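The adaptive cutoff can be sketched in the same spirit. The definitions of CCSratio and err(CCSratio) are not reproduced in this excerpt, so the Python sketch below assumes a hypothetical ratio CCSratio = CCS12/(CCS1 + CCS2), with its uncertainty obtained by first-order error propagation from a 3% relative error on each measurement. Every overlap level whose sphere model reproduces the measured ratio within that uncertainty is accepted, and the interval spans the smallest to the largest accepted overlap. As before, Eq. 1 is assumed to be the hard-sphere relation, the overlap metric is a hypothetical stand-in for Eq. 2, and the sphere-dimer CCS uses an analytic projection approximation instead of IMPACT.

```python
import math

R_GAS = 1.0  # helium probe radius (Å)

def lens_area(a: float, b: float, s: float) -> float:
    """Area shared by two discs of radii a and b whose centres are s apart."""
    if s >= a + b:
        return 0.0
    if s <= abs(a - b):
        return math.pi * min(a, b) ** 2
    cos_a = max(-1.0, min(1.0, (s * s + a * a - b * b) / (2 * s * a)))
    cos_b = max(-1.0, min(1.0, (s * s + b * b - a * a) / (2 * s * b)))
    kite = 0.5 * math.sqrt(max(0.0, (-s + a + b) * (s + a - b) * (s - a + b) * (s + a + b)))
    return a * a * math.acos(cos_a) + b * b * math.acos(cos_b) - kite

def dimer_ccs(R1: float, R2: float, d: float, n: int = 2000) -> float:
    """Orientation-averaged projected area of two hard spheres d apart."""
    shared = sum(lens_area(R1, R2, d * math.sqrt(1.0 - ((k + 0.5) / n) ** 2))
                 for k in range(n))
    return math.pi * (R1 ** 2 + R2 ** 2) - shared / n

def radius_from_ccs(ccs: float) -> float:
    # ASSUMED form of Eq. 1: CCS = pi * (r + R_GAS)**2
    return math.sqrt(ccs / math.pi) - R_GAS

def distance_from_overlap(r1: float, r2: float, overlap: float) -> float:
    # HYPOTHETICAL stand-in for Eq. 2: overlap = (r1 + r2 - d) / (2 * min(r1, r2))
    return r1 + r2 - 2.0 * overlap * min(r1, r2)

def ccs_ratio_with_error(ccs1, ccs2, ccs12, rel_err=0.03):
    """HYPOTHETICAL CCSratio = CCS12 / (CCS1 + CCS2) and its propagated error."""
    total = ccs1 + ccs2
    ratio = ccs12 / total
    rel_total = math.hypot(rel_err * ccs1, rel_err * ccs2) / total
    return ratio, ratio * math.hypot(rel_err, rel_total)

def adaptive_cutoff(ccs1, ccs2, ccs12, rel_err=0.03):
    """Smallest and largest overlap (%) whose sphere model matches CCSratio
    within its error, or None if no overlap level is compatible."""
    ratio, err = ccs_ratio_with_error(ccs1, ccs2, ccs12, rel_err)
    r1, r2 = radius_from_ccs(ccs1), radius_from_ccs(ccs2)
    accepted = []
    for pct in range(0, 101):
        d = distance_from_overlap(r1, r2, pct / 100)
        model_ratio = dimer_ccs(r1 + R_GAS, r2 + R_GAS, d) / (ccs1 + ccs2)
        if abs(model_ratio - ratio) <= err:
            accepted.append(pct)
    return (accepted[0], accepted[-1]) if accepted else None

# A target built at 25% overlap: with zero error the interval collapses onto it
r1, r2 = radius_from_ccs(1500.0), radius_from_ccs(2500.0)
target = dimer_ccs(r1 + R_GAS, r2 + R_GAS, distance_from_overlap(r1, r2, 0.25))
print(adaptive_cutoff(1500.0, 2500.0, target, rel_err=0.0))  # (25, 25)
print(adaptive_cutoff(1500.0, 2500.0, target))              # interval under 3% errors
```

Because the model CCS decreases monotonically as the overlap increases, the accepted overlap levels form a contiguous range, so reporting only its two ends loses no information.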
In conclusion, we suggest Eq. 2 as a more suitable metric to define the overlap between two spheres in super coarse-grained models of proteins. When the CCS of both spheres and of their complex is available, our adaptive cutoff method should be used to define a suitable confidence interval for the overlap between two spheres, with the overlap defined as per Eq. 2. We note that, if binding leads to conformational changes altering the CCS of the individual binding partners, the adaptive cutoff will impose a tighter or looser sphere-overlap level. When the CCS of both spheres and of their complex is not available, the confidence interval should instead be defined on the basis of the constant cutoff criterion we determined by analyzing a large protein pair dataset; the Obest values we determined here follow a Gaussian distribution centered at 22.6 ± 15.6%. However, we observed that identifying a protein assembly topology by applying such a cutoff on sphere overlap is prone to both false negatives and false positives. We stress that our tests were simple cases based on symmetrical homo-multimers, and it cannot be excluded that better performance may be observed when modeling larger hetero-multimers with no symmetry. Our data-driven adaptive cutoff led to accurate topology prediction in all test cases. This method has two limitations: (1) it currently applies only to symmetrical homo-multimers, and (2) besides the CCS of a single building block and of the whole complex, it also requires the CCS of both a monomer and a dimer. Nevertheless, we believe that our observations indicate that exploiting experiment-based overlap restraints for the characterization of protein assembly topologies is a promising route for substantially increasing the accuracy of super coarse-grained models.
We thank Lucas Rudden, Justin Benesch, and Valentina Erastova for critically reviewing this manuscript.
This work was supported by the Engineering and Physical Sciences Research Council (grant EP/P016499/1).
1. Gavin, A.-C., Bösche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J.M., Michon, A.-M., Cruciat, C.-M., Remor, M., Höfert, C., Schelder, M., Brajenovic, M., Ruffner, H., Merino, A., Klein, K., Hudak, M., Dickson, D., Rudi, T., Gnau, V., Bauch, A., Bastuck, S., Huhse, B., Leutwein, C., Heurtier, M.-A., Copley, R.R., Edelmann, A., Querfurth, E., Rybin, V., Drewes, G., Raida, M., Bouwmeester, T., Bork, P., Seraphin, B., Kuster, B., Neubauer, G., Superti-Furga, G.: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 415, 141–147 (2002)
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.