Sparse Estimation for Structural Variability

  • Raghavendra Hosur
  • Rohit Singh
  • Bonnie Berger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6293)


Proteins are dynamic molecules that exhibit a wide range of motions, and these conformational changes are often important for protein function. Efficiently determining biologically relevant conformational changes, or true variability, is challenging because of the noise present in structure data. In this paper we present a novel approach to elucidating conformational variability in structures solved using X-ray crystallography. We first infer an ensemble to represent the experimental data and then formulate the identification of truly variable members of the ensemble (as opposed to those that vary only due to noise) as a sparse estimation problem. Our results indicate that the algorithm accurately distinguishes genuine conformational changes from variability due to noise. We validate our predictions on synthetic data and, for structures in the Protein Data Bank, by comparison with NMR experiments. In addition to improving on the performance of existing methods, the algorithm is robust to the levels of noise present in real data. In the case of Ubc9, the variability identified by the algorithm corresponds to functionally important residues implicated by mutagenesis experiments. Our algorithm is also general enough to be integrated into state-of-the-art software tools for structure inference.
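
To make the phrase "sparse estimation problem" concrete, the following is a minimal sketch of generic L1-penalised (lasso) regression on a toy problem. It is not the formulation used in the paper, which the abstract does not spell out. The setup is an assumption made purely for illustration: each column of `X` stands for one hypothetical candidate source of variability, `y` is a noisy observation, and only two planted candidates truly contribute. The function names, the penalty `lam`, and the synthetic data are likewise illustrative.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal operator of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X is a list of n rows, each a list of m candidate features (hypothetical).
    """
    n, m = len(X), len(X[0])
    w = [0.0] * m
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)
            ) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def make_toy_problem(n=200, m=8, noise=0.05, seed=0):
    """Synthetic data (an assumption): only candidates 1 and 5 truly contribute."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
    true_w = [0.0] * m
    true_w[1], true_w[5] = 1.0, -0.7
    y = [
        sum(X[i][j] * true_w[j] for j in range(m)) + rng.gauss(0.0, noise)
        for i in range(n)
    ]
    return X, y, true_w


X, y, true_w = make_toy_problem()
w_hat = lasso_coordinate_descent(X, y, lam=0.1)
# Candidates whose coefficient survives the L1 shrinkage.
selected = [j for j, wj in enumerate(w_hat) if abs(wj) > 1e-8]
print("selected candidates:", selected)
```

The L1 penalty sets the coefficients of candidates that explain only noise to exactly zero, which is what allows a sparse estimate to separate a few genuine contributors from many spurious ones. The surviving coefficients are shrunk toward zero by roughly `lam`, so in practice the penalty would need to be tuned to the noise level.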


Structural Variability; Sparse Estimation; Lasso Regression; Neural Information Processing System; True Variability



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Raghavendra Hosur (1, 3)
  • Rohit Singh (1)
  • Bonnie Berger (1, 2)

  1. Computer Science and Artificial Intelligence Laboratory, MIT, Massachusetts Institute of Technology, Cambridge
  2. Dept. of Mathematics, MIT, USA
  3. Dept. of Materials Science and Eng., MIT, USA
