Spectral Techniques to Explore Point Clouds in Euclidean Space, with Applications to Collective Coordinates in Structural Biology

  • Conference paper
Nonlinear Computational Geometry

Part of the book series: The IMA Volumes in Mathematics and its Applications ((IMA,volume 151))

Abstract

The life sciences, engineering, and telecommunications provide numerous systems whose description requires a large number of variables. Developing insights into such systems, forecasting their evolution, or monitoring them often relies on inferring correlations between these variables. Given a collection of points describing states of the system, inferring the effective number of independent parameters of the system (its intrinsic dimensionality) and the way these parameters are coupled is paramount to developing models. In this context, this paper makes two contributions.

First, we review recent work on spectral techniques to organize point clouds in Euclidean space, with emphasis on the main difficulties faced. Second, after a careful presentation of the bio-physical context, we present applications of dimensionality reduction techniques to a core problem in structural biology, namely protein folding.

From both the computer science and the structural biology perspectives, we expect this survey to shed new light on the importance of nonlinear computational geometry in geometric data analysis in general, and for protein folding in particular.
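To make the notion of intrinsic dimensionality used in the abstract concrete, here is a minimal sketch of the simplest spectral estimate: count how many eigenvalues of the covariance matrix of a point cloud are needed to explain most of its variance. This is plain linear PCA, given only as a baseline and not as the chapter's own method. The function names, the Jacobi eigenvalue solver, the 99% variance threshold, and the toy data (a planar grid embedded in 3D) are all assumptions made for illustration.

```python
import math


def covariance(points):
    """Covariance matrix (d x d) of a list of d-dimensional points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    centered = [[p[k] - mean[k] for k in range(d)] for p in points]
    return [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
            for a in range(d)]


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    d = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
        if off < tol:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                # Update columns p and q, then rows p and q (A <- J^T A J).
                for k in range(d):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(d):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(d)), reverse=True)


def estimate_linear_dimension(points, variance_kept=0.99):
    """Smallest number of principal directions explaining `variance_kept`."""
    eigenvalues = symmetric_eigenvalues(covariance(points))
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(eigenvalues, start=1):
        cumulative += value
        if cumulative >= variance_kept * total:
            return k
    return len(eigenvalues)


# Toy data (assumption): a 10 x 10 grid on the plane z = x + y in 3D.
plane_points = [(i / 9.0, j / 9.0, i / 9.0 + j / 9.0)
                for i in range(10) for j in range(10)]
# Toy data (assumption): points along a straight line in 3D.
line_points = [(t, 2.0 * t, -t) for t in range(20)]

print(estimate_linear_dimension(plane_points))
print(estimate_linear_dimension(line_points))
```

A linear estimate of this kind only recovers the dimension of flat structures. For a curved point cloud it typically overestimates the intrinsic dimension, which is one motivation for the nonlinear spectral techniques the survey reviews.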

Author information

Corresponding author

Correspondence to Frédéric Cazals.

Copyright information

© 2009 Springer-Verlag New York

About this paper

Cite this paper

Cazals, F., Chazal, F., Giesen, J. (2009). Spectral Techniques to Explore Point Clouds in Euclidean Space, with Applications to Collective Coordinates in Structural Biology. In: Emiris, I., Sottile, F., Theobald, T. (eds) Nonlinear Computational Geometry. The IMA Volumes in Mathematics and its Applications, vol 151. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-0999-2_1
