Data Dimensionality Estimation: Achievements and Challenges

  • Francesco Camastra
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7627)

Abstract

Dimensionality reduction methods are effective preprocessing techniques that clustering algorithms can use to cope with high-dimensional data. They aim to project the original data set of dimensionality d onto a lower, M-dimensional submanifold while minimizing information loss. Since the value of M, called the intrinsic dimension (ID), is generally unknown, techniques that estimate it in advance are quite useful. This paper surveys the state of the art in intrinsic dimensionality estimation methods, underlining the achievements and the challenges.

Keywords

Intrinsic Dimension; Correlation Dimension; Surrogate Data; Voronoi Tessellation; Dimensionality Reduction Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

The author wishes to thank Mario Valle for making the Crystal Fingerspace datasets public and for commenting on the draft. The author also thanks the anonymous referees for their useful remarks.

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Department of Science and Technology, University of Naples Parthenope, Naples, Italy
