Feature Extraction Methods and Manifold Learning Methods

  • Francesco Camastra
  • Alessandro Vinciarelli
Part of the Advanced Information and Knowledge Processing book series (AI&KP)


What the reader needs to understand this chapter:

  • Notions of calculus.
  • The fourth chapter.



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. Department of Science and Technology, Parthenope University of Naples, Naples, Italy
  2. School of Computing Science and the Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
