Abstract
We are living in an increasingly data-dependent world: making sense of large, high-dimensional data sets is an important task for researchers in academia, industry, and government. Techniques from machine learning, namely nonlinear dimension reduction, seek to organize this wealth of data by extracting descriptive features. These techniques, though powerful in their ability to find compact representational forms, are hampered by their high computational costs; naively implemented, they cannot process large modern data collections in a reasonable time or with modest computational means. In this summary article we discuss some of the important numerical techniques which drastically increase the computational efficiency of these methods while preserving much of their representational power. Specifically, we address random projections, approximate k-nearest neighbor searches, approximate kernel methods, and approximate matrix decomposition methods.
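As a brief illustration of the first family of techniques mentioned above, the sketch below shows a Johnson-Lindenstrauss style random projection: a scaled Gaussian matrix maps high-dimensional points to a much lower dimension while approximately preserving pairwise distances. This is a minimal example for orientation only (the dimensions, scaling convention, and NumPy-based implementation are our assumptions, not taken from the chapter itself).

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 sample points in a 5000-dimensional ambient space.
n, d, k = 200, 5000, 256
X = rng.standard_normal((n, d))

# Random projection: a Gaussian matrix scaled by 1/sqrt(k).
# By the Johnson-Lindenstrauss lemma, for k on the order of
# log(n)/eps^2 this approximately preserves pairwise distances
# with high probability.
P = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ P  # projected data, shape (n, k)

# Compare one pairwise distance before and after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig  # concentrates near 1 as k grows
```

Because the projection is data-independent, it can be applied in a single pass before any of the more expensive steps (nearest-neighbor graph construction, kernel evaluation), which is what makes it attractive as a preprocessing stage for the nonlinear dimension reduction pipelines surveyed here.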
Acknowledgements
This work was supported in part by Defense Threat Reduction Agency grant HDTRA1-13-1-0015 and by Army Research Office grant W911NF1610008.
© 2017 Springer International Publishing AG
Cite this chapter
Czaja, W., Doster, T., Halevy, A. (2017). An Overview of Numerical Acceleration Techniques for Nonlinear Dimension Reduction. In: Pesenson, I., Le Gia, Q., Mayeli, A., Mhaskar, H., Zhou, DX. (eds) Recent Applications of Harmonic Analysis to Function Spaces, Differential Equations, and Data Science. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-55556-0_12
Print ISBN: 978-3-319-55555-3
Online ISBN: 978-3-319-55556-0