Abstract
Many dimension reduction methods have been proposed to discover the intrinsic, lower-dimensional structure of a high-dimensional dataset. However, identifying the critical features in a dataset with a large number of features remains a challenge. In this article, through a series of carefully designed experiments on real-world datasets, we investigate the performance of different dimension reduction techniques, ranging from feature subset selection to methods that transform the features into a lower-dimensional space. We also discuss methods that estimate the intrinsic dimensionality of a dataset in order to interpret the reduced dimension. Using several evaluation strategies, we show how these different methods can provide useful insights into the data. These comparisons enable us to offer guidance to users on selecting a technique for their dataset.
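The two families of techniques the abstract contrasts, and the idea of estimating intrinsic dimensionality, can be illustrated with a minimal sketch. This is not the chapter's methodology, just a hedged toy example: it builds synthetic data whose structure lives in 3 of 10 features, estimates the intrinsic dimension with a simple PCA variance-ratio heuristic (one of many possible estimators), and contrasts a feature transformation (PCA projection) with a filter-style feature subset selection (ranking original features by variance). All names and thresholds here are illustrative choices, not from the source.

```python
import numpy as np

# Toy data: 200 samples, 10 features. Only the first three columns carry
# structure (three mutually near-orthogonal signals); the remaining seven
# are low-amplitude noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
signal = np.column_stack([np.sin(2 * np.pi * t),
                          np.sin(4 * np.pi * t),
                          np.cos(2 * np.pi * t)])
noise = 0.01 * rng.normal(size=(200, 7))
X = np.hstack([signal, noise])

# Feature transformation (PCA via SVD): project onto directions of
# maximal variance instead of keeping the original features.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)

# One simple intrinsic-dimensionality heuristic: the number of principal
# components needed to explain 95% of the total variance.
intrinsic_dim = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1

# Feature subset selection (filter-style): rank the original features by
# variance and keep the top-k. Unlike PCA, this preserves interpretability,
# since the selected dimensions are original features.
k = intrinsic_dim
top_features = np.argsort(X.var(axis=0))[::-1][:k]

print(intrinsic_dim)            # 3: matches the rank-3 construction
print(sorted(top_features))     # [0, 1, 2]: the informative columns
```

In this contrived setting both routes agree with the known structure, but on real data they can diverge, which is precisely why the chapter's empirical comparison across techniques and evaluation strategies is useful.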
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this chapter
Fan, Y., Kamath, C. (2015). On the Selection of Dimension Reduction Techniques for Scientific Applications. In: Abou-Nasr, M., Lessmann, S., Stahlbock, R., Weiss, G. (eds) Real World Data Mining Applications. Annals of Information Systems, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-07812-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07811-3
Online ISBN: 978-3-319-07812-0
eBook Packages: Business and Economics; Business and Management (R0)