Abstract
Many dimension reduction methods have been proposed to discover the intrinsic, lower-dimensional structure of a high-dimensional dataset. However, identifying the critical features in a dataset with a large number of features remains a challenge. In this article, through a series of carefully designed experiments on real-world datasets, we investigate the performance of different dimension reduction techniques, ranging from feature subset selection to methods that transform the features into a lower-dimensional space. We also discuss methods that estimate the intrinsic dimensionality of a dataset in order to interpret the reduced dimension. Using several evaluation strategies, we show how these different methods can provide useful insights into the data. These comparisons enable us to offer guidance to users on selecting a technique for their dataset.
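The two families of techniques the abstract contrasts, and the idea of estimating intrinsic dimensionality, can be illustrated with a minimal sketch. This is not the chapter's methodology, just a hedged toy example: it builds synthetic data whose structure lives in 3 of 10 features, estimates the intrinsic dimension with a simple PCA variance-ratio heuristic (one of many possible estimators), and contrasts a feature transformation (PCA projection) with a filter-style feature subset selection (ranking original features by variance). All names and thresholds here are illustrative choices, not from the source.

```python
import numpy as np

# Toy data: 200 samples, 10 features. Only the first three columns carry
# structure (three mutually near-orthogonal signals); the remaining seven
# are low-amplitude noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
signal = np.column_stack([np.sin(2 * np.pi * t),
                          np.sin(4 * np.pi * t),
                          np.cos(2 * np.pi * t)])
noise = 0.01 * rng.normal(size=(200, 7))
X = np.hstack([signal, noise])

# Feature transformation (PCA via SVD): project onto directions of
# maximal variance instead of keeping the original features.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)

# One simple intrinsic-dimensionality heuristic: the number of principal
# components needed to explain 95% of the total variance.
intrinsic_dim = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1

# Feature subset selection (filter-style): rank the original features by
# variance and keep the top-k. Unlike PCA, this preserves interpretability,
# since the selected dimensions are original features.
k = intrinsic_dim
top_features = np.argsort(X.var(axis=0))[::-1][:k]

print(intrinsic_dim)            # 3: matches the rank-3 construction
print(sorted(top_features))     # [0, 1, 2]: the informative columns
```

In this contrived setting both routes agree with the known structure, but on real data they can diverge, which is precisely why the chapter's empirical comparison across techniques and evaluation strategies is useful.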
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this chapter
Fan, Y., Kamath, C. (2015). On the Selection of Dimension Reduction Techniques for Scientific Applications. In: Abou-Nasr, M., Lessmann, S., Stahlbock, R., Weiss, G. (eds) Real World Data Mining Applications. Annals of Information Systems, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-07812-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07811-3
Online ISBN: 978-3-319-07812-0
eBook Packages: Business and Economics; Business and Management (R0)