
On the Selection of Dimension Reduction Techniques for Scientific Applications

  • Chapter
Real World Data Mining Applications

Part of the book series: Annals of Information Systems (AOIS, volume 17)


Abstract

Many dimension reduction methods have been proposed to discover the intrinsic, lower dimensional structure of a high-dimensional dataset. However, determining critical features in datasets that consist of a large number of features is still a challenge. In this chapter, through a series of carefully designed experiments on real-world datasets, we investigate the performance of different dimension reduction techniques, ranging from feature subset selection to methods that transform the features into a lower dimensional space. We also discuss methods that calculate the intrinsic dimensionality of a dataset in order to understand the reduced dimension. Using several evaluation strategies, we show how these different methods can provide useful insights into the data. These comparisons enable us to provide guidance to users on the selection of a technique for their dataset.
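The transform-based techniques and intrinsic-dimensionality estimators discussed in the abstract can be illustrated with the simplest member of the family, principal component analysis. The sketch below is not taken from the chapter; it is a minimal NumPy-only illustration (the helper name `pca_reduce` and the 95% variance threshold are assumptions) that projects data onto the fewest principal components retaining a given fraction of variance, using that component count as a crude proxy for intrinsic dimensionality:

```python
import numpy as np

def pca_reduce(X, var_threshold=0.95):
    """Project X onto the fewest principal components whose cumulative
    explained variance reaches var_threshold. Hypothetical helper for
    illustration only; the retained count k is a rough proxy for the
    intrinsic dimensionality of the data."""
    Xc = X - X.mean(axis=0)                      # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S**2 / np.sum(S**2)                    # per-component variance fraction
    k = int(np.searchsorted(np.cumsum(var), var_threshold)) + 1
    return Xc @ Vt[:k].T, k

rng = np.random.default_rng(0)
# 200 points lying near a 2-D plane embedded in 10 dimensions
latent = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 10))
X = latent @ A + 0.01 * rng.normal(size=(200, 10))

Z, k = pca_reduce(X)
print(Z.shape, k)  # k should recover a dimension close to the true value of 2
```

A linear method like PCA suffices for this synthetic example because the data lie on a linear subspace; the nonlinear manifold learners compared in the chapter (Isomap, LLE, Laplacian eigenmaps, t-SNE, and others) address the case where the low-dimensional structure is curved.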



Author information

Correspondence to Ya Ju Fan.


Copyright information

© 2015 Springer International Publishing Switzerland


Cite this chapter

Fan, Y., Kamath, C. (2015). On the Selection of Dimension Reduction Techniques for Scientific Applications. In: Abou-Nasr, M., Lessmann, S., Stahlbock, R., Weiss, G. (eds) Real World Data Mining Applications. Annals of Information Systems, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-07812-0_6
