Using SOM-Based Data Binning to Support Supervised Variable Selection
We propose a robust and understandable algorithm for supervised variable selection. The user defines a problem by manually selecting the variables Y that best describe the problem of interest; these variables are used to train a Self-Organizing Map (SOM). This gives an illustrative problem definition even in the multivariate case. The user also defines another set X, which contains variables that may be related to the problem. Our algorithm browses subsets of X and returns the one that contains the most information about the user's problem. We measure information by mapping small areas of the studied subset to the SOM lattice, and we return the variable set that provides, on average, the most compact mapping. By analyzing public domain data sets and comparing against other variable selection methods, we illustrate the main benefit of our method: understandability to the common user.
Keywords: Variable Selection, Variable Selection Method, Horse Power, Well Match Unit, Variable Selection Algorithm
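The procedure outlined in the abstract can be illustrated with a small sketch. The abstract does not specify how "small areas" are formed, how compactness is scored, or how subsets are browsed, so everything below is an assumption made for illustration: a small area is taken to be a sample plus its k nearest neighbours in the candidate subset, compactness is the mean distance of their best-matching units from their centroid on the lattice, and subsets up to a fixed size are searched exhaustively. The function names (`train_som`, `bmu_coords`, `compactness`, `select_variables`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import itertools
import numpy as np

def train_som(Y, rows=5, cols=5, epochs=20, seed=0):
    """Train a tiny online SOM on Y; return (codebook, lattice coordinates)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    codebook = Y[rng.integers(0, len(Y), rows * cols)].astype(float)
    n_steps, step = epochs * len(Y), 0
    for _ in range(epochs):
        for i in rng.permutation(len(Y)):
            frac = step / n_steps
            lr = 0.5 * (1.0 - frac) + 0.01                       # decaying learning rate
            sigma = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5   # shrinking neighbourhood
            bmu = np.argmin(((codebook - Y[i]) ** 2).sum(axis=1))
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            codebook += lr * h[:, None] * (Y[i] - codebook)
            step += 1
    return codebook, grid

def bmu_coords(Y, codebook, grid):
    """Lattice coordinates of the best-matching unit of every sample in Y."""
    d = ((Y[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return grid[d.argmin(axis=1)]

def compactness(X_sub, coords, k=5):
    """Mean lattice spread of each sample's neighbourhood in X_sub (lower = more compact)."""
    d = ((X_sub[:, None, :] - X_sub[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d, axis=1)[:, :k + 1]        # each sample and its k nearest neighbours
    local = coords[nn]                           # shape (n_samples, k + 1, 2)
    centred = local - local.mean(axis=1, keepdims=True)
    return float(np.sqrt((centred ** 2).sum(axis=2)).mean())

def select_variables(X, Y, max_size=2, k=5):
    """Exhaustively browse subsets of X's columns; return (best subset, its score)."""
    codebook, grid = train_som(Y)
    coords = bmu_coords(Y, codebook, grid)
    best = None
    for size in range(1, max_size + 1):
        for subset in itertools.combinations(range(X.shape[1]), size):
            score = compactness(X[:, list(subset)], coords, k)
            if best is None or score < best[0]:
                best = (score, subset)
    return best[1], best[0]

# Synthetic demo (assumed data): Y depends on columns 0 and 2 of X only.
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 4))
Y = X[:, [0, 2]] + 0.05 * rng.normal(size=(200, 2))
subset, score = select_variables(X, Y, max_size=2)
print("selected columns:", subset, "mean lattice spread:", round(score, 3))
```

Exhaustive search is used here only because the example has four candidate variables; for larger X a greedy or otherwise restricted search would be needed, and the variables in X would normally be scaled before computing neighbourhoods.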