Abstract
Feature selection for clustering is rarely addressed in the literature. Although there has been some recent work in the area, extensive empirical evaluation of each method's potential is lacking. In this paper, we propose a new implementation of a wrapper and adapt an existing filter method, then run experiments over several data sets to compare both approaches. The results confirm the utility of feature selection for clustering and the theoretical superiority of wrapper methods. However, they also reveal some problems that arise from using greedy search procedures and provide evidence that filters are a reasonable alternative with limited computational cost.
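To make the filter/wrapper distinction concrete, the following is a minimal Python sketch and not the implementation evaluated in the paper. Everything in it is an assumption made for illustration: the function names, the filter score (mean mutual information of a feature with the other features), and the toy score table standing in for "cluster on this subset, then measure clustering quality". A filter ranks features once without running the clustering algorithm, while a wrapper searches over subsets and scores each candidate with the clustering algorithm itself.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two categorical columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def filter_select(columns, k):
    """Filter: score every feature once, independently of any clustering run,
    and keep the k best. The score (mean mutual information with the other
    features) is an illustrative assumption, not the paper's measure."""
    names = list(columns)
    score = {
        f: sum(mutual_information(columns[f], columns[g]) for g in names if g != f)
        / (len(names) - 1)
        for f in names
    }
    return sorted(names, key=lambda f: score[f], reverse=True)[:k]


def wrapper_select(features, evaluate):
    """Wrapper: greedy forward search. `evaluate(subset)` is assumed to run the
    clustering algorithm on `subset` and return a quality score (higher is better)."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        top_score, top_feature = max((evaluate(selected + [f]), f) for f in remaining)
        if top_score <= best:
            break  # no single addition improves the score, so the search stops
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected


# Toy categorical data: "a" and "b" are redundant copies, "c" is independent of both.
columns = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
filter_choice = filter_select(columns, k=2)

# Hypothetical subset scores (made up), standing in for a real clustering evaluation.
TOY_SCORES = {
    frozenset("a"): 0.50, frozenset("b"): 0.40, frozenset("c"): 0.40,
    frozenset("ab"): 0.45, frozenset("ac"): 0.45, frozenset("bc"): 0.90,
    frozenset("abc"): 0.60,
}


def toy_evaluate(subset):
    return TOY_SCORES[frozenset(subset)]


wrapper_choice = wrapper_select(["a", "b", "c"], toy_evaluate)
print(filter_choice, wrapper_choice)
```

In this toy setting the filter needs one pass over feature pairs, whereas the wrapper calls the evaluator once per candidate subset, which is where the extra computational cost comes from. The made-up score table also shows one general way a greedy search can fall short: it commits to the best single feature and stops, although a subset it never visits scores higher.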
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Talavera, L. (2005). An Evaluation of Filter and Wrapper Methods for Feature Selection in Categorical Clustering. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_40
DOI: https://doi.org/10.1007/11552253_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28795-7
Online ISBN: 978-3-540-31926-9
eBook Packages: Computer Science, Computer Science (R0)