An Evaluation of Filter and Wrapper Methods for Feature Selection in Categorical Clustering

Talavera, Luis

doi:10.1007/11552253_40

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3646)

Abstract

Feature selection for clustering is a problem rarely addressed in the literature. Although there has been some recent work in the area, extensive empirical evaluation to assess the potential of each method is lacking. In this paper, we propose a new implementation of a wrapper and adapt an existing filter method, then perform experiments over several data sets to compare both approaches. The results confirm the utility of feature selection for clustering and the theoretical superiority of wrapper methods. However, they also reveal some problems that arise from using greedy search procedures and provide evidence that filters are a reasonable alternative with limited computational cost.

Author information

Authors and Affiliations

Dept. Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Jordi Girona 1-3, 08034, Barcelona, Spain
Luis Talavera

Editor information

Editors and Affiliations

Institute for Information Technology, National Research Council Canada, Ottawa, Canada
A. Fazel Famili
LIACS, Leiden University, The Netherlands
Joost N. Kok
IFM, Linköping University, SE-58183, Linköping, Sweden
José M. Peña
Department of Computer Science, Universiteit Utrecht
Arno Siebes
Utrecht University, P.O. Box 80 089, NL-3508 TB Utrecht, the Netherlands
Ad Feelders

About this paper

Cite this paper

Talavera, L. (2005). An Evaluation of Filter and Wrapper Methods for Feature Selection in Categorical Clustering. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_40

DOI: https://doi.org/10.1007/11552253_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28795-7
Online ISBN: 978-3-540-31926-9
eBook Packages: Computer Science, Computer Science (R0)
