An Evaluation of Filter and Wrapper Methods for Feature Selection in Categorical Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3646)

Abstract

Feature selection for clustering is a problem rarely addressed in the literature. Although there has been some recent work in the area, there is a lack of extensive empirical evaluation assessing the potential of each method. In this paper, we propose a new implementation of a wrapper method and adapt an existing filter method, then run experiments on several data sets to compare both approaches. The results confirm the utility of feature selection for clustering and the theoretical superiority of wrapper methods. However, they also reveal some problems that arise from using greedy search procedures, and they provide evidence that filters are a reasonable alternative with limited computational cost.
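
The abstract does not describe either implementation, so what follows is only a minimal sketch of the two families it compares. It is not the paper's method. The sketch assumes categorical data stored as tuples of strings. The filter ranks each attribute by its average mutual information with the other attributes. This is a dependency-based heuristic chosen here as an assumption, and it yields a ranking only; how many attributes to keep is a separate decision. The wrapper is a greedy forward search. For each candidate subset it clusters with a simple k-modes procedure and scores the resulting partition by category utility (the partition-quality measure used by COBWEB) computed over all attributes. The function names, the deterministic k-modes initialization, and the choice of scoring on all attributes are hypothetical illustration choices.

```python
from collections import Counter
import math

# Each row is a tuple of categorical values; a feature is a column index.


def k_modes(rows, features, k, max_iter=20):
    """Minimal k-modes clustering using Hamming distance on the chosen columns.

    Initial modes are the first k distinct projected rows (a deterministic assumption).
    """
    proj = [tuple(r[f] for f in features) for r in rows]
    modes = []
    for p in proj:
        if p not in modes:
            modes.append(p)
        if len(modes) == k:
            break
    labels = None
    for _ in range(max_iter):
        # Assign each row to its nearest mode (ties go to the lowest cluster index).
        new_labels = [
            min(range(len(modes)), key=lambda c: sum(a != b for a, b in zip(p, modes[c])))
            for p in proj
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Recompute each mode as the per-column most frequent value of its members.
        for c in range(len(modes)):
            members = [p for p, lab in zip(proj, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous mode
                modes[c] = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
    return labels


def _sq_sum(subset):
    """Sum over attributes i and values v of P(A_i = v)^2 within subset."""
    n = len(subset)
    return sum(sum((cnt / n) ** 2 for cnt in Counter(col).values()) for col in zip(*subset))


def category_utility(rows, labels):
    """Category utility of a partition, scored on every attribute of rows (an assumption)."""
    base = _sq_sum(rows)
    clusters = sorted(set(labels))
    total = 0.0
    for c in clusters:
        members = [r for r, lab in zip(rows, labels) if lab == c]
        total += len(members) / len(rows) * (_sq_sum(members) - base)
    return total / len(clusters)


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two categorical columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def filter_rank(rows):
    """Filter: rank columns by average mutual information with the other columns."""
    cols = list(zip(*rows))
    m = len(cols)
    scores = [
        sum(mutual_information(cols[i], cols[j]) for j in range(m) if j != i) / (m - 1)
        for i in range(m)
    ]
    return sorted(range(m), key=lambda i: -scores[i])


def greedy_wrapper(rows, k):
    """Wrapper: greedy forward selection driven by k-modes plus category utility."""
    remaining = list(range(len(rows[0])))
    selected, best = [], float("-inf")
    while remaining:
        # Re-cluster once per candidate column: this is where the wrapper's cost lies.
        scores = {f: category_utility(rows, k_modes(rows, selected + [f], k)) for f in remaining}
        f = max(remaining, key=scores.get)
        if scores[f] <= best:
            break  # greedy stop: no single addition improves the score
        selected.append(f)
        remaining.remove(f)
        best = scores[f]
    return selected, best


if __name__ == "__main__":
    # Hypothetical toy data: columns 0 and 1 encode the same two groups, column 2 is noise.
    toy = [("a", "x", "p"), ("a", "x", "q"), ("a", "x", "p"), ("a", "x", "q"),
           ("b", "y", "p"), ("b", "y", "q"), ("b", "y", "p"), ("b", "y", "q")]
    print(filter_rank(toy))        # [0, 1, 2]
    print(greedy_wrapper(toy, 2))  # ([0], 0.5)
```

On the toy data both approaches put the noise column last. The wrapper stops after selecting column 0 because adding column 1 yields the same partition and therefore no improvement. The filter needs only pairwise counts, whereas the wrapper reruns the clustering for every candidate at every step. This is a small-scale view of the cost difference the abstract refers to.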

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Talavera, L. (2005). An Evaluation of Filter and Wrapper Methods for Feature Selection in Categorical Clustering. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_40

  • DOI: https://doi.org/10.1007/11552253_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28795-7

  • Online ISBN: 978-3-540-31926-9

  • eBook Packages: Computer Science, Computer Science (R0)