Abstract

This paper introduces FCF (Fuzzy Clustering-based Filter), a filter for removing redundant features that can improve both the efficacy and the efficiency of data mining algorithms. FCF is based on the fuzzy partitioning of features into clusters, with the number of clusters estimated automatically from the data. After clustering, FCF selects a subset of features from the obtained clusters. To do so, we study four different strategies based on the information provided by the fuzzy partition matrix, and we show that these strategies can be combined for better performance. Empirical results show that FCF generally achieves competitive results in classification tasks when compared to a related filter based on the hard partitioning of features.
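
The general idea of clustering features fuzzily and then picking representatives from the partition matrix can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, the dissimilarity measure, or the four selection strategies. The sketch assumes a fuzzy c-medoids procedure over a correlation-based dissimilarity, a fixed number of clusters `k` (the paper estimates it from data), and a single illustrative strategy that keeps the feature with the highest membership in each cluster. All function names and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def feature_dissimilarity(rows):
    """Assumed measure: 1 - |correlation| between every pair of columns."""
    cols = list(zip(*rows))
    n = len(cols)
    return [[0.0 if i == j else max(0.0, 1.0 - abs(pearson(cols[i], cols[j])))
             for j in range(n)] for i in range(n)]

def memberships(D, medoids, m=2.0):
    """Fuzzy partition matrix U[i][c]: membership of feature i in cluster c."""
    U = []
    for i in range(len(D)):
        dists = [D[i][c] for c in medoids]
        if min(dists) <= 1e-12:
            # Feature coincides with a medoid: give it crisp membership.
            hit = dists.index(min(dists))
            U.append([1.0 if c == hit else 0.0 for c in range(len(medoids))])
        else:
            w = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in dists]
            total = sum(w)
            U.append([v / total for v in w])
    return U

def fuzzy_c_medoids(D, k, m=2.0, max_iter=100):
    """Alternate between membership updates and medoid updates."""
    n = len(D)
    medoids = [0]
    while len(medoids) < k:
        # Farthest-first initialisation (deterministic, chosen for this sketch).
        medoids.append(max(range(n), key=lambda j: min(D[j][c] for c in medoids)))
    for _ in range(max_iter):
        U = memberships(D, medoids, m)
        new = []
        for c in range(k):
            # The new medoid minimises the membership-weighted dissimilarity.
            costs = [sum((U[i][c] ** m) * D[i][j] for i in range(n)) for j in range(n)]
            new.append(min(range(n), key=costs.__getitem__))
        if new == medoids:
            break
        medoids = new
    return medoids, memberships(D, medoids, m)

def select_features(U, k):
    """Illustrative strategy: keep the highest-membership feature of each cluster."""
    return sorted({max(range(len(U)), key=lambda i: U[i][c]) for c in range(k)})

# Toy data: columns 0 and 1 are nearly collinear, as are columns 2 and 3.
rows = [
    [1, 3.1, 2, -2.1],
    [2, 4.9, -1, 1.2],
    [3, 7.2, 4, -3.9],
    [4, 8.8, 0, 0.1],
    [5, 11.1, 3, -3.2],
    [6, 12.9, -2, 1.9],
]
D = feature_dissimilarity(rows)
medoids, U = fuzzy_c_medoids(D, k=2)
selected = select_features(U, k=2)
print(selected)  # one column index from each correlated pair
```

With medoid-based clustering, the highest-membership feature of a cluster is its medoid, so this particular strategy reduces to keeping the medoids. Strategies that exploit the full partition matrix, for example keeping several features per cluster or dropping features with ambiguous memberships, would differ here.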


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Coletta, L.F.S., Hruschka, E.R., Covoes, T.F., Campello, R.J.G.B. (2010). Fuzzy Clustering-Based Filter. In: Hüllermeier, E., Kruse, R., Hoffmann, F. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Methods. IPMU 2010. Communications in Computer and Information Science, vol 80. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14055-6_42

  • DOI: https://doi.org/10.1007/978-3-642-14055-6_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14054-9

  • Online ISBN: 978-3-642-14055-6

  • eBook Packages: Computer Science, Computer Science (R0)
