Fuzzy Clustering-Based Filter

Coletta, Luiz F. S.; Hruschka, Eduardo R.; Covoes, Thiago F.; Campello, Ricardo J. G. B.

doi:10.1007/978-3-642-14055-6_42


Part of the book series: Communications in Computer and Information Science (CCIS, volume 80)

Included in the following conference series:

International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems


Abstract

This paper introduces a filter named FCF (Fuzzy Clustering-based Filter) for removing redundant features, which can improve both the efficacy and the efficiency of data mining algorithms. FCF partitions the features (rather than the objects) into fuzzy clusters, with the number of clusters estimated automatically from the data. After clustering, FCF selects a subset of features from the resulting clusters. For this step, we study four strategies based on the information in the fuzzy partition matrix, and we show that these strategies can be combined for better performance. Empirical results show that FCF generally achieves classification results competitive with those of a related filter based on the hard partitioning of features.
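
The abstract does not spell out FCF's individual steps. As a rough illustration only, the following Python sketch clusters features (not samples) with standard fuzzy c-means, picks the number of clusters with the Xie-Beni validity index, and keeps the highest-membership feature of each cluster. Several choices here are assumptions not taken from the paper: representing each feature by its z-scored column under Euclidean distance (which, note, treats perfectly anti-correlated features as far apart), using the Xie-Beni index as the cluster-count criterion (the paper's own criterion may differ), and the selection rule itself, which is one hypothetical strategy and not necessarily any of the four the authors study. The function names (`zscore`, `fuzzy_c_means`, `xie_beni`, `fcf_select`) are hypothetical.

```python
import math
import random


def zscore(col):
    """Standardize one feature column to zero mean and unit variance."""
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
    return [(x - mean) / sd for x in col]


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means (Bezdek, 1981).

    Returns (centers, U), where U[i][j] is the membership of point j in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each point's memberships sum to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        s = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= s
    centers = []
    for _ in range(iters):
        # Centers are membership-weighted means with weights u_ij ** m.
        centers = []
        for i in range(c):
            w = [U[i][j] ** m for j in range(n)]
            tot = sum(w)
            centers.append([sum(w[j] * points[j][d] for j in range(n)) / tot
                            for d in range(dim)])
        # u_ij = 1 / sum_k (dist_ij / dist_kj) ** (2 / (m - 1)); sqdist is already
        # squared, hence the exponent 1 / (m - 1). The tiny epsilon keeps this
        # defined when a point coincides with a center.
        for j in range(n):
            d = [sqdist(points[j], centers[i]) + 1e-12 for i in range(c)]
            for i in range(c):
                U[i][j] = 1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1.0)) for k in range(c))
    return centers, U


def xie_beni(points, centers, U, m=2.0):
    """Xie-Beni validity index: compactness / separation. Lower is better."""
    c, n = len(centers), len(points)
    compact = sum(U[i][j] ** m * sqdist(points[j], centers[i])
                  for i in range(c) for j in range(n))
    sep = min(sqdist(centers[i], centers[k])
              for i in range(c) for k in range(c) if i != k)
    return compact / (n * sep) if sep > 0 else float("inf")


def fcf_select(data, c_range=range(2, 5), m=2.0, seed=0):
    """Sketch of a fuzzy clustering-based redundancy filter (hypothetical).

    data: list of samples (rows). Returns sorted indices of selected features.
    """
    n_feat = len(data[0])
    # Each feature becomes a point: its standardized column of values.
    points = [zscore([row[f] for row in data]) for f in range(n_feat)]
    best = None
    for c in c_range:
        if not 2 <= c < n_feat:
            continue
        centers, U = fuzzy_c_means(points, c, m=m, seed=seed)
        score = xie_beni(points, centers, U, m=m)
        if best is None or score < best[0]:
            best = (score, U)
    if best is None:
        raise ValueError("c_range needs some c with 2 <= c < number of features")
    U = best[1]
    # Hypothetical selection rule: keep each cluster's most typical feature,
    # i.e. the one with the highest membership in that cluster.
    chosen = {max(range(n_feat), key=lambda f: U[i][f]) for i in range(len(U))}
    return sorted(chosen)


if __name__ == "__main__":
    a, b = [1, 2, 3, 4, 5, 6], [5, 3, 6, 1, 4, 2]
    # Features 0-2 are linear rescalings of a; features 3-4 are rescalings of b.
    data = [[x, 2 * x + 1, 0.5 * x + 10, y, 3 * y - 2] for x, y in zip(a, b)]
    # Expect one index from {0, 1, 2} and one from {3, 4}.
    print(fcf_select(data, c_range=range(2, 3)))
```

Keeping one representative per cluster is what makes this a redundancy filter: features that land in the same cluster are treated as interchangeable, so only one survives. The sketch ignores class labels entirely, so it says nothing about feature relevance.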



Author information

Authors and Affiliations

Computer Science Department, University of São Paulo (USP) at São Carlos, Brazil
Luiz F. S. Coletta, Eduardo R. Hruschka, Thiago F. Covoes & Ricardo J. G. B. Campello


Editor information

Editors and Affiliations

Fachbereich Mathematik und Informatik, Philipps-Universität Marburg, Marburg, Germany
Eyke Hüllermeier
Department of Knowledge Processing and Language Engineering, Otto-von-Guericke University of Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Rudolf Kruse
Fakultät für Elektrotechnik und Informationstechnik, Technische Universität Dortmund, 44221, Dortmund, Germany
Frank Hoffmann


About this paper

Cite this paper

Coletta, L.F.S., Hruschka, E.R., Covoes, T.F., Campello, R.J.G.B. (2010). Fuzzy Clustering-Based Filter. In: Hüllermeier, E., Kruse, R., Hoffmann, F. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Methods. IPMU 2010. Communications in Computer and Information Science, vol 80. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14055-6_42


DOI: https://doi.org/10.1007/978-3-642-14055-6_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14054-9
Online ISBN: 978-3-642-14055-6
eBook Packages: Computer Science, Computer Science (R0)
