Abstract
This paper proposes a filter-based method for feature selection. The filter is based on partitioning the feature space into clusters of similar features. The number of clusters, and consequently the cardinality of the selected feature subset, is estimated automatically from the data. Empirical results show that, in general, the proposed algorithm achieves classification accuracy competitive with that of a state-of-the-art feature selection algorithm while requiring less computing time.
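The abstract does not state which similarity measure, clustering algorithm, or criterion for estimating the number of clusters the authors use, so the following is only a minimal sketch of the general idea under explicit assumptions. It assumes a feature dissimilarity of 1 - |r| (r being the Pearson correlation between two feature columns), clusters the features with a simple k-medoids procedure, chooses k by the largest average silhouette width, and keeps the medoid of each cluster as the selected feature. All function names (`pearson`, `dissimilarity_matrix`, `k_medoids`, `silhouette`, `select_features`) are hypothetical and do not come from the paper.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def dissimilarity_matrix(columns):
    """ASSUMED feature dissimilarity 1 - |r|: 0 = redundant pair, 1 = uncorrelated pair."""
    m = len(columns)
    d = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        d[i][j] = d[j][i] = 1.0 - abs(pearson(columns[i], columns[j]))
    return d


def assign(d, medoids):
    """Label each feature with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]


def k_medoids(d, k, max_iter=100):
    """Alternating k-medoids on a precomputed dissimilarity matrix (ASSUMED clusterer)."""
    m = len(d)
    # Deterministic farthest-first initialisation.
    medoids = [min(range(m), key=lambda i: sum(d[i]))]
    while len(medoids) < k:
        rest = [i for i in range(m) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(d[i][c] for c in medoids)))
    for _ in range(max_iter):
        labels = assign(d, medoids)
        new = []
        for c in range(k):
            members = [i for i in range(m) if labels[i] == c]
            # New medoid minimises total within-cluster dissimilarity;
            # keep the old one if the cluster is empty.
            new.append(min(members, key=lambda i: sum(d[i][j] for j in members))
                       if members else medoids[c])
        if new == medoids:
            break
        medoids = new
    return medoids, assign(d, medoids)


def silhouette(d, labels, k):
    """Average silhouette width; singletons score 0, fewer than 2 clusters scores -1."""
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    clusters = [c for c in clusters if c]
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for ci, members in enumerate(clusters):
        if len(members) == 1:
            continue
        for i in members:
            a = sum(d[i][j] for j in members if j != i) / (len(members) - 1)
            b = min(sum(d[i][j] for j in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            denom = max(a, b)
            total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)


def select_features(columns, k_max=None):
    """Indices of the medoid features for the k with the best silhouette (needs >= 3 features)."""
    d = dissimilarity_matrix(columns)
    k_max = k_max or len(columns) - 1
    best_score, best = -math.inf, None
    for k in range(2, k_max + 1):
        medoids, labels = k_medoids(d, k)
        score = silhouette(d, labels, k)
        if score > best_score:  # strict ">" keeps the smaller k on ties
            best_score = score
            best = sorted({medoids[lab] for lab in labels})  # non-empty clusters only
    return best


if __name__ == "__main__":
    # Two pairs of perfectly correlated features: {f0, f1} and {f2, f3}.
    cols = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1], [2, -2, -2, 2]]
    print(select_features(cols))  # → [0, 2]
```

Because k is picked by the silhouette criterion rather than fixed by the user, the size of the returned subset follows from the data, which mirrors the property the abstract describes. The pure-Python implementation is only meant to be self-contained; on realistic data a vectorised library would be the natural choice.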
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Covões, T.F., Hruschka, E.R., de Castro, L.N., Santos, Á.M. (2009). A Cluster-Based Feature Selection Approach. In: Corchado, E., Wu, X., Oja, E., Herrero, Á., Baruque, B. (eds) Hybrid Artificial Intelligence Systems. HAIS 2009. Lecture Notes in Computer Science, vol 5572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02319-4_20
DOI: https://doi.org/10.1007/978-3-642-02319-4_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02318-7
Online ISBN: 978-3-642-02319-4
eBook Packages: Computer Science, Computer Science (R0)