Abstract
Scalable feature selection algorithms should remove irrelevant and redundant features and scale well to very large datasets. We observe that the current state-of-the-art methods perform well on binary classification tasks but often underperform on multi-class tasks. We argue that they suffer from a so-called accumulative effect, which becomes more pronounced as the number of classes grows and leads to the removal of relevant, non-redundant features. To remedy this problem, we propose two new feature filtering methods that are both scalable and well suited to the multi-class case. We report evaluation results on 17 datasets covering both binary and multi-class problems.
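The abstract refers to filter-style feature selection that scores each feature's relevance to the class and prunes features that are redundant with stronger ones. As a minimal sketch of that family of methods (an FCBF-style filter built on symmetrical uncertainty, in the spirit of Yu and Liu's 2004 work; the function names and the zero threshold below are illustrative assumptions, not the paper's proposed algorithms):

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy of a discrete sequence, in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mi = hx + hy - hxy               # mutual information I(X; Y)
    return 2.0 * mi / (hx + hy)

def redundancy_filter(features, labels, threshold=0.0):
    """FCBF-style filter: keep features relevant to the labels, then
    drop any feature that is more correlated with an already-selected
    (stronger) feature than with the class itself."""
    # Rank features by relevance SU(f, class), strongest first.
    relevance = {name: symmetrical_uncertainty(col, labels)
                 for name, col in features.items()}
    ranked = [f for f in sorted(relevance, key=relevance.get, reverse=True)
              if relevance[f] > threshold]
    selected = []
    for f in ranked:
        # f is redundant if a stronger selected feature "covers" it.
        if all(symmetrical_uncertainty(features[f], features[g]) < relevance[f]
               for g in selected):
            selected.append(f)
    return selected
```

For example, given a feature that duplicates an already-selected one, the duplicate's correlation with the selected feature equals (or exceeds) its relevance to the class, so it is pruned as redundant.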
© 2008 Springer-Verlag Berlin Heidelberg
Chidlovskii, B., Lecerf, L. (2008). Scalable Feature Selection for Multi-class Problems. In: Daelemans, W., Goethals, B., Morik, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008. Lecture Notes in Computer Science(), vol 5211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87479-9_33
Print ISBN: 978-3-540-87478-2
Online ISBN: 978-3-540-87479-9