Abstract
In multi-label learning, redundant and irrelevant features increase computational cost and can even degrade classification performance; feature selection is widely used to address this problem. The unbiased Hilbert-Schmidt independence criterion (HSIC) is a kernel-based measure of dependence between feature and label data, and it has been combined with greedy search techniques (e.g., sequential forward selection) to find a locally optimal feature subset. Alternatively, a genetic algorithm (GA) can search for a globally optimal solution, but the final solution usually tends to select about half of the original features. In this paper, we propose a new GA variant that controls the number of selected features (CGA for short). CGA is then integrated with HSIC to formulate a novel multi-label feature selection technique (CGAHSIC) for a given feature subset size. The effectiveness of the proposed CGAHSIC is validated by comparison with four existing algorithms on four benchmark data sets, according to four multi-label classification evaluation metrics (Hamming loss, accuracy, F1 and subset accuracy).
This work was supported by the Natural Science Foundation of China under Grant No. 61273246.
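As a rough illustration of the two ingredients named in the abstract, the following Python sketch pairs the standard unbiased HSIC estimator with a small genetic search whose individuals always contain exactly k features. It is a minimal sketch under stated assumptions, not the authors' CGA: the abstract does not describe the operators, so the linear kernels, the union-based crossover, the swap mutation, and every function and parameter name here (`unbiased_hsic`, `select_features`, `pop_size`, `generations`) are hypothetical choices made for illustration.

```python
import numpy as np

def unbiased_hsic(K, L):
    """Unbiased HSIC estimate from two symmetric m x m kernel matrices (m >= 4)."""
    m = K.shape[0]
    if m < 4:
        raise ValueError("the unbiased estimator needs at least 4 samples")
    Kt, Lt = K.copy(), L.copy()
    np.fill_diagonal(Kt, 0.0)  # the estimator uses kernels with zeroed diagonals
    np.fill_diagonal(Lt, 0.0)
    trace_term = np.sum(Kt * Lt)  # equals tr(Kt Lt) for symmetric matrices
    sum_term = Kt.sum() * Lt.sum() / ((m - 1) * (m - 2))
    cross_term = 2.0 / (m - 2) * (Kt.sum(axis=0) @ Lt.sum(axis=1))
    return (trace_term + sum_term - cross_term) / (m * (m - 3))

def linear_kernel(A):
    # Assumption: linear kernels on both features and labels.
    return A @ A.T

def fitness(X, L, subset):
    """HSIC between the selected feature columns and the label kernel L."""
    return unbiased_hsic(linear_kernel(X[:, subset]), L)

def select_features(X, Y, k, pop_size=20, generations=30, seed=0):
    """Hypothetical size-controlled GA: every individual holds exactly k features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    L = linear_kernel(Y)
    pop = [rng.choice(d, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        scores = np.array([fitness(X, L, s) for s in pop])
        order = np.argsort(scores)[::-1]
        elite = [pop[i] for i in order[: pop_size // 2]]  # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.choice(len(elite), size=2, replace=False)
            # Crossover (assumed): draw k distinct features from the parents' union.
            pool = np.union1d(elite[i], elite[j])
            child = rng.choice(pool, size=k, replace=False)
            # Mutation (assumed): swap one selected feature for an unselected one.
            if k < d and rng.random() < 0.3:
                outside = np.setdiff1d(np.arange(d), child)
                child[rng.integers(k)] = rng.choice(outside)
            children.append(child)
        pop = elite + children
    scores = np.array([fitness(X, L, s) for s in pop])
    return np.sort(pop[int(np.argmax(scores))])
```

With a feature matrix `X` of shape (samples, features) and a binary label matrix `Y` of shape (samples, labels), a call such as `select_features(X, Y, k=3)` returns a sorted array of three distinct column indices. Because both operators preserve the subset size, no penalty term or repair step is needed to hold the search at k features, which is the property the abstract attributes to CGA.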
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, C., Ma, Q., Xu, J. (2018). Multi-label Feature Selection Method Combining Unbiased Hilbert-Schmidt Independence Criterion with Controlled Genetic Algorithm. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11304. Springer, Cham. https://doi.org/10.1007/978-3-030-04212-7_1
DOI: https://doi.org/10.1007/978-3-030-04212-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04211-0
Online ISBN: 978-3-030-04212-7
eBook Packages: Computer Science, Computer Science (R0)