Effective and Efficient Multi-label Feature Selection Approaches via Modifying Hilbert-Schmidt Independence Criterion

  • Jianhua Xu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9949)


The Hilbert-Schmidt independence criterion (HSIC) is a nonparametric dependence measure that captures all modes of dependence between two sets of variables through a matrix trace. When HSIC with linear feature and label kernels is applied directly to multi-label feature selection, an efficient feature ranking is obtained from the diagonal elements of the resulting matrix, which reflect only feature-label relevance. The off-diagonal elements, however, essentially characterize feature-feature conditional redundancy. In this paper, two novel criteria are defined over all matrix elements: for a candidate feature, we maximize its relevance while minimizing its average or maximal redundancy with respect to the already selected features. An efficient hybrid strategy combining simple feature ranking with sequential forward selection is then implemented, where the former sorts all features in descending order of relevance and the latter picks out the top discriminative features by maximizing relevance and minimizing redundancy. Experiments on four data sets show that, in terms of classification performance and computational efficiency, the proposed methods are effective and efficient compared with several existing techniques.
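The two-stage strategy the abstract describes can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code: with linear kernels, the matrix `M = (X^T H Y)(X^T H Y)^T` (where `H` is the centering matrix) has feature-label relevance on its diagonal and feature-feature redundancy off the diagonal; features are first ranked by relevance, then greedily added while penalizing average or maximal redundancy. The function name `hsic_select` and the exact scoring form (relevance minus redundancy penalty) are assumptions for this sketch.

```python
import numpy as np

def hsic_select(X, Y, k, mode="avg"):
    """Greedy multi-label feature selection sketch based on a linear-kernel
    HSIC decomposition. X: (n, d) features; Y: (n, q) label matrix;
    k: number of features to select; mode: "avg" or "max" redundancy penalty.
    """
    n, d = X.shape
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    B = X.T @ H @ Y                         # (d, q) centered feature-label products
    M = B @ B.T                             # diag(M): feature-label relevance;
                                            # off-diagonal: feature-feature redundancy
    relevance = np.diag(M).copy()

    # Stage 1: simple feature ranking by relevance, descending.
    ranked = np.argsort(-relevance)

    # Stage 2: sequential forward selection with a redundancy penalty.
    selected = [ranked[0]]
    candidates = list(ranked[1:])
    while len(selected) < k and candidates:
        best, best_score = None, -np.inf
        for f in candidates:
            red = np.abs(M[f, selected])
            penalty = red.mean() if mode == "avg" else red.max()
            score = relevance[f] - penalty
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

The "avg" and "max" modes correspond to the two criteria mentioned in the abstract (average versus maximal redundancy); the ranking stage makes the greedy stage cheap, since only the top-ranked candidates need scoring.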


Multi-label classification · Feature selection · Feature ranking · Hilbert-Schmidt independence criterion · Sequential forward selection



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing, China
