Multi-label Feature Selection Method Based on Multivariate Mutual Information and Particle Swarm Optimization

  • Xidong Wang
  • Lei Zhao
  • Jianhua Xu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11304)


Multi-label feature selection has become an indispensable pre-processing step in multi-label learning: it removes irrelevant and redundant features, decreases the computational burden, improves classification performance, and enhances model interpretability. Mutual information (MI) between two random variables is widely used to describe feature-label relevance and feature-feature redundancy. Furthermore, multivariate mutual information (MMI) is approximated by limiting interactions to degree three to speed up its computation, and is then used to characterize the relevance between a selected feature subset and the label set. In this paper, we combine MMI-based relevance with MI-based redundancy to define a new max-relevance and min-redundancy feature selection criterion (MMI for short). To search for a globally optimal solution, we add an auxiliary mutation operation, which strictly controls the number of selected features, to an existing binary particle swarm optimization with mutation, forming a new PSO variant: M2BPSO. Integrating MMI with M2BPSO yields a novel multi-label feature selection method: MMI-PSO. Experiments on four benchmark data sets, evaluated with four instance-based classification metrics, demonstrate the effectiveness of the proposed algorithm compared with three state-of-the-art feature selection approaches.
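The max-relevance and min-redundancy idea underlying the criterion can be illustrated with a minimal sketch: score each candidate feature by its mutual information with the labels minus its average mutual information with the features already selected, and pick features greedily. This is an illustrative simplification, not the authors' MMI-PSO method (it uses a single discrete label, pairwise MI only, and greedy search rather than MMI and the M2BPSO search); all function names here are hypothetical.

```python
import numpy as np

def mutual_information(x, y):
    """MI between two discrete 1-D arrays, estimated from joint frequencies."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # joint probability
            px, py = np.mean(x == xv), np.mean(y == yv)  # marginals
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def greedy_mrmr(X, y, k):
    """Greedily select k columns of X maximizing relevance minus redundancy."""
    n_features = X.shape[1]
    selected = []
    # Relevance of each feature to the label, computed once up front.
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Redundancy: average MI with the already-selected features.
            redundancy = (np.mean([mutual_information(X[:, j], X[:, s])
                                   for s in selected]) if selected else 0.0)
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

On a toy data set where one feature reproduces the label exactly, `greedy_mrmr` picks that feature first; the paper's contribution replaces both the pairwise relevance term (with a three-degree MMI approximation) and the greedy search (with the M2BPSO variant).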


Keywords: Multi-label classification · Feature selection · Multivariate mutual information · Particle swarm optimization · Mutation operation



This work was supported by the Natural Science Foundation of China (NSFC) under Grant 61273246.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing, China
