Neural Processing Letters, Volume 50, Issue 3, pp 2323–2344

Alignment Based Feature Selection for Multi-label Learning

  • Linlin Chen
  • Degang Chen


Multi-label learning deals with data sets in which each example is assigned a set of labels, and the goal is to construct a learning model that predicts the label set of unseen examples. Like single-label data sets, multi-label data sets are usually high-dimensional and may contain redundant features, which degrade the performance of learning algorithms. Feature selection is therefore necessary in multi-label learning. Information among labels also plays an important role in multi-label learning, so it is significant to measure this information in order to improve the performance of learning algorithms. In this paper, we introduce kernel alignment into multi-label learning to measure the consistency between the feature space and the label space, and features are ranked and selected according to this consistency. First, we define an ideal kernel in the label space as a convex combination of the ideal kernels defined by each label, and a linear combination of kernels in which each kernel corresponds to one feature. Second, by maximizing the kernel alignment between the linear combination kernel and the ideal kernel, the weights in both kernels are learned simultaneously; the learned label weights can be used as the degree of labeling importance, which is regarded as a kind of information among labels. Finally, features are ranked according to their weights in the linear combination kernel, and a feature subset consisting of the top-ranked features is selected. The result is a novel feature selection method for multi-label learning that learns and accounts for the importance degree of labels automatically. Its effectiveness is demonstrated by experimental comparisons.
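
To make the ranking idea concrete, the following minimal Python sketch scores each feature by its kernel alignment with an ideal label kernel. It is a simplified illustration and not the method of the paper: the label weights are fixed (uniform by default) rather than learned jointly with the feature weights, each feature is scored independently, and each feature kernel is assumed to be a linear kernel on the centered feature. The function names `alignment`, `ideal_label_kernel`, `feature_kernel` and `rank_features`, as well as the toy data, are hypothetical.

```python
import numpy as np


def alignment(k1, k2):
    """Kernel alignment: Frobenius inner product divided by the Frobenius norms."""
    denom = np.linalg.norm(k1) * np.linalg.norm(k2)
    if denom == 0.0:
        return 0.0  # a zero kernel (e.g. a constant feature) carries no information
    return float(np.sum(k1 * k2) / denom)


def ideal_label_kernel(Y, label_weights):
    """Convex combination of the per-label ideal kernels s_l s_l^T.

    Y is an (n_samples, n_labels) 0/1 matrix; s_l is label l recoded to {-1, +1}.
    """
    w = np.asarray(label_weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("label weights must be non-negative and sum to 1")
    S = 2.0 * np.asarray(Y, dtype=float) - 1.0
    return (S * w) @ S.T  # equals sum_l w_l * s_l s_l^T


def feature_kernel(x):
    """Assumed per-feature kernel: linear kernel on the centered feature."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.outer(x, x)


def rank_features(X, Y, label_weights=None):
    """Return feature indices sorted by decreasing alignment, and the scores."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y)
    if label_weights is None:
        # Assumption: uniform label weights instead of learned ones.
        label_weights = np.full(Y.shape[1], 1.0 / Y.shape[1])
    k_ideal = ideal_label_kernel(Y, label_weights)
    scores = np.array(
        [alignment(feature_kernel(X[:, j]), k_ideal) for j in range(X.shape[1])]
    )
    return np.argsort(-scores), scores


# Toy data: feature 0 follows label 0 exactly, feature 1 is unrelated to
# both labels, and feature 2 mixes feature 0 with feature 1.
X = np.array([[1, 1, 2], [1, -1, 0], [-1, -1, -2], [-1, 1, 0]])
Y = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
order, scores = rank_features(X, Y)
print(order, scores)
```

Passing non-uniform `label_weights` shows how a change in label importance changes the ranking. In the method described above these weights are not supplied by hand but obtained by maximizing the alignment.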


Multi-label learning, Feature selection, Kernel alignment, Label weight



This work is supported by grants of NSFC (71471060).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Control and Computer Engineering, North China Electric Power University, Beijing, China
  2. School of Mathematics and Physics, North China Electric Power University, Beijing, China
