
Applied Intelligence, Volume 48, Issue 10, pp 3577–3590

Multi-label imbalanced classification based on assessments of cost and value

  • Mengxiao Ding
  • Youlong Yang
  • Zhiqing Lan

Abstract

Multi-label imbalanced data are data in which the number of samples per class is highly disproportionate. Traditional classifiers are better suited to balanced data: their classification performance declines dramatically when the class sizes in multi-label data are imbalanced. In this study, we propose an algorithm that assesses the cost of the majority class and the value of the minority class to address the multi-label imbalanced classification problem. The main idea is to provide a quantitative assessment of the cost of the majority class and the value of the minority class based on an imbalance ratio. In the data preprocessing step, we employ a penalty function to determine how many majority class instances to eliminate; an instance's contribution then determines whether that majority class instance is eliminated. In the classification step, we propose a metric that controls the cost of the majority class and the value of the minority class. Experiments show that the algorithm improves the performance of multi-label imbalanced data classification.
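The paper's exact penalty function and contribution factor are not given in this abstract. As a rough illustration of the preprocessing idea only, the sketch below computes a per-label imbalance ratio (an IRLbl-style measure) and removes a penalty-controlled fraction of instances whose positive labels are all majority labels. All function names and the `penalty` parameter are hypothetical, not the authors' method.

```python
import numpy as np

def imbalance_ratio_per_label(Y):
    """Per-label imbalance ratio: count of the most frequent label divided
    by each label's count (higher values indicate more of a minority label)."""
    counts = Y.sum(axis=0).astype(float)
    return counts.max() / np.maximum(counts, 1.0)

def undersample_majority(X, Y, penalty=0.5, rng=None):
    """Illustrative undersampling: drop a penalty-controlled fraction of the
    instances that carry only majority labels (a simplification; the paper
    uses a penalty function and per-instance contributions instead)."""
    rng = np.random.default_rng(rng)
    ir = imbalance_ratio_per_label(Y)
    majority = ir <= ir.mean()  # labels at or below the mean imbalance ratio
    # Candidate instances: positive only on majority labels.
    only_majority = (Y[:, ~majority].sum(axis=1) == 0) & (Y.sum(axis=1) > 0)
    idx = np.flatnonzero(only_majority)
    n_remove = int(penalty * idx.size)  # penalty sets the elimination count
    drop = rng.choice(idx, size=n_remove, replace=False)
    keep = np.setdiff1d(np.arange(X.shape[0]), drop)
    return X[keep], Y[keep]
```

Instances that carry at least one minority label are never candidates for removal, so minority information is preserved while the majority class shrinks.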

Keywords

Contribution factor · Cost and value · Metric · Multi-label imbalanced classification · Penalty function

Notes

Acknowledgments

The authors thank the editor and anonymous reviewers for their helpful comments and suggestions. This study was supported by the National Natural Science Foundation of China (Grant No. 61573266).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Mathematics and Statistics, Xidian University, Xi'an, China
  2. College of Mathematics and Computer Science, Hunan Normal University, Changsha, China
