Non-classical Imbalanced Classification Problems

Learning from Imbalanced Data Sets

Abstract

Most research on class imbalance is carried out on standard (binary or multi-class) classification problems. In recent years, however, researchers have addressed new classification frameworks that go beyond standard classification in different respects, and several variations of the class imbalance problem appear within these frameworks. This chapter reviews the problem of class imbalance for a spectrum of these non-classical problems. Sect. 12.2 reviews research studies on class imbalance where only partially labeled data is available (Semi-supervised Learning, SSL). Sect. 12.3 discusses label imbalance in problems where more than one label can be associated with an instance (Multilabel Learning). Sect. 12.4 analyzes class imbalance when labels are associated with bags of instances rather than with individual instances (Multi-instance Learning). Sect. 12.5 addresses class imbalance when there is an ordinal relation among the classes (Ordinal Classification). Finally, Sect. 12.6 presents some concluding remarks.

Notes

  1. Multilabel learning differs from multi-class classification in that, in the latter, only one label, drawn from a set of more than two possible classes, is associated with each instance.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F. (2018). Non-classical Imbalanced Classification Problems. In: Learning from Imbalanced Data Sets. Springer, Cham. https://doi.org/10.1007/978-3-319-98074-4_12

  • DOI: https://doi.org/10.1007/978-3-319-98074-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98073-7

  • Online ISBN: 978-3-319-98074-4

  • eBook Packages: Computer Science (R0)
