Non-classical Imbalanced Classification Problems

Fernández, Alberto; García, Salvador; Galar, Mikel; Prati, Ronaldo C.; Krawczyk, Bartosz; Herrera, Francisco

doi:10.1007/978-3-319-98074-4_12

Abstract

Most research on class imbalance has been carried out on standard (binary or multi-class) classification problems. In recent years, however, researchers have addressed classification frameworks that go beyond the standard setting in several respects, and variations of the class imbalance problem appear within these frameworks. This chapter reviews class imbalance across a spectrum of these non-classical problems. Sect. 12.2 reviews research on class imbalance when only partially labeled data are available (semi-supervised learning, SSL). Sect. 12.3 discusses label imbalance in problems where more than one label can be associated with an instance (Multilabel Learning). Sect. 12.4 analyzes class imbalance when labels are associated with bags of instances rather than with individual instances (Multi-instance Learning). Sect. 12.5 addresses class imbalance when an ordinal relation exists among the classes (Ordinal Classification). Finally, Sect. 12.6 presents some concluding remarks.

Notes

1. Multilabel learning differs from multi-class classification in that, in the latter, exactly one label, drawn from a set of more than two possible classes, is associated with each instance.
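
The distinction in this note can be made concrete with a small sketch. The toy data, label names, and the helper to_indicator below are hypothetical and are not taken from the chapter; the sketch only assumes that each target can be written as a set of labels and encoded as a binary indicator row.

```python
# Hypothetical toy targets for four instances (label names are invented).
# Multi-class: exactly one label per instance, from more than two classes.
multiclass_y = ["sports", "politics", "sports", "science"]
# Multilabel: each instance may carry several labels at once.
multilabel_y = [
    {"sports"},
    {"politics", "science"},
    {"sports", "science"},
    {"sports", "politics", "science"},
]

# Fixed, sorted label vocabulary shared by both encodings.
labels = sorted(set(multiclass_y) | {l for s in multilabel_y for l in s})


def to_indicator(label_sets, labels):
    """Encode each set of labels as a 0/1 row, one column per label."""
    return [[1 if l in s else 0 for l in labels] for s in label_sets]


# A multi-class target is the special case where every set has size one.
multiclass_rows = to_indicator([{y} for y in multiclass_y], labels)
multilabel_rows = to_indicator(multilabel_y, labels)

# Number of labels attached to each instance in the two settings.
multiclass_counts = [sum(row) for row in multiclass_rows]
multilabel_counts = [sum(row) for row in multilabel_rows]

print(labels)
print(multiclass_counts)
print(multilabel_counts)
```

In this encoding every multi-class row sums to one, whereas a multilabel row may sum to any value, which is why imbalance in the multilabel setting is usually examined per label (per column) rather than per class.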

Author information

Authors and Affiliations

Department of Computer Science and AI, University of Granada, Granada, Spain
Alberto Fernández & Salvador García
Institute of Smart Cities, Public University of Navarre, Pamplona, Spain
Mikel Galar
Department of Computer Science, Universidade Federal do ABC, Santo Andre, Brazil
Ronaldo C. Prati
Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
Bartosz Krawczyk
Department of Computer Science and AI, University of Granada, Granada, Spain
Francisco Herrera

About this chapter

Cite this chapter

Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F. (2018). Non-classical Imbalanced Classification Problems. In: Learning from Imbalanced Data Sets. Springer, Cham. https://doi.org/10.1007/978-3-319-98074-4_12

DOI: https://doi.org/10.1007/978-3-319-98074-4_12
Published: 23 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98073-7
Online ISBN: 978-3-319-98074-4
eBook Packages: Computer Science, Computer Science (R0)
