Empowering Imbalanced Data in Supervised Learning: A Semi-supervised Learning Approach

Almogahed, Bassam A.; Kakadiaris, Ioannis A.

doi:10.1007/978-3-319-11179-7_66

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8681)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

We present a framework that addresses the imbalanced data problem using semi-supervised learning. Specifically, we convert a supervised problem into a semi-supervised one and then apply a semi-supervised learning method to identify the most relevant instances for building a well-defined training set. Extensive experimental results demonstrate that the proposed framework significantly outperforms all other sampling algorithms tested in 67% of the cases across three different classifiers and ranks second in the remaining 33% of the cases.
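The abstract does not say which semi-supervised method the authors use or exactly how the semi-supervised problem is built, so the Python sketch below is hypothetical. It assumes label spreading in the style of Zhou et al. (2004) as the semi-supervised method. It keeps all minority labels plus a few randomly drawn majority "seed" labels, hides the remaining majority labels, and keeps only those hidden majority instances that the propagation still assigns to the majority class. The function names `label_spreading` and `select_training_set` and the parameters `sigma`, `alpha`, `iters`, and `n_seeds` are illustrative and do not come from the paper.

```python
import math
import random


def label_spreading(X, labels, n_classes, sigma=1.0, alpha=0.9, iters=100):
    """Label spreading (Zhou et al., 2004): iterate F <- alpha*S*F + (1-alpha)*Y.

    labels[i] is a class index for labeled points and None for unlabeled ones.
    Returns the score matrix F (one row per point, one column per class).
    """
    n = len(X)
    # Gaussian affinity W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) with zero diagonal.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    deg = [sum(row) for row in W]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; W_ij > 0 implies both degrees > 0.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        # Jacobi-style update: the new F is built entirely from the previous F.
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return F


def select_training_set(X, y, minority=1, n_seeds=None, sigma=1.0, seed=0):
    """Hypothetical selection step, NOT the authors' published procedure.

    Keeps every minority instance, keeps a few random majority instances labeled
    as seeds, hides the remaining majority labels, runs label spreading, and keeps
    only the hidden majority instances still assigned to the majority class.
    Returns the sorted indices of the selected training set.
    """
    rng = random.Random(seed)
    min_idx = [i for i, t in enumerate(y) if t == minority]
    maj_idx = [i for i, t in enumerate(y) if t != minority]
    if n_seeds is None:
        n_seeds = min(len(min_idx), len(maj_idx))
    seeds = set(rng.sample(maj_idx, n_seeds))
    # Class 0 = majority, class 1 = minority; None marks a hidden label.
    labels = [1 if t == minority else (0 if i in seeds else None)
              for i, t in enumerate(y)]
    F = label_spreading(X, labels, n_classes=2, sigma=sigma)
    confirmed = [i for i in maj_idx if i not in seeds and F[i][0] > F[i][1]]
    return sorted(min_idx + list(seeds) + confirmed)
```

For example, with a tight majority cluster near the origin, three minority points near (5, 5), and one majority point placed at (5.5, 5.5), every minority point and every core majority point is kept. The overlapping majority point is dropped unless it happened to be drawn as a seed. The dense pure-Python affinity matrix keeps the sketch dependency-free, but its O(n²) cost makes it practical only for small data sets.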

Author information

Authors and Affiliations

Computational Biomedicine Lab, Dept. of Computer Science, Univ. of Houston, USA
Bassam A. Almogahed & Ioannis A. Kakadiaris

Editor information

Editors and Affiliations

Department of Informatics, University of Hamburg, Vogt-Kölln-Straße 30, 22527, Hamburg, Germany
Stefan Wermter, Cornelius Weber & Sven Magg
Department of Informatics, Nicolaus Copernicus University, ul. Grudziądzka 5, 87-100, Torun, Poland
Włodzisław Duch
Department of Modern Languages, University of Helsinki, P.O. Box 24, 00014, Helsinki, Finland
Timo Honkela
Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. G. Bonchev str. bl. 25A, 1113, Sofia, Bulgaria
Petia Koprinkova-Hristova
Institute of Neural Information Processing, University of Ulm, 89069, Oberer Eselsberg, Ulm, Germany
Günther Palm
Department of Information Systems, Quartier UNIL-Dorigny, Bâtiment Internef, University of Lausanne, 1015, Lausanne, Switzerland
Alessandro E. P. Villa

About this paper

Cite this paper

Almogahed, B.A., Kakadiaris, I.A. (2014). Empowering Imbalanced Data in Supervised Learning: A Semi-supervised Learning Approach. In: Wermter, S., et al. Artificial Neural Networks and Machine Learning – ICANN 2014. ICANN 2014. Lecture Notes in Computer Science, vol 8681. Springer, Cham. https://doi.org/10.1007/978-3-319-11179-7_66

DOI: https://doi.org/10.1007/978-3-319-11179-7_66
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11178-0
Online ISBN: 978-3-319-11179-7
eBook Packages: Computer Science, Computer Science (R0)
