Empowering Imbalanced Data in Supervised Learning: A Semi-supervised Learning Approach

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2014 (ICANN 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8681)

Abstract

We present a framework to address the imbalanced data problem using semi-supervised learning. Specifically, from a supervised problem, we create a semi-supervised problem and then use a semi-supervised learning method to identify the most relevant instances to establish a well-defined training set. We present extensive experimental results, which demonstrate that the proposed framework significantly outperforms all other sampling algorithms in 67% of the cases across three different classifiers and ranks second best for the remaining 33% of the cases.
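
The abstract does not say how the semi-supervised problem is constructed or which semi-supervised method is used, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes the unlabeled pool consists of candidate minority instances made by interpolating between existing minority points, and that a simple graph-based score propagation decides which candidates are relevant enough to join the training set. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def rbf(a, b, gamma=1.0):
    """Gaussian similarity between two points (assumed kernel)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def make_candidates(minority, n, rng):
    """Build n unlabeled candidates by interpolating random minority pairs."""
    out = []
    for _ in range(n):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        out.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return out


def propagate(points, labels, n_iter=50, alpha=0.8, gamma=1.0):
    """Spread label scores over a similarity graph.

    labels: +1 (minority), -1 (majority), 0 (unlabeled).
    Returns one real-valued score per point.
    """
    n = len(points)
    w = [[0.0 if i == j else rbf(points[i], points[j], gamma)
          for j in range(n)] for i in range(n)]
    for row in w:  # row-normalise so each row sums to 1
        total = sum(row) or 1.0
        for j in range(n):
            row[j] /= total
    f = [float(y) for y in labels]
    for _ in range(n_iter):
        # mix neighbours' scores with the original label information
        f = [alpha * sum(w[i][j] * f[j] for j in range(n))
             + (1 - alpha) * labels[i] for i in range(n)]
    return f


def select_relevant(majority, minority, seed=0, threshold=0.0):
    """Return the candidates whose propagated score favours the minority class."""
    rng = random.Random(seed)
    candidates = make_candidates(minority, len(majority) - len(minority), rng)
    points = majority + minority + candidates
    labels = [-1] * len(majority) + [1] * len(minority) + [0] * len(candidates)
    scores = propagate(points, labels)
    start = len(majority) + len(minority)
    return [c for c, s in zip(candidates, scores[start:]) if s > threshold]


# Toy data: nine majority points near the origin, three minority points near (5, 5).
majority = [(0.5 * x, 0.5 * y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
minority = [(5.0, 5.0), (5.5, 4.8), (4.7, 5.3)]
kept = select_relevant(majority, minority)
training_set = ([(p, -1) for p in majority]
                + [(p, 1) for p in minority + kept])
print(len(kept), "candidates added;", len(training_set), "training instances")
```

In this sketch the threshold controls how confident the propagated score must be before a candidate is accepted; a real pipeline would then train any supervised classifier on `training_set`.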

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Almogahed, B.A., Kakadiaris, I.A. (2014). Empowering Imbalanced Data in Supervised Learning: A Semi-supervised Learning Approach. In: Wermter, S., et al. Artificial Neural Networks and Machine Learning – ICANN 2014. ICANN 2014. Lecture Notes in Computer Science, vol 8681. Springer, Cham. https://doi.org/10.1007/978-3-319-11179-7_66

  • DOI: https://doi.org/10.1007/978-3-319-11179-7_66

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11178-0

  • Online ISBN: 978-3-319-11179-7

  • eBook Packages: Computer Science, Computer Science (R0)
