The Impact of Local Data Characteristics on Learning from Imbalanced Data

  • Conference paper
Rough Sets and Intelligent Systems Paradigms

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8537)

Abstract

This paper discusses problems of learning classifiers from imbalanced data. First, we examine data difficulty factors that correspond to complex distributions of the minority class and show that they can be approximated by analysing the neighbourhood of each learning example from the minority class. We claim that the results of this analysis could serve as a basis for developing new algorithms. We illustrate such possibilities by discussing modifications of the informed pre-processing method LN–SMOTE and by incorporating types of examples into the rule induction algorithm BRACID.
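The neighbourhood analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the procedure, so everything specific here is an assumption: the function name `label_minority_examples`, the use of k = 5 nearest neighbours with Euclidean distance, the four type names, and the thresholds on the share of same-class neighbours. It is meant only to show the general idea of typing minority examples by their local neighbourhood, not the paper's exact method.

```python
from math import dist


def label_minority_examples(X, y, minority, k=5):
    """Assign an assumed difficulty type to each minority-class example.

    X is a list of numeric feature tuples, y the matching class labels.
    Returns a dict mapping the index of each minority example to a type.
    """
    types = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority:
            continue
        # Rank all other examples by Euclidean distance to the current one.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: dist(xi, X[j]),
        )
        neighbours = others[:k]
        if not neighbours:
            continue
        # Share of neighbours that also belong to the minority class.
        ratio = sum(1 for j in neighbours if y[j] == minority) / len(neighbours)
        # Assumed thresholds; for k = 5 they give
        # 4-5 same-class neighbours: safe, 2-3: borderline, 1: rare, 0: outlier.
        if ratio > 0.7:
            types[i] = "safe"
        elif ratio > 0.3:
            types[i] = "borderline"
        elif ratio > 0.1:
            types[i] = "rare"
        else:
            types[i] = "outlier"
    return types
```

Labels of this kind could then steer later processing, for example by treating outlier and safe minority examples differently during over-sampling; how LN–SMOTE and BRACID actually use them is described in the paper itself, not here.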




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Stefanowski, J. (2014). The Impact of Local Data Characteristics on Learning from Imbalanced Data. In: Kryszkiewicz, M., Cornelis, C., Ciucci, D., Medina-Moreno, J., Motoda, H., Raś, Z.W. (eds) Rough Sets and Intelligent Systems Paradigms. Lecture Notes in Computer Science, vol 8537. Springer, Cham. https://doi.org/10.1007/978-3-319-08729-0_1

  • DOI: https://doi.org/10.1007/978-3-319-08729-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08728-3

  • Online ISBN: 978-3-319-08729-0

  • eBook Packages: Computer Science, Computer Science (R0)
