
Comparing the Behavior of Oversampling and Undersampling Approach of Class Imbalance Learning by Combining Class Imbalance Problem with Noise

  • Conference paper
ICT Based Innovations

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 653)

Abstract

Class imbalance learning is a relatively recent research topic concerned with identifying classes in imbalanced datasets. In many real-world scenarios that require detecting exceptional cases, such as credit card fraud or brain tumor detection, traditional classification algorithms fail: by design, their results are either overwhelmed by the larger (majority) class, or they treat the smaller (minority) class as noise and ignore it. Recent studies have found that class imbalance by itself is not the main problem; rather, other data-distribution complexities, when combined with class imbalance, degrade classifier performance. One of the major complexities is noise, which is present in all real-world data in one form or another. This paper compares the oversampling and undersampling approaches to class imbalance learning in a noisy environment and tries to determine which approach performs better in that setting.
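The abstract contrasts the two families of resampling without naming the specific algorithms used. The Python sketch below assumes the simplest member of each family, random oversampling (duplicating minority examples) and random undersampling (discarding majority examples), together with uniform label noise as one hypothetical way of simulating a noisy environment. The function names `inject_label_noise`, `random_oversample`, and `random_undersample` are illustrative only; the sketch does not reproduce the authors' experimental setup or results.

```python
import random
from collections import Counter


def inject_label_noise(labels, rate, classes, seed=0):
    """Hypothetical noise model (uniform class noise): each label is replaced,
    with probability `rate`, by a different class chosen uniformly at random."""
    rng = random.Random(seed)
    noisy = []
    for label in labels:
        if rng.random() < rate:
            # Pick any class other than the current one.
            noisy.append(rng.choice([c for c in classes if c != label]))
        else:
            noisy.append(label)
    return noisy


def random_oversample(X, y, seed=0):
    """Random oversampling: duplicate randomly chosen examples of every
    smaller class until each class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        members = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            i = rng.choice(members)
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Random undersampling: keep a random subset of every class, sized to
    match the smallest class; all other examples are discarded."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for cls in counts:
        members = [i for i, label in enumerate(y) if label == cls]
        for i in rng.sample(members, target):
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


if __name__ == "__main__":
    # Toy 1-D dataset: 10 minority (label 1) vs. 90 majority (label 0) examples.
    X = [[float(i)] for i in range(100)]
    y = [1 if i < 10 else 0 for i in range(100)]
    y_noisy = inject_label_noise(y, rate=0.1, classes=[0, 1], seed=42)
    print("noisy labels:     ", Counter(y_noisy))
    print("after oversample: ", Counter(random_oversample(X, y_noisy, seed=1)[1]))
    print("after undersample:", Counter(random_undersample(X, y_noisy, seed=1)[1]))
```

One design point the sketch makes visible: any majority example whose label was flipped to the minority class becomes a candidate for duplication under random oversampling, whereas random undersampling throws away most majority examples, clean or not. More refined oversamplers such as SMOTE interpolate new synthetic minority points instead of copying existing ones.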



Author information

Correspondence to Prabhjot Kaur.


Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kaur, P., Gosain, A. (2018). Comparing the Behavior of Oversampling and Undersampling Approach of Class Imbalance Learning by Combining Class Imbalance Problem with Noise. In: Saini, A., Nayak, A., Vyas, R. (eds) ICT Based Innovations. Advances in Intelligent Systems and Computing, vol 653. Springer, Singapore. https://doi.org/10.1007/978-981-10-6602-3_3


  • DOI: https://doi.org/10.1007/978-981-10-6602-3_3


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6601-6

  • Online ISBN: 978-981-10-6602-3

  • eBook Packages: Engineering, Engineering (R0)
