A Wrapper for Reweighting Training Instances for Handling Imbalanced Data Sets

  • M. Karagiannopoulos
  • D. Anyfantis
  • S. Kotsiantis
  • P. Pintelas
Part of the IFIP (International Federation for Information Processing) book series (IFIPAICT, volume 247)


A classifier induced from an imbalanced data set typically has a low error rate for the majority class and an unacceptably high error rate for the minority class. This paper first provides a systematic study of the various methodologies that have been proposed to handle this problem. It then presents an experimental study of these methodologies alongside a proposed wrapper for reweighting training instances, and concludes that such a framework can be a more valuable solution to the problem.
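The idea of wrapping a base classifier in a search over instance weights can be illustrated with a minimal sketch. It is not the wrapper proposed in the paper: the inverse class-frequency weighting, the `minority_boost` parameter, the one-dimensional threshold classifier, the geometric-mean criterion, and the toy data are all assumptions chosen for illustration. For brevity the candidates are scored on the training data itself; a held-out validation set would normally be used.

```python
from collections import Counter
from math import sqrt

def balanced_weights(labels, minority_boost=1.0):
    """Weight each instance inversely to its class frequency.
    minority_boost is a hypothetical knob that further scales the rarest class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    rare = min(counts, key=counts.get)
    return [n / (k * counts[y]) * (minority_boost if y == rare else 1.0)
            for y in labels]

def fit_stump(xs, ys, ws):
    """Weighted 1-D threshold classifier (stand-in base learner).
    Predicts class 1 when x > t; returns the t with the lowest weighted error."""
    best_t, best_err = None, float("inf")
    for t in [min(xs) - 1] + sorted(set(xs)):
        err = sum(w for x, y, w in zip(xs, ys, ws) if (x > t) != bool(y))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def g_mean(xs, ys, t):
    """Geometric mean of the per-class accuracies for labels 0 and 1."""
    acc = []
    for c in (0, 1):
        idx = [i for i, y in enumerate(ys) if y == c]
        correct = sum(1 for i in idx if (xs[i] > t) == bool(c))
        acc.append(correct / len(idx))
    return sqrt(acc[0] * acc[1])

def reweighting_wrapper(xs, ys, boosts=(0.5, 1.0, 2.0, 4.0)):
    """Try several minority weightings around the base learner and keep
    the one with the best geometric mean. Returns (score, boost, threshold)."""
    best = None
    for b in boosts:
        t = fit_stump(xs, ys, balanced_weights(ys, b))
        score = g_mean(xs, ys, t)
        if best is None or score > best[0]:
            best = (score, b, t)
    return best

# Toy imbalanced data: eight majority (0) and three minority (1) instances.
XS = [1, 2, 3, 4, 5, 6, 7, 8, 5.5, 6.5, 9]
YS = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print("unweighted threshold:", fit_stump(XS, YS, [1.0] * len(XS)))
    print("wrapper result:", reweighting_wrapper(XS, YS))
```

On this toy data the unweighted learner places its threshold where most minority instances are misclassified, while the reweighted learner moves the threshold so that the minority class is recovered at the cost of some majority-class errors.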


Base Classifier, Threshold Method, Training Instance, Minority Class, Positive Instance



Copyright information

© International Federation for Information Processing 2007

Authors and Affiliations

  • M. Karagiannopoulos (1)
  • D. Anyfantis (1)
  • S. Kotsiantis (1)
  • P. Pintelas (1)
  1. Educational Software Development Laboratory, Department of Mathematics, University of Patras, Greece
