Improving Risk Predictions by Preprocessing Imbalanced Credit Data

  • Vicente García
  • Ana Isabel Marqués
  • Jose Salvador Sánchez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7664)


Imbalanced credit data sets are databases in which the class of defaulters is heavily under-represented compared with the class of non-defaulters. This situation is very common in real-life credit scoring applications, yet it has received little attention. This paper investigates whether data resampling can improve the performance of learners built from imbalanced credit data sets, and whether the effectiveness of resampling depends on the type of classifier. Experimental results demonstrate that learning from the resampled sets consistently outperforms learning from the original imbalanced credit data, regardless of the classifier used.
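
To make the idea of resampling concrete, the following is a minimal sketch of random over-sampling, one simple form of data resampling. It is an illustration only, not necessarily one of the techniques evaluated in the paper. The function name `random_oversample`, the toy feature values, and the label coding (1 = defaulter, 0 = non-defaulter) are all assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no minority examples to resample")
    # Number of extra minority copies needed to match the majority class.
    deficit = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(deficit)]
    # Keep every original example, then append the duplicated minority ones.
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data (assumption): 1 = defaulter (minority), 0 = non-defaulter.
toy_rows = [[25, 1200], [40, 3100], [33, 2500], [51, 4000], [29, 900], [45, 3600]]
toy_labels = [0, 0, 0, 0, 1, 0]

balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels, minority_label=1)
print(Counter(balanced_labels))
```

In a sketch like this, resampling would normally be applied to the training portion only, so that the evaluation data keep the original class distribution.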


Keywords: Credit scoring, Class imbalance, Classification, Resampling, Finance





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Vicente García (1)
  • Ana Isabel Marqués (2)
  • Jose Salvador Sánchez (1)

  1. Dep. Computer Languages and Systems, Institute of New Imaging Technologies, Universitat Jaume I, Castelló de la Plana, Spain
  2. Dep. Business Administration and Marketing, Universitat Jaume I, Castelló de la Plana, Spain
