Improving Risk Predictions by Preprocessing Imbalanced Credit Data

  • Conference paper
Neural Information Processing (ICONIP 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7664)

Abstract

Imbalanced credit data sets are databases in which the class of defaulters is heavily under-represented in comparison to the class of non-defaulters. This situation is very common in real-life credit scoring applications, yet it has received little attention. This paper investigates whether data resampling can be used to improve the performance of learners built from imbalanced credit data sets, and whether the effectiveness of resampling is related to the type of classifier. Experimental results demonstrate that learning with the resampled sets consistently outperforms the use of the original imbalanced credit data, independently of the classifier used.
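
To make the idea of data resampling concrete, the following is a minimal sketch of random over-sampling, one of the simplest ways to rebalance a training set. The abstract does not say which resampling methods the paper evaluates, so this is an illustrative stand-in and not the authors' procedure; the function name `random_oversample` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many examples as the largest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original examples of this class, used as the sampling pool.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data, made up for illustration: 1 = defaulter, 0 = non-defaulter.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
labels = [0, 0, 0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)
```

In practice, resampling of this kind is normally applied to the training portion only, so that evaluation still reflects the original class distribution.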

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

García, V., Marqués, A.I., Sánchez, J.S. (2012). Improving Risk Predictions by Preprocessing Imbalanced Credit Data. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7664. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34481-7_9

  • DOI: https://doi.org/10.1007/978-3-642-34481-7_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34480-0

  • Online ISBN: 978-3-642-34481-7

  • eBook Packages: Computer Science, Computer Science (R0)
