Abstract
Imbalanced credit data sets are databases in which defaulters are heavily under-represented compared with non-defaulters. This situation is very common in real-life credit scoring applications, yet it has received little attention. This paper investigates whether data resampling can improve the performance of learners built from imbalanced credit data sets, and whether the effectiveness of resampling depends on the type of classifier. Experimental results show that learning from the resampled sets consistently outperforms learning from the original imbalanced credit data, regardless of the classifier used.
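To make the idea of resampling concrete, here is a minimal sketch of one simple strategy, random over-sampling, in which minority-class records are duplicated until the classes are the same size. The abstract does not say which resampling algorithms the paper evaluates, so this is an illustration only and not the authors' method. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Original rows belonging to this class; duplicates are drawn from here.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (hypothetical): 8 non-defaulters (label 0) and 2 defaulters (label 1).
X = [[i, i * 2] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In practice, resampling of this kind would normally be applied to the training split only, so that the evaluation data keep their original class distribution.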
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
García, V., Marqués, A.I., Sánchez, J.S. (2012). Improving Risk Predictions by Preprocessing Imbalanced Credit Data. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7664. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34481-7_9
DOI: https://doi.org/10.1007/978-3-642-34481-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34480-0
Online ISBN: 978-3-642-34481-7
eBook Packages: Computer Science (R0)