Improving Risk Predictions by Preprocessing Imbalanced Credit Data

García, Vicente; Marqués, Ana Isabel; Sánchez, Jose Salvador

doi:10.1007/978-3-642-34481-7_9

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7664)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Imbalanced credit data sets are databases in which the class of defaulters is heavily under-represented compared with the class of non-defaulters. This situation is very common in real-life credit scoring applications, yet it has received little attention. This paper investigates whether data resampling can improve the performance of learners built from imbalanced credit data sets, and whether the effectiveness of resampling depends on the type of classifier. Experimental results demonstrate that learning with the resampled sets consistently outperforms the use of the original imbalanced credit data, independently of the classifier used.
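As a rough illustration of what resampling an imbalanced credit data set can look like, the sketch below applies plain random over-sampling: minority-class rows are duplicated at random until the classes are the same size. The abstract does not say which resampling algorithms were evaluated, so this is not the authors' procedure. The function name `random_oversample`, the toy rows, and the 8-to-2 class split are all assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (illustrative helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement; the majority class gets zero extra rows.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: 8 non-defaulters (label 0), 2 defaulters (label 1).
rows = [[i, i * 10] for i in range(10)]
labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))  # both classes now have 8 rows
```

In an experiment of the kind the abstract describes, resampling would normally be applied to the training split only, so that the evaluation data keeps its original class distribution.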

Author information

Authors and Affiliations

Dep. Computer Languages and Systems - Institute of New Imaging Technologies, Universitat Jaume I, Av. Vicent Sos Baynat s/n, 12071, Castelló de la Plana, Spain
Vicente García & Jose Salvador Sánchez
Dep. Business Administration and Marketing, Universitat Jaume I, Av. Vicent Sos Baynat s/n, 12071, Castelló de la Plana, Spain
Ana Isabel Marqués

Editor information

Editors and Affiliations

Texas A&M University at Qatar, Education City, P.O. Box 23874, Doha, Qatar
Tingwen Huang
Department of Control Science and Engineering, Huazhong University of Science and Technology, 1037 Luoyu Road, 430074, Wuhan, Hubei, China
Zhigang Zeng
College of Computer Science, Chongqing University, 174 Shazhengjie Street, 400044, Chongqing, China
Chuandong Li
Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
Chi Sing Leung

About this paper

Cite this paper

García, V., Marqués, A.I., Sánchez, J.S. (2012). Improving Risk Predictions by Preprocessing Imbalanced Credit Data. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7664. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34481-7_9

DOI: https://doi.org/10.1007/978-3-642-34481-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34480-0
Online ISBN: 978-3-642-34481-7
eBook Packages: Computer Science, Computer Science (R0)
