Generative adversarial fusion network for class imbalance credit scoring

Abstract

Credit scoring on class-imbalanced data, where defaulters are heavily under-represented relative to non-defaulters, is an important but challenging task. In this paper, we propose an imbalanced generative adversarial fusion network (IGAFN) to address class-imbalanced credit scoring on multi-source heterogeneous credit data. Concretely, we design a fusion module that integrates heterogeneous credit data from multiple sources into a unified latent feature space. A generative adversarial network-based balance module then generates latent representations of new samples for the minority class of the imbalanced datasets. We compare IGAFN against multiple conventional machine learning and deep learning algorithms. Extensive experiments on two real-life datasets show that IGAFN performs significantly better than the compared methods.
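The data flow described in the abstract (fuse first, then balance in the latent space) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: fusion is approximated by averaging per-source linear projections, the generator is an untrained random linear map with no discriminator or adversarial training, defaulters are labeled 1, and all names (`fuse`, `balance`, `LATENT_DIM`, and so on) are hypothetical.

```python
import random

def project(x, weights):
    """Linear map from one feature vector to a latent vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def fuse(sources, weight_sets):
    """Project each source into the shared latent space and average the results.
    Assumption: a stand-in for the paper's fusion module, not its actual design."""
    projected = [project(x, w) for x, w in zip(sources, weight_sets)]
    return [sum(vals) / len(projected) for vals in zip(*projected)]

def random_weights(rng, out_dim, in_dim):
    """Random weight matrix with out_dim rows and in_dim columns."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

def balance(latents, labels, generator, rng, noise_dim):
    """Append generated minority-class latent vectors until both classes are equal in size."""
    n_min = sum(1 for y in labels if y == 1)   # assumption: label 1 = defaulter
    n_maj = len(labels) - n_min
    new_latents, new_labels = list(latents), list(labels)
    for _ in range(max(0, n_maj - n_min)):
        noise = [rng.gauss(0, 1) for _ in range(noise_dim)]
        new_latents.append(generator(noise))
        new_labels.append(1)
    return new_latents, new_labels

rng = random.Random(0)
LATENT_DIM, NOISE_DIM = 4, 3

# Two hypothetical sources with different dimensionality.
w_a = random_weights(rng, LATENT_DIM, 5)
w_b = random_weights(rng, LATENT_DIM, 2)
# Untrained stand-in generator: a real GAN would learn this map adversarially.
w_g = random_weights(rng, LATENT_DIM, NOISE_DIM)

labels = [0] * 8 + [1] * 2   # 8 non-defaulters, 2 defaulters
latents = []
for _ in labels:
    src_a = [rng.random() for _ in range(5)]
    src_b = [rng.random() for _ in range(2)]
    latents.append(fuse([src_a, src_b], [w_a, w_b]))

balanced_latents, balanced_labels = balance(
    latents, labels, lambda z: project(z, w_g), rng, NOISE_DIM
)
print(len(balanced_latents), sum(balanced_labels))  # prints "16 8"
```

The point of the sketch is only the ordering: synthetic minority samples are produced as latent vectors of the fused space, so a downstream classifier would train on the balanced latent set rather than on raw per-source features.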

Notes

  1. https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

  2. https://www.kaggle.com/c/data-science-nigeria-credit-risk-prediction

Acknowledgements

This work was financially supported by the Shenzhen Project (ZDSYS201802051831427), the National Natural Science Foundation of China (No. 61602013), and the Shenzhen Fundamental Research Project (No. JCYJ20170818091546869). Min Yang was sponsored by the CCF-Tencent Open Research Fund.

Author information

Corresponding author

Correspondence to Ying Shen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lei, K., Xie, Y., Zhong, S. et al. Generative adversarial fusion network for class imbalance credit scoring. Neural Comput & Applic 32, 8451–8462 (2020). https://doi.org/10.1007/s00521-019-04335-1

Keywords

  • Credit scoring
  • Class imbalance
  • Generative adversarial network
  • Feature fusion