Generative adversarial fusion network for class imbalance credit scoring

  • Kai Lei
  • Yuexiang Xie
  • Shangru Zhong
  • Jingchao Dai
  • Min Yang
  • Ying Shen (corresponding author)
Original Article

Abstract

Credit scoring on class-imbalanced data, where defaulters are under-represented relative to non-defaulters, is an important but challenging task. In this paper, we propose an imbalanced generative adversarial fusion network (IGAFN) for class-imbalanced credit scoring based on multi-source heterogeneous credit data. Concretely, we design a fusion module that integrates heterogeneous credit data from multiple sources into a unified latent feature space. A generative adversarial network-based balance module then generates latent representations of new samples for the minority class of the imbalanced datasets. We compare IGAFN against multiple conventional machine learning and deep learning algorithms. Extensive experiments on two real-life datasets show that IGAFN performs significantly better than the compared methods.
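The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network architectures, so untrained random linear maps stand in for the learned fusion module and generator, no adversarial training is shown, and balancing to a 1:1 class ratio is an assumption. All names (`fuse`, `balance`, `LATENT_DIM`, the toy records) are hypothetical.

```python
import random


def linear(weights, vec):
    """Apply a bias-free dense layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Random weights; a stand-in for parameters that would be learned."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def fuse(sources, fusion_weights):
    """Concatenate per-source feature vectors and project them into
    one shared latent space (assumed form of the fusion step)."""
    concatenated = [x for source in sources for x in source]
    return linear(fusion_weights, concatenated)


def generate_latents(n, generator_weights, noise_dim, rng):
    """Map n Gaussian noise vectors to latent vectors.
    Stand-in for a trained GAN generator."""
    return [
        linear(generator_weights, [rng.gauss(0.0, 1.0) for _ in range(noise_dim)])
        for _ in range(n)
    ]


def balance(latents, labels, generator_weights, noise_dim, rng, minority=1):
    """Append synthetic minority latents until both classes have equal size
    (the 1:1 target is an assumption, not taken from the paper)."""
    n_minority = sum(1 for y in labels if y == minority)
    deficit = max(0, (len(labels) - n_minority) - n_minority)
    synthetic = generate_latents(deficit, generator_weights, noise_dim, rng)
    return latents + synthetic, labels + [minority] * deficit


rng = random.Random(0)
LATENT_DIM, NOISE_DIM = 4, 3

# Hypothetical toy records: (source A with 2 features, source B with 3 features).
records = [
    ([0.2, 0.5], [1.0, 0.0, 0.3]),
    ([0.1, 0.4], [0.0, 1.0, 0.7]),
    ([0.9, 0.3], [1.0, 1.0, 0.2]),
    ([0.6, 0.8], [0.0, 0.0, 0.5]),
    ([0.7, 0.1], [1.0, 0.0, 0.9]),
]
labels = [0, 0, 0, 0, 1]  # 1 = defaulter (minority class)

fusion_weights = random_matrix(LATENT_DIM, 5, rng)
generator_weights = random_matrix(LATENT_DIM, NOISE_DIM, rng)

latents = [fuse(record, fusion_weights) for record in records]
balanced_latents, balanced_labels = balance(
    latents, labels, generator_weights, NOISE_DIM, rng
)
print(len(balanced_latents), balanced_labels.count(0), balanced_labels.count(1))  # 8 4 4
```

In this sketch the synthetic samples are created directly in the latent space rather than in the raw feature space, which mirrors the abstract's statement that the balance module generates latent representations of new minority-class samples.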

Keywords

  • Credit scoring
  • Class imbalance
  • Generative adversarial network
  • Feature fusion

Notes

Acknowledgements

This work was financially supported by the Shenzhen Project (ZDSYS201802051831427), the National Natural Science Foundation of China (No. 61602013), and the Shenzhen Fundamental Research Project (No. JCYJ20170818091546869). Min Yang was sponsored by the CCF-Tencent Open Research Fund.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Shenzhen Key Lab for Information Centric Networking & Blockchain Technology (ICNLAB), School of Electronics and Computer Engineering (SECE), Peking University, Shenzhen, People’s Republic of China
  2. PCL Research Center of Networks and Communications, Peng Cheng Laboratory, Shenzhen, People’s Republic of China
  3. Shenzhen Institutes of Advanced Technology (SIAT), University of the Chinese Academy of Sciences, Beijing, People’s Republic of China
