Decision Trees as Interpretable Bank Credit Scoring Models

  • Andrzej Szwabe
  • Pawel Misiorek
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 928)

Abstract

We evaluate several approaches to the classification of loan applications that provide their final results in the form of a single decision tree, i.e., in a form widely regarded as interpretable by humans. We apply state-of-the-art credit-scoring-oriented classification algorithms, such as logistic regression, gradient boosting decision trees, and random forests, as components of the proposed decision tree building algorithms. We use four real-world loan default prediction data sets of different sizes. We evaluate the proposed methods using the area under the receiver operating characteristic curve (AUC), but we also measure the models' interpretability. We verify the significance of the differences between the AUC values observed for the compared techniques by computing Friedman's statistic and performing Nemenyi's post-hoc test.
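The evaluation protocol summarised above can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it trains the compared scorers, records the AUC obtained on each data set, and applies Friedman's test together with a Nemenyi-style critical difference as described by Demšar (2006). The synthetic stand-in data sets, model hyperparameters, and train/test split are placeholder assumptions.

```python
# Minimal sketch of the evaluation protocol described in the abstract
# (not the authors' implementation): train several scorers, record AUC
# per data set, then compare the methods with Friedman's test and a
# Nemenyi-style critical difference. Data sets here are synthetic stand-ins.
import numpy as np
from scipy.stats import friedmanchisquare, rankdata
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-ins for the four loan-default data sets (imbalanced binary targets).
datasets = [make_classification(n_samples=2000, n_features=20,
                                weights=[0.9, 0.1], random_state=i)
            for i in range(4)]

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "gradient boosting": GradientBoostingClassifier(),
    "random forest": RandomForestClassifier(n_estimators=200),
    "single decision tree": DecisionTreeClassifier(max_depth=5),
}

def auc_on(X, y, model):
    """Fit on a training split and return AUC on the held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    model.fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

# Rows: data sets, columns: models.
auc_table = np.array([[auc_on(X, y, m) for m in models.values()]
                      for X, y in datasets])

# Friedman's test: each column holds one model's AUC values across the data sets.
stat, p_value = friedmanchisquare(*auc_table.T)

# Nemenyi critical difference (Demsar, 2006): two methods differ significantly
# when their average ranks differ by more than CD = q_alpha * sqrt(k(k+1)/(6N)).
n_sets, k = auc_table.shape
avg_ranks = np.mean([rankdata(-row) for row in auc_table], axis=0)  # rank 1 = best AUC
q_alpha = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}[k]               # alpha = 0.05
cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * n_sets))
print(f"Friedman p-value: {p_value:.4f}, critical difference: {cd:.3f}")
print(dict(zip(models, avg_ranks)))
```

The q_alpha constants are the studentized-range-based values tabulated by Demšar (2006) for a significance level of 0.05; with only a handful of data sets the Friedman statistic has limited power, which is why the post-hoc rank comparison is reported alongside the raw AUC values.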

Keywords

Data extraction and integration · Credit scoring · Expert systems and artificial intelligence · Big data · Data processing performance

Acknowledgments

This work was supported by the Polish National Science Centre, grant DEC-2011/01/D/ST6/06788, and by Poznan University of Technology under grant 04/45/DSPB/0185.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Institute of Control, Robotics and Information Engineering, Poznan University of Technology, Poznan, Poland