Decision Trees as Interpretable Bank Credit Scoring Models

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 928)

Abstract

We evaluate several approaches to classifying loan applications that deliver their final result as a single decision tree, i.e., in a form widely regarded as interpretable by humans. State-of-the-art credit scoring classification algorithms, such as logistic regression, gradient boosting decision trees, and random forests, serve as components of the proposed tree-building algorithms. The experiments use four real-world loan default prediction data sets of different sizes. We evaluate the proposed methods using the area under the receiver operating characteristic curve (AUC), and we also measure the models’ interpretability. We verify the significance of the differences in AUC between the compared techniques by computing Friedman’s statistic and performing Nemenyi’s post-hoc test.
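The evaluation protocol named in the abstract (AUC per data set, then Friedman's statistic and Nemenyi's post-hoc test over the per-data-set ranks) can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function names, the method labels, and the AUC values in `example_auc` are hypothetical, and the critical values are those tabulated by Demšar (2006) for a significance level of 0.05.

```python
from math import sqrt

# Critical values q_alpha (alpha = 0.05) for the two-tailed Nemenyi test,
# as tabulated by Demsar (2006); keyed by the number of compared methods k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_ranks(auc_table):
    """Rank methods within each data set (rank 1 = highest AUC), then average.

    auc_table has one row per data set and one column per method.
    Tied methods share the mean of the ranks they occupy.
    """
    k = len(auc_table[0])
    totals = [0.0] * k
    for row in auc_table:
        for j, value in enumerate(row):
            better = sum(v > value for v in row)
            tied = sum(v == value for v in row)  # includes the method itself
            totals[j] += better + (tied + 1) / 2.0
    return [t / len(auc_table) for t in totals]


def friedman_statistic(auc_table):
    """Friedman's chi-square statistic computed from the average ranks."""
    n, k = len(auc_table), len(auc_table[0])
    ranks = average_ranks(auc_table)
    return 12.0 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4.0)


def nemenyi_critical_difference(n_datasets, k_methods):
    """Two methods differ significantly if their average ranks differ by more than this."""
    return Q_ALPHA_005[k_methods] * sqrt(k_methods * (k_methods + 1) / (6.0 * n_datasets))


# Hypothetical AUC values: 4 data sets (rows) x 3 methods (columns).
example_auc = [
    [0.78, 0.75, 0.71],
    [0.86, 0.84, 0.80],
    [0.69, 0.68, 0.66],
    [0.74, 0.72, 0.70],
]

ranks = average_ranks(example_auc)                 # [1.0, 2.0, 3.0]
chi2 = friedman_statistic(example_auc)             # 8.0
cd = nemenyi_critical_difference(len(example_auc), len(example_auc[0]))
```

In this made-up table the first and third methods differ by two average ranks, which exceeds the critical difference of about 1.66, while adjacent methods do not. With as few data sets as four, the chi-square approximation of Friedman's statistic is rough, so exact critical values would be the safer reference in practice.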

Acknowledgments

This work was supported by the Polish National Science Centre, grant DEC-2011/01/D/ST6/06788, and by Poznan University of Technology under grant 04/45/DSPB/0185.

Author information

Corresponding author

Correspondence to Pawel Misiorek.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Szwabe, A., Misiorek, P. (2018). Decision Trees as Interpretable Bank Credit Scoring Models. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety. BDAS 2018. Communications in Computer and Information Science, vol 928. Springer, Cham. https://doi.org/10.1007/978-3-319-99987-6_16

  • DOI: https://doi.org/10.1007/978-3-319-99987-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99986-9

  • Online ISBN: 978-3-319-99987-6

  • eBook Packages: Computer Science (R0)
