Decision Trees as Interpretable Bank Credit Scoring Models

  • Andrzej Szwabe
  • Pawel Misiorek
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 928)

Abstract

We evaluate several approaches to the classification of loan applications that provide their final results in the form of a single decision tree, i.e., in a form widely regarded as interpretable by humans. We apply state-of-the-art credit-scoring-oriented classification algorithms, such as logistic regression, gradient boosting decision trees, and random forests, as components of the proposed decision tree building algorithms. We use four real-world loan default prediction data sets of different sizes. We evaluate the proposed methods using the area under the receiver operating characteristic curve (AUC), but we also measure the models' interpretability. We verify the significance of the differences between the AUC values observed for the compared techniques by computing Friedman's statistic and performing Nemenyi's post-hoc test.
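The evaluation protocol summarised above can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it trains the compared scorers, records the AUC obtained on each data set, and applies Friedman's test together with a Nemenyi-style critical difference as described by Demšar (2006). The synthetic stand-in data sets, model hyperparameters, and train/test split are placeholder assumptions.

```python
# Minimal sketch of the evaluation protocol described in the abstract
# (not the authors' implementation): train several scorers, record AUC
# per data set, then compare the methods with Friedman's test and a
# Nemenyi-style critical difference. Data sets here are synthetic stand-ins.
import numpy as np
from scipy.stats import friedmanchisquare, rankdata
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-ins for the four loan-default data sets (imbalanced binary targets).
datasets = [make_classification(n_samples=2000, n_features=20,
                                weights=[0.9, 0.1], random_state=i)
            for i in range(4)]

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "gradient boosting": GradientBoostingClassifier(),
    "random forest": RandomForestClassifier(n_estimators=200),
    "single decision tree": DecisionTreeClassifier(max_depth=5),
}

def auc_on(X, y, model):
    """Fit on a training split and return AUC on the held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    model.fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

# Rows: data sets, columns: models.
auc_table = np.array([[auc_on(X, y, m) for m in models.values()]
                      for X, y in datasets])

# Friedman's test: each column holds one model's AUC values across the data sets.
stat, p_value = friedmanchisquare(*auc_table.T)

# Nemenyi critical difference (Demsar, 2006): two methods differ significantly
# when their average ranks differ by more than CD = q_alpha * sqrt(k(k+1)/(6N)).
n_sets, k = auc_table.shape
avg_ranks = np.mean([rankdata(-row) for row in auc_table], axis=0)  # rank 1 = best AUC
q_alpha = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}[k]               # alpha = 0.05
cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * n_sets))
print(f"Friedman p-value: {p_value:.4f}, critical difference: {cd:.3f}")
print(dict(zip(models, avg_ranks)))
```

The q_alpha constants are the studentized-range-based values tabulated by Demšar (2006) for a significance level of 0.05; with only a handful of data sets the Friedman statistic has limited power, which is why the post-hoc rank comparison is reported alongside the raw AUC values.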

Keywords

Data extraction and integration · Credit scoring · Expert systems and artificial intelligence · Big data · Data processing performance

Acknowledgments

This work was supported by the Polish National Science Centre, grant DEC-2011/01/D/ST6/06788, and by Poznan University of Technology under grant 04/45/DSPB/0185.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Institute of Control, Robotics and Information Engineering, Poznan University of Technology, Poznan, Poland