Default avoidance on credit card portfolios using accounting, demographical and exploratory factors: decision making based on machine learning (ML) techniques

  • Nikolaos Sariannidis (email author)
  • Stelios Papadakis
  • Alexandros Garefalakis
  • Christos Lemonakis
  • Tsioptsia Kyriaki-Argyro
S.I.: BALCOR-2017


Effective and thorough credit-risk management is a key factor for lending institutions, as significant financial losses can arise from borrowers’ default. Machine learning methods, which can measure and analyze credit risk objectively, are consequently attracting increasing attention. This study analyzes default payment data from a credit card portfolio of some 30,000 clients in Taiwan, described by twenty-three attributes with no missing information. We compare the prediction accuracy of seven classification methods: KNN, Logistic Regression, Naïve Bayes, Decision Trees, Random Forest, SVC, and Linear SVC. The results indicate that only a few of the typical variables used can adequately explain default characteristics for lending decisions. The results provide effective feedback to credit evaluators, lending institutions and business analysts for in-depth analysis. They also point to the importance of precautionary borrowing techniques for better understanding credit-card borrowers’ behavior, along with specific accounting, historical and demographical characteristics.
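The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses a small synthetic dataset with twenty-three attributes as a stand-in for the Taiwan data, and picks 5-fold cross-validation, feature scaling and all hyperparameters arbitrarily. The class balance in `make_classification` is likewise an arbitrary assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit card data (assumption):
# 23 attributes, binary "default" label, arbitrary class imbalance.
X, y = make_classification(
    n_samples=600, n_features=23, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)


def scaled(model):
    """Standardize features before distance- or margin-based models."""
    return make_pipeline(StandardScaler(), model)


# The seven classifier families named in the abstract,
# with illustrative (not tuned) settings.
models = {
    "KNN": scaled(KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": scaled(LogisticRegression(max_iter=1000)),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVC": scaled(SVC(kernel="rbf")),
    "Linear SVC": scaled(LinearSVC(max_iter=5000)),
}

# Mean accuracy over 5 stratified folds for each model.
accuracy = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print the models from most to least accurate.
for name, score in sorted(accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")
```

Because defaults are usually the minority class, accuracy alone can flatter a model that rarely predicts default. Passing another scorer such as `scoring="roc_auc"` to `cross_val_score` is one way to check whether the ranking holds.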


Debt; Credit card portfolios; Machine learning (ML) methods; Explanatory factors; Accounting data; Demographic data; Credit history data



The current publication is based on the dataset of Lichman (2013). We would also like to thank the Laboratory of Artificial Intelligence Systems and Computer Architectures of the Technological Educational Institute of Crete for providing the computing power needed to run the extensive experiments for this work.


  1. Aha, D. (1992). Tolerating noisy, irrelevant, and novel attributes in instance-based learning algorithms. International Journal of Man–Machine Studies, 36(2), 267–287.
  2. Ajay, V., & Shomona, G. J. (2016). Prediction of credit-card defaulters: A comparative study on performance of classifiers. International Journal of Computer Applications (0975–8887), 145(7), 36–41.
  3. Ben-Hur, A., Horn, D., Siegelmann, H. T., & Vapnik, V. (2001). Support vector clustering. Journal of Machine Learning Research, 2, 125–137.
  4. Bhaduri, A. (2009). Credit scoring using artificial immune system algorithms: A comparative study. In Proceedings of the world congress on nature and biologically inspired computing NaBIC2009, Coimbatore (pp. 1540–1543).
  5. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
  6. Cheng, D., Zhang, S., Deng, Z., Zhu, Y., & Zong, M. (2014). kNN algorithm with data-driven k value. In X. Luo, J. X. Yu, & Z. Li (Eds.), Advanced data mining and applications. ADMA 2014. Lecture Notes in Computer Science (Vol. 8933). Berlin: Springer.
  7. Davis, R. H., Edelman, D. B., & Gammerman, A. J. (1992). Machine-learning algorithms for credit-card applications. Journal of Management Mathematics, 4(1), 43–51.
  8. Dimitras, A., Papadakis, S., & Garefalakis, A. (2017). Evaluation of empirical attributes for credit risk forecasting from numerical data. Investment Management and Financial Innovations, 14(1), 9–18.
  9. Frank, E., & Witten, I. H. (1998). Generating accurate rule sets without global optimization. In J. Shavlik (Ed.), Proceedings of the fifteenth international conference on machine learning, Madison, WI. San Francisco: Morgan Kaufmann (pp. 144–151).
  10. Frank, E., & Hall, M. (2001). A simple approach to ordinal classification. In L. de Raedt, & P. A. Flach (Eds.), Proceedings of the twelfth European conference on machine learning, Freiburg, Germany. Berlin: Springer (pp. 145–156).
  11. Hamori, S., Kawai, M., Kume, T., Murakami, Y., & Watanabe, Y. (2018). Ensemble learning or deep learning? Application to default risk analysis. Journal of Risk and Financial Management, 11(1), 12.
  12. Hand, D. J., & Henley, W. E. (1996). A k-nearest-neighbour classifier for assessing consumer credit risk. The Statistician, 45(1), 77–95.
  13. He, J., Liu, X., Shi, Y., Xu, W., & Yan, N. (2004). Classifications of credit cardholder behavior by using fuzzy linear programming. International Journal of Information Technology and Decision Making, 3(4), 633–650.
  14. Jenhani, I., Nahla, B. A., & Ziedm, E. (2008). Decision trees as possibilistic classifiers (Special Section on Choquet Integration in honor of Gustave Choquet (1915–2006) and Special Section on Nonmonotonic and Uncertain Reasoning). International Journal of Approximate Reasoning, 48(3), 784–807.
  15. Khandani, A. E., Kim, A. J., & Lo, A. W. (2010). Consumer credit risk models via machine-learning algorithms. AFA 2011 Denver Meetings Paper.
  16. Krichene, A. (2017). Using a naive Bayesian classifier methodology for loan risk assessment evidence from a Tunisian commercial bank. Journal of Economics, Finance and Administrative Science, 22(42), 3–24.
  17. Landwehr, N., Hall, M., & Frank, E. (2003). Logistic model trees. In N. Lavrac, D. Gamberger, L. Todorovski, & H. Blockeel (Eds.), Proceedings of the fourteenth European conference on machine learning, Cavtat-Dubrovnik, Croatia. Berlin: Springer (pp. 241–252).
  18. Lee, T. S., Chiu, C. C., Chou, Y. C., & Lu, C. J. (2006). Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Computational Statistics and Data Analysis, 50, 1113–1130.
  19. Lichman, M. (2013). UCI Machine Learning Repository. Irvine: University of California, School of Information and Computer Science. The original dataset can be found at the UCI Machine Learning Repository.
  20. Makalic, E., & Schmidt, D. F. (2010). Review of modern logistic regression methods with application to small and medium sample size problems. In J. Li (Ed.), AI 2010: Advances in artificial intelligence. AI 2010. Lecture Notes in Computer Science (Vol. 6464). Berlin: Springer.
  21. Marinakis, Y., Marinaki, M., Doumpos, M., & Zopounidis, C. (2009). Ant colony and particle swarm optimization for financial classification problems. Expert Systems with Applications, 36, 10604–10611.
  22. Neema, S., & Soibam, B. (2017). The comparison of machine learning methods to achieve most cost-effective prediction for credit card default. Journal of Management Science and Business Intelligence, 2(2), 36–41.
  23. Peng, Y., Kou, G., Chen, Z., & Shi, Y. (2004). Cross-validation and ensemble analyses on multiple-criteria linear programming classification for credit cardholder behavior. Lecture Notes in Computer Science, ICCS 2004 (Vol. 3039, pp. 931–939).
  24. Quinlan, J., Rajendra, G., & Castro, D. (1998). Bank collateralised loan obligations: From 0 to 60 in less than 2 years? Merrill Lynch, Global Securities Research & Economics Group, March.
  25. Ramoni, M., & Sebastiani, P. (2001). Robust Bayes classifiers. Artificial Intelligence, 125(1–2), 209–226.
  26. Shen, A., Tong, R., & Deng, Y. (2007). Application of classification models on credit card fraud detection. In Proceedings of the international conference on service systems and service management, Chengdu (pp. 1–4).
  27. Shi, Y., Peng, Y., Kou, G., & Chen, Z. (2005). Classifying credit card accounts for business intelligence and decision making: A multiple-criteria quadratic programming approach. International Journal of Information Technology and Decision Making, 4(4), 581–599.
  28. Shomona, J. G., & Ramani, R. G. (2011). Discovery of knowledge patterns in clinical data through data mining algorithms: Multi-class categorization of breast tissue data. International Journal of Computer Applications, 32(7), 46–53.
  29. Srinivasan, V., & Kim, Y. H. (1987). Credit granting: A comparative analysis of classification procedures. The Journal of Finance, 42(3), 665–681.
  30. Stone, M. (1974). Cross-validation choice and assessment of statistical predictions. Journal of the Royal Statistical Society B, 36, 111–147.
  31. Watanabe, C. Y. V., Ribeiro, M. X., Traina, C., & Traina, A. J. M. (2011). SACMiner: A new classification method based on statistical association rules to mine medical images. In J. Filipe, & J. Cordeiro (Eds.), Enterprise information systems. ICEIS 2010. Lecture Notes in Business Information Processing (Vol. 73). Berlin: Springer.
  32. Yeh, I.-C., & Lien, C. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2, Part 1), 2473–2480.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Nikolaos Sariannidis (1), email author
  • Stelios Papadakis (2)
  • Alexandros Garefalakis (2)
  • Christos Lemonakis (2)
  • Tsioptsia Kyriaki-Argyro (3)

  1. Department of Finance and Accounting, Western Macedonia University of Applied Sciences, Kozani, Greece
  2. Department of Business Administration, Technological Educational Institute of Crete, Heraklion, Greece
  3. Department of Accounting and Finance, Western Macedonia University of Applied Sciences, Kozani, Greece
