Machine Learning Model for Predicting Non-performing Agricultural Loans

  • Mohamed Ahmed Elnaggar
  • Mostafa Abed EL Azeem
  • Fahima A. Maghraby
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1153)


Extending agricultural loans to individuals is essential to support the agriculture sector and its markets. Credit risk is one of the most important risks affecting the banking sector. Predicting the probability that an individual's loan becomes non-performing helps banks reduce credit risk and make sound lending decisions. These decisions are based on a credit study conducted in accordance with generally accepted standards, on loan payment history, and on the clients' demographic data. This paper proposes an ensemble-based model to enhance classification accuracy. To build the model, a dataset was gathered from an agricultural bank in Egypt. This Egyptian credit dataset comprises 112,907 instances and 17 features, which are used in the current study. Variable selection was used to identify the features most important for classification. Cross-validation with ten subsets (folds) was also used. Logistic Regression (LR), k-nearest neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), and meta-classifier methods were trained and tested on the dataset. The experiments showed that, judged by accuracy, the ensemble method is the recommended choice for loans to individuals.
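The evaluation setup described in the abstract (ten subsets, several base classifiers, and a combined prediction) can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the abstract does not say how the meta-classifier combines the base models, so simple majority voting is assumed here as a stand-in, and the names `k_fold_indices`, `majority_vote`, and `ensemble_predict` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A base classifier is modelled as any function mapping a feature vector to a label.
# (Hypothetical type alias; trained LR/KNN/SVM/DT models would play this role.)
Classifier = Callable[[Sequence[float]], int]


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Partition range(n_samples) into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold, so every sample
    is used for testing exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    splits = []
    start = 0
    for fold in range(k):
        # Spread the remainder over the first `extra` folds.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        splits.append((train, test))
        start = stop
    return splits


def majority_vote(labels: Sequence[int]) -> int:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(classifiers: Sequence[Classifier], x: Sequence[float]) -> int:
    """Combine the base classifiers' predictions for one sample by majority vote.

    Assumption: voting stands in for the unspecified meta-classifier.
    """
    return majority_vote([clf(x) for clf in classifiers])
```

With the dataset size quoted in the abstract, `k_fold_indices(112907, 10)` yields ten train/test splits whose test folds together cover every instance once. In practice the data would usually be shuffled, and possibly stratified by class, before splitting, since non-performing loans are likely to be the minority class.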


Keywords: Data pre-processing, Credit risk, Loan, Machine learning



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Mohamed Ahmed Elnaggar (1, corresponding author)
  • Mostafa Abed EL Azeem (1)
  • Fahima A. Maghraby (1)
  1. College of Computing and Information Technology, Arab Academy for Science, Technology and Maritime Transport, Cairo, Egypt
