Non-traditional data is valuable for credit scoring in microfinance institutions because it helps address a problem that is very common in emerging markets: customers often lack a verifiable credit history. In this context, this paper uses data acquired from smartphones in a loan classification problem. We conduct a set of experiments on feature selection, strategies for dealing with imbalanced datasets, and algorithm choice to define a baseline model. This model is then compared with others that add network features to the original ones. For that comparison, we generate a network that links a given user to those phone book contacts who are also users of a given mobile application, taking into account the ethics and privacy concerns involved. We then apply feature extraction techniques, such as centrality measures and node embeddings, to capture aspects of the network’s topology. Several node embedding algorithms are tested, but according to Friedman’s post hoc tests only Node2Vec proves to be significantly better than the baseline model. Node2Vec outperforms all the other algorithms, with a relative improvement over the baseline model of 5.74% in mean accuracy, 7.13% in the area under the Receiver Operating Characteristic curve, and 30.83% in the Kolmogorov–Smirnov statistic. The method therefore appears very promising for discriminating between “good” and “bad” customers in credit scoring classification problems.
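As a rough illustration of two ideas mentioned in the abstract, the sketch below builds a contact network restricted to application users, derives a simple centrality feature from it, and computes the Kolmogorov–Smirnov statistic between the score distributions of "good" and "bad" customers. It is not the authors' implementation: the function names, the toy data, the choice of undirected edges, and the use of degree centrality as the example measure are all assumptions made for illustration.

```python
from bisect import bisect_right


def build_contact_graph(phone_books, app_users):
    """Link each app user to the phone book contacts who are also app users.

    Assumption: edges are undirected; the abstract does not specify this.
    """
    graph = {user: set() for user in app_users}
    for user, contacts in phone_books.items():
        if user not in graph:
            continue  # ignore phone books of people who are not app users
        for contact in contacts:
            if contact in graph and contact != user:
                graph[user].add(contact)
                graph[contact].add(user)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def ks_statistic(scores_good, scores_bad):
    """Largest gap between the empirical CDFs of the two score samples."""
    good, bad = sorted(scores_good), sorted(scores_bad)
    max_gap = 0.0
    for threshold in sorted(set(good) | set(bad)):
        cdf_good = bisect_right(good, threshold) / len(good)
        cdf_bad = bisect_right(bad, threshold) / len(bad)
        max_gap = max(max_gap, abs(cdf_good - cdf_bad))
    return max_gap


def relative_improvement(baseline, candidate):
    """Relative change of a metric over its baseline value, in percent."""
    return 100.0 * (candidate - baseline) / baseline


# Hypothetical toy data, not taken from the paper.
app_users = {"a", "b", "c", "d"}
phone_books = {"a": ["b", "c", "x"], "b": ["a"], "d": ["y"]}

graph = build_contact_graph(phone_books, app_users)
centrality = degree_centrality(graph)  # one network feature per user

good_scores = [0.9, 0.8, 0.7, 0.6]   # model scores of customers who repaid
bad_scores = [0.4, 0.3, 0.65, 0.2]   # model scores of customers who defaulted
ks = ks_statistic(good_scores, bad_scores)
```

In this toy example, contacts "x" and "y" are dropped because they are not application users, so user "d" ends up isolated with a centrality of zero. A feature such as `centrality` would be appended to the original smartphone features before training, and `relative_improvement` shows how percentage gains over a baseline metric can be expressed.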
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
This article is a result of the project Risk Assessment for Microfinance, supported by the Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF).
About this article
Cite this article
Paraíso, P., Ruiz, S., Gomes, P. et al. Using network features for credit scoring in microfinance. Int J Data Sci Anal (2021). https://doi.org/10.1007/s41060-021-00243-7
Keywords
- Credit scoring
- Feature extraction
- Node embeddings