Using network features for credit scoring in microfinance


Non-traditional data are valuable for credit scoring in microfinance institutions because they help address a problem that is very common in emerging markets: the lack of a verifiable credit history for customers. In this context, this paper uses data acquired from smartphones in a loan classification problem. We conduct a set of experiments on feature selection, strategies for dealing with imbalanced datasets, and algorithm choice in order to define a baseline model. This model is then compared with others that add network features to the original ones. For that comparison, we generate a network that links a given user to those phone book contacts who are also users of a given mobile application, taking into account the ethics and privacy concerns involved, and we apply feature extraction techniques, such as centrality measures and node embeddings, to capture certain aspects of the network’s topology. Several node embedding algorithms are tested, but only Node2Vec proves to be significantly better than the baseline model according to Friedman’s post hoc tests. This node embedding algorithm outperforms all the others, representing a relative improvement over the baseline model of 5.74% in mean accuracy, 7.13% in the area under the Receiver Operating Characteristic curve, and 30.83% in the Kolmogorov–Smirnov statistic. This method therefore appears very promising for discriminating between “good” and “bad” customers in credit scoring classification problems.
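To make the idea of network features concrete, the following is a minimal sketch of how a contact network might be built and how two simple topological features could be derived from it. It is not the authors' implementation: the function names, the toy data, the choice of an undirected graph, and the two measures shown (degree centrality and local clustering coefficient) are all assumptions made for illustration. The abstract only states that centrality measures and node embeddings were used.

```python
from itertools import combinations


def build_contact_graph(phone_books, app_users):
    """Link each app user to the phone-book contacts who are also app users.

    Assumption: edges are undirected and contacts outside the app are ignored.
    """
    graph = {user: set() for user in app_users}
    for user, contacts in phone_books.items():
        if user not in graph:
            continue
        for contact in contacts:
            if contact in graph and contact != user:
                graph[user].add(contact)
                graph[contact].add(user)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def clustering_coefficient(graph):
    """Share of a node's neighbour pairs that are themselves connected."""
    coefficients = {}
    for node, neighbours in graph.items():
        k = len(neighbours)
        if k < 2:
            coefficients[node] = 0.0
            continue
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        coefficients[node] = 2 * links / (k * (k - 1))
    return coefficients


# Hypothetical toy data: "x9" is a contact who does not use the app.
app_users = {"u1", "u2", "u3", "u4"}
phone_books = {
    "u1": ["u2", "u3", "x9"],
    "u2": ["u3"],
    "u4": [],
}

graph = build_contact_graph(phone_books, app_users)
centrality = degree_centrality(graph)
clustering = clustering_coefficient(graph)

# One feature row per user, ready to be joined to the baseline features.
network_features = {
    user: {"degree_centrality": centrality[user], "clustering": clustering[user]}
    for user in sorted(graph)
}
```

Features of this kind would be appended to each customer's original feature vector before training the classifier. Node embeddings such as Node2Vec play the same role but are learned from random walks over the graph instead of being computed in closed form.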





Author information



Corresponding author

Correspondence to Paulo Paraíso.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is a result of the project Risk Assessment for Microfinance, supported by the Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF).


About this article


Cite this article

Paraíso, P., Ruiz, S., Gomes, P. et al. Using network features for credit scoring in microfinance. Int J Data Sci Anal (2021).



Keywords

  • Credit scoring
  • Microfinance
  • Networks
  • Feature extraction
  • Node embeddings