Credit scoring of micro and small entrepreneurial firms in China

Abstract

It is difficult for micro and small entrepreneurial firms (MSEFs) to access external financing from formal financial institutions because those institutions cannot obtain sufficient and reliable credit information about MSEFs. With the development of the internet and data collection technologies, more and more data can be accessed from different sources, and the standard logistic regression model often performs poorly on such data. In this paper, we propose a credit scoring model based on composite MCP (minimax concave penalty) logistic regression and are the first to apply it to predicting the probability of default of MSEFs in China. The proposed method carries out parameter estimation and automatic bi-level variable selection (selecting both important groups and important variables within those groups) simultaneously, taking the grouping structure of the variables into account. Empirical results on MSEF data with a complex grouping structure show that the proposed method outperforms forward stepwise logistic regression, MCP logistic regression and group MCP logistic regression. In addition, we find that gross salary, tax paid, bank and injury insurance information of MSEFs are the most important factors for predicting the probability of default.
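The abstract describes the composite MCP logistic model only in words. As a rough illustration, the sketch below evaluates the kind of penalized objective such a model minimizes: the average logistic negative log-likelihood plus, for each variable group, an outer MCP applied to the sum of inner MCPs of that group's coefficients. The inner MCP lets individual variables drop out within a group and the outer MCP lets whole groups drop out, which is what bi-level selection refers to. This is not the authors' code: the helper names (`mcp`, `composite_mcp`, `softplus`, `logistic_nll`, `cmcp_objective`), the unpenalized intercept and the outer concavity choice b = K_j·a·λ/2 (so a group's penalty levels off only once every member's penalty has) are assumptions, and fitting the model (for example by group coordinate descent) is not shown.

```python
import math
from typing import Sequence


def mcp(theta: float, lam: float, gamma: float) -> float:
    """Minimax concave penalty (MCP) of |theta| with level lam and concavity gamma > 1."""
    t = abs(theta)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return gamma * lam * lam / 2.0  # flat beyond gamma*lam: large effects are not shrunk further


def composite_mcp(groups: Sequence[Sequence[float]], lam: float, a: float) -> float:
    """Outer MCP applied to the sum of inner MCPs, one term per variable group."""
    total = 0.0
    for beta_j in groups:
        inner = sum(mcp(b, lam, a) for b in beta_j)  # within-group (variable-level) penalty
        b_outer = len(beta_j) * a * lam / 2.0  # assumed convention: saturates when all members saturate
        total += mcp(inner, lam, b_outer)  # between-group (group-level) penalty
    return total


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def logistic_nll(X: Sequence[Sequence[float]], y: Sequence[int],
                 beta0: float, beta: Sequence[float]) -> float:
    """Average negative log-likelihood of logistic regression; y[i] = 1 marks a default."""
    loss = 0.0
    for xi, yi in zip(X, y):
        eta = beta0 + sum(x * b for x, b in zip(xi, beta))
        loss += softplus(eta) - yi * eta
    return loss / len(y)


def cmcp_objective(X, y, beta0, groups, lam, a):
    """Penalized objective; columns of X must be ordered group by group to match `groups`."""
    flat = [b for beta_j in groups for b in beta_j]
    return logistic_nll(X, y, beta0, flat) + composite_mcp(groups, lam, a)  # intercept unpenalized


if __name__ == "__main__":
    # Hypothetical toy data: 3 predictors split into groups of size 2 and 1.
    X = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
    y = [1, 0]
    print(cmcp_objective(X, y, 0.0, [[0.0, 0.0], [0.0]], lam=0.1, a=3.0))  # about 0.693 (log 2)
    print(cmcp_objective(X, y, 0.0, [[0.8, -0.4], [0.3]], lam=0.1, a=3.0))
```

With every coefficient at zero the penalty vanishes and the objective reduces to log 2, the loss of assigning each firm a 50% default probability; a fitting routine would then trade lower loss against the group- and variable-level penalties.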

Fig. 1
Fig. 2

Funding

This paper is supported by the Social Science Fund of Zhejiang Province, China, through the programme ‘Application Research on Establishing Prosperity Monitoring System of Small and Micro Enterprises based on Big Data’ (20NDJC193YB).

Author information

Corresponding author

Correspondence to Chenlu Zheng.

Ethics declarations

Conflicts of interest/competing interests

Not applicable.

About this article

Cite this article

Wang, C., Fang, K., Zheng, C. et al. Credit scoring of micro and small entrepreneurial firms in China. Int Entrep Manag J 17, 29–43 (2021). https://doi.org/10.1007/s11365-020-00685-8

Keywords

  • Credit scoring
  • Micro and small entrepreneurial firms (MSEFs)
  • Composite MCP
  • Bi-level variable selection
  • Logistic regression