Micro and small entrepreneurial firms (MSEFs) have difficulty accessing external financing from formal financial institutions because those institutions cannot obtain sufficient and reliable credit information about them. With the development of the internet and data collection technologies, more and more data can be gathered from different sources, but the conventional logistic regression model often performs poorly on such data. In this paper, we propose a credit scoring model based on composite MCP logistic regression and apply it, for the first time, to predict the probability of default of MSEFs in China. The proposed method performs parameter estimation and automatic bi-level variable selection simultaneously, taking the grouping structure of the variables into account. Empirical results on MSEF data with a complex grouping structure show that the proposed method outperforms forward stepwise logistic regression, MCP logistic regression and group MCP logistic regression. In addition, we find that gross salary, tax paid, bank and injury insurance information of MSEFs are the most important factors for predicting the probability of default.
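To make the idea of bi-level selection concrete, the following is a minimal sketch of a composite MCP penalty, not the authors' implementation. It assumes the standard minimax concave penalty (MCP) applied twice: an inner MCP on each coefficient and an outer MCP on the sum within each group. The function names `mcp` and `composite_mcp`, the default `b=3.0`, and the outer concavity parameter `a = K * b * lam / 2` (one common parameterization, where `K` is the group size) are assumptions made for illustration.

```python
import numpy as np

def mcp(t, lam, gamma):
    """MCP value at |t|: lam*|t| - t^2/(2*gamma) up to gamma*lam, then constant."""
    t = np.abs(np.asarray(t, dtype=float))
    rising = lam * t - t**2 / (2.0 * gamma)   # concave part on [0, gamma*lam]
    flat = 0.5 * gamma * lam**2               # plateau beyond gamma*lam
    return np.where(t <= gamma * lam, rising, flat)

def composite_mcp(beta, groups, lam, b=3.0):
    """Sum over groups of an outer MCP applied to the summed inner MCP values.

    beta   : coefficient vector
    groups : group label for each coefficient (same length as beta)
    lam    : regularization parameter shared by inner and outer penalties
    b      : inner concavity parameter (illustrative default, an assumption)
    """
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    total = 0.0
    for g in np.unique(groups):
        member = beta[groups == g]
        inner = mcp(member, lam, b).sum()
        # Assumed outer parameter: the group penalty reaches its plateau
        # exactly when every member's inner penalty is at its plateau.
        a = member.size * b * lam / 2.0
        total += float(mcp(inner, lam, a))
    return total

if __name__ == "__main__":
    beta = [0.0, 0.0, 0.5, 0.0]   # one inactive group, one partly active group
    groups = [0, 0, 1, 1]
    print(composite_mcp(beta, groups, lam=1.0))
```

Because the outer penalty acts on whole groups and the inner penalty on individual coefficients, a penalty of this form can set an entire group to zero or keep a group while dropping some of its members, which is the bi-level behaviour described above.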
This paper is supported by the Social Science Fund of Zhejiang Province, China, through the programme ‘Application Research on Establishing Prosperity Monitoring System of Small and Micro Enterprises based on Big Data’ (20NDJC193YB).
Cite this article
Wang, C., Fang, K., Zheng, C. et al. Credit scoring of micro and small entrepreneurial firms in China. Int Entrep Manag J 17, 29–43 (2021). https://doi.org/10.1007/s11365-020-00685-8
- Credit scoring
- Micro and small entrepreneurial firms (MSEFs)
- Composite MCP
- Bi-level variable selection
- Logistic regression