Introducing a Vector Space Model to Perform a Proactive Credit Scoring
Many authoritative studies report that consumer credit has grown year on year in recent years, making it necessary to develop instruments that assist financial operators in some crucial tasks. The most important of these is classifying loan applications as reliable or unreliable on the basis of the customer information available. Such credit scoring instruments allow operators to reduce financial losses, and for this reason they play a very important role. However, designing effective credit scoring models is not easy, since it must deal with several problems, first among them data imbalance in model training. This problem arises because the number of default cases is usually much smaller than the number of non-default ones, and this kind of distribution reduces the effectiveness of the state-of-the-art approaches used to define these models. This paper proposes a novel Linear Dependence Based (LDB) approach that builds a credit scoring model using only past non-default cases, overcoming both the imbalanced class distribution and the cold-start issues. It relies on the concept of linear dependence between the vector representations of past and new loan applications, evaluated in the context of a matrix. The experiments, performed on two real-world datasets with strongly unbalanced data distributions, show that the proposed approach achieves performance close to or better than that of random forests, one of the best state-of-the-art approaches to credit scoring, even though it uses only past non-default cases.
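The idea of scoring a new application by its linear dependence on past non-default applications can be illustrated with a minimal sketch. This is not the LDB formulation from the paper: it assumes, purely for illustration, that dependence between two feature vectors is measured by the normalized determinant of their 2x2 Gram matrix (zero exactly when the vectors are linearly dependent), and that an application is accepted when the mean of this value over the past non-default cases stays below a threshold. The function names, the averaging step, and the threshold are all hypothetical.

```python
def gram_determinant(u, v):
    """Normalized determinant of the 2x2 Gram matrix of u and v.

    The Gram matrix is [[u.u, u.v], [u.v, v.v]]. Its determinant is zero
    exactly when u and v are linearly dependent. Dividing by (u.u)(v.v)
    scales the result into [0, 1], reaching 1 for orthogonal vectors.
    """
    uu = sum(a * a for a in u)
    vv = sum(b * b for b in v)
    uv = sum(a * b for a, b in zip(u, v))
    if uu == 0 or vv == 0:
        raise ValueError("zero vectors have no defined direction")
    # Clamp at 0 to absorb tiny negative values from floating-point error.
    return max(0.0, (uu * vv - uv * uv) / (uu * vv))


def mean_dependence_gap(new_app, past_non_default):
    """Average Gram determinant of new_app against each past non-default case.

    Assumption of this sketch: a small average means the new application
    is close to linearly dependent on the known reliable cases.
    """
    gaps = [gram_determinant(new_app, past) for past in past_non_default]
    return sum(gaps) / len(gaps)


def classify(new_app, past_non_default, threshold):
    """Label an application using only non-default history (hypothetical rule)."""
    gap = mean_dependence_gap(new_app, past_non_default)
    return "reliable" if gap <= threshold else "unreliable"
```

Because the model is built from non-default cases alone, no default examples are needed at training time, which is the property the abstract highlights. In practice the threshold would have to be tuned on held-out data.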
Keywords: Business intelligence, Decision support system, Credit scoring, Data mining, Algorithms, Metrics
This research is partially funded by Regione Sardegna under project “Next generation Open Mobile Apps Development” (NOMAD), “Pacchetti Integrati di Agevolazione” (PIA) - Industria Artigianato e Servizi - Annualità 2013.