Introducing a Vector Space Model to Perform a Proactive Credit Scoring

Saia, Roberto; Carta, Salvatore

doi:10.1007/978-3-319-99701-8_6

Conference paper
First Online: 14 November 2018

Part of the book series: Communications in Computer and Information Science (CCIS, volume 914)

Abstract

Many authoritative studies report that consumer credit has grown year on year in recent years, making it necessary to develop instruments that assist financial operators in some crucial tasks. The most important of these is classifying loan applications as reliable or unreliable on the basis of the customer information at their disposal. Such credit scoring instruments allow operators to reduce financial losses, and for this reason they play a very important role. However, designing effective credit scoring models is not an easy task, since it must face several problems, first among them the data imbalance in model training. This problem arises because the number of default cases is usually much smaller than the number of non-default ones, and this kind of distribution reduces the effectiveness of the state-of-the-art approaches used to define these models. This paper proposes a novel Linear Dependence Based (LDB) approach that builds a credit scoring model using only the past non-default cases, overcoming both the imbalanced class distribution and the cold-start issues. It relies on the concept of linear dependence between the vector representations of the past and new loan applications, evaluating it in the context of a matrix. The experiments, performed on two real-world datasets with a strongly unbalanced distribution of data, show that the proposed approach achieves performance close to or better than that of random forests, one of the best state-of-the-art credit scoring approaches, even though it uses only past non-default cases.
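
The notion of linear dependence between two vectors that the abstract refers to can be illustrated with a minimal sketch. This is not the authors' LDB algorithm, whose details are not given in this preview. It only checks whether one vector is a scalar multiple of the other by testing that every 2 x 2 minor of the two-row matrix formed by the vectors vanishes. The function name `linearly_dependent`, the tolerance `tol`, and the sample vectors are assumptions made for illustration.

```python
from itertools import combinations


def linearly_dependent(u, v, tol=1e-9):
    """Return True if u and v are linearly dependent.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # The 2 x n matrix with rows u and v has rank below 2 exactly when
    # every 2 x 2 minor u[i]*v[j] - u[j]*v[i] is zero. The comparison
    # uses an absolute tolerance, which is an assumption of this sketch.
    return all(
        abs(u[i] * v[j] - u[j] * v[i]) <= tol
        for i, j in combinations(range(len(u)), 2)
    )


# Made-up feature vectors standing in for a past and two new applications.
past = [2.0, 4.0, 6.0]
new_a = [1.0, 2.0, 3.0]   # exactly half of `past`
new_b = [1.0, 2.0, 4.0]   # not a multiple of `past`

print(linearly_dependent(past, new_a))  # True
print(linearly_dependent(past, new_b))  # False
```

A real scoring model would need a graded measure and a decision threshold in place of this yes/no test. How the paper defines them is not stated in the text available here.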

Notes

1. When one of the vectors is a scalar multiple of the other.
2. http://math.nist.gov/javanumerics/jama/.
3. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/statlog/.
4. https://www.r-project.org/.

Acknowledgements

This research is partially funded by Regione Sardegna under project “Next generation Open Mobile Apps Development” (NOMAD), “Pacchetti Integrati di Agevolazione” (PIA) - Industria Artigianato e Servizi - Annualità 2013.

Author information

Authors and Affiliations

Dipartimento di Matematica e Informatica, Università di Cagliari, Cagliari, Italy
Roberto Saia & Salvatore Carta

Corresponding authors

Correspondence to Roberto Saia or Salvatore Carta.

Editor information

Editors and Affiliations

Instituto de Telecomunicações, Lisbon, Portugal
Ana Fred
Department of Software Technology, Delft University of Technology, Voorburg, Zuid-Holland, The Netherlands
Jan Dietz
Faculty of Exact Sciences and Engineering, University of Madeira, Funchal, Portugal
David Aveiro
Henley Business School, University of Reading, Reading, UK
Kecheng Liu
University of Coimbra, Coimbra, Portugal
Jorge Bernardino
Instituto Politecnico de Setúbal (IPS), Setúbal, Portugal
Joaquim Filipe

Cite this paper

Saia, R., Carta, S. (2019). Introducing a Vector Space Model to Perform a Proactive Credit Scoring. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Bernardino, J., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2016. Communications in Computer and Information Science, vol 914. Springer, Cham. https://doi.org/10.1007/978-3-319-99701-8_6

DOI: https://doi.org/10.1007/978-3-319-99701-8_6
Published: 14 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99700-1
Online ISBN: 978-3-319-99701-8
eBook Packages: Computer Science, Computer Science (R0)
