Introducing a Vector Space Model to Perform a Proactive Credit Scoring

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 914)

Abstract

Many authoritative studies report that consumer credit has grown year on year in recent years, making it necessary to develop instruments that assist financial operators in some crucial tasks. The most important of these is classifying loan applications as reliable or unreliable on the basis of the customer information available. Such credit scoring instruments allow operators to reduce financial losses, and for this reason they play a very important role. However, designing effective credit scoring models is not easy, since it must deal with several problems, first among them the data imbalance in model training. This problem arises because default cases are usually far fewer than non-default ones, and this kind of distribution reduces the effectiveness of the state-of-the-art approaches used to define these models. This paper proposes a novel Linear Dependence Based (LDB) approach that builds a credit scoring model using only past non-default cases, overcoming both the imbalanced class distribution and the cold-start issues. It relies on the concept of linear dependence between the vector representations of past and new loan applications, evaluating it in the context of a matrix. Experiments on two real-world datasets with strongly unbalanced data distributions show that the proposed approach achieves performance close to or better than that of random forests, one of the best state-of-the-art credit scoring approaches, even though it uses only past non-default cases.
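The abstract does not spell out the LDB algorithm, so the following is only a minimal sketch of the underlying notion of linear dependence between two vectors, not the authors' method. It assumes each loan application is already encoded as a numeric vector of equal length. The two vectors are treated as the rows of a 2 x n matrix, and they are linearly dependent exactly when every 2 x 2 minor of that matrix is zero. The function names `dependence_residual` and `is_linearly_dependent` and the tolerance `tol` are hypothetical, introduced here for illustration.

```python
from itertools import combinations
from math import sqrt


def dependence_residual(a, b):
    """Return |sin(theta)| between a and b, computed from the 2x2 minors
    of the 2 x n matrix whose rows are a and b.

    A value of 0.0 means the vectors are linearly dependent; 1.0 means
    they are orthogonal. Illustrative measure only (an assumption, not
    taken from the paper).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The zero vector is linearly dependent with any vector.
        return 0.0
    # Sum of squared 2x2 minors; by Lagrange's identity this equals
    # |a|^2 * |b|^2 - (a . b)^2.
    minors_sq = sum(
        (a[i] * b[j] - a[j] * b[i]) ** 2
        for i, j in combinations(range(len(a)), 2)
    )
    return sqrt(minors_sq) / (norm_a * norm_b)


def is_linearly_dependent(a, b, tol=1e-9):
    """True when a and b are linearly dependent, up to the tolerance tol."""
    return dependence_residual(a, b) <= tol


# Hypothetical feature vectors, made up for this example.
past_non_default = [2.0, 4.0, 6.0]
new_similar = [1.0, 2.0, 3.0]    # half of past_non_default
new_different = [1.0, 0.0, 0.0]  # not a scalar multiple

similar_is_dependent = is_linearly_dependent(past_non_default, new_similar)
different_is_dependent = is_linearly_dependent(past_non_default, new_different)
```

A graded residual is used here instead of a strict zero test because real feature vectors are rarely exact scalar multiples of one another; how a score is derived from such a measure, and how it is thresholded, is left open in this sketch.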

Notes

  1. When one of the vectors is a scalar multiple of the other.

  2. http://math.nist.gov/javanumerics/jama/.

  3. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/statlog/.

  4. https://www.r-project.org/.

Acknowledgements

This research is partially funded by Regione Sardegna under project “Next generation Open Mobile Apps Development” (NOMAD), “Pacchetti Integrati di Agevolazione” (PIA) - Industria Artigianato e Servizi - Annualità 2013.

Author information

Corresponding authors

Correspondence to Roberto Saia or Salvatore Carta.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Saia, R., Carta, S. (2019). Introducing a Vector Space Model to Perform a Proactive Credit Scoring. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Bernardino, J., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2016. Communications in Computer and Information Science, vol 914. Springer, Cham. https://doi.org/10.1007/978-3-319-99701-8_6

  • DOI: https://doi.org/10.1007/978-3-319-99701-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99700-1

  • Online ISBN: 978-3-319-99701-8

  • eBook Packages: Computer Science, Computer Science (R0)
