A Semi-supervised Approach for Reject Inference in Credit Scoring Using SVMs

  • Conference paper
Advances in Data Mining. Applications and Theoretical Aspects (ICDM 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6171)

Abstract

This paper presents a novel semi-supervised approach that determines a linear predictor using Support Vector Machines (SVMs) and incorporates information on rejected loans, assuming that the labeled data (accepted applicants) and the unlabeled data (rejected applicants) are not drawn from the same distribution. We use a self-training algorithm to estimate how likely a rejected applicant would have been to repay had the applicant received credit. We also introduce a modification to the self-training algorithm based on Platt’s probabilistic output for SVMs. Experiments on two toy data sets, one well-known benchmark credit scoring data set, and one project performed for a Chilean financial institution show that our approach achieves the best classification performance compared with well-known reject inference alternatives and with another state-of-the-art semi-supervised method for SVMs, the Transductive SVM.
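The abstract gives no algorithmic detail, so the following is only a minimal sketch of generic self-training with a linear SVM and a Platt-style sigmoid, not the authors' modified algorithm. Everything in it is an assumption made for illustration: the function names, the 0.9 confidence threshold, the subgradient solver, the gradient-descent fit of the sigmoid parameters `A` and `B`, and the synthetic data. It also makes no attempt to model the difference in distribution between accepted and rejected applicants beyond shifting the toy clusters.

```python
import math
import random

def decision(w, b, x):
    """Linear SVM score f(x) = w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Stochastic subgradient descent on the regularized hinge loss; y in {-1, +1}."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 0.1 / (1.0 + 0.01 * t)  # decaying step size (illustrative choice)
            violated = y[i] * decision(w, b, X[i]) < 1
            w = [wj - eta * lam * wj for wj in w]  # shrink step from the L2 penalty
            if violated:  # hinge loss is active for this point
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(scores, y, lr=0.1, iters=500):
    """Fit P(y=+1 | f) = 1 / (1 + exp(A*f + B)) by gradient descent on cross-entropy."""
    n_pos = sum(1 for yi in y if yi == 1)
    n_neg = len(y) - n_pos
    # Platt's smoothed targets instead of hard 0/1 labels.
    t_pos, t_neg = (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0)
    targets = [t_pos if yi == 1 else t_neg for yi in y]
    A, B = 0.0, 0.0
    for _ in range(iters):
        probs = [sigmoid(-(A * f + B)) for f in scores]
        grad_A = sum((t - p) * f for t, p, f in zip(targets, probs, scores)) / len(y)
        grad_B = sum(t - p for t, p in zip(targets, probs)) / len(y)
        A, B = A - lr * grad_A, B - lr * grad_B
    return A, B

def fit_model(X, y):
    """Train the SVM, then calibrate its scores on the same labeled points."""
    w, b = train_linear_svm(X, y)
    A, B = fit_platt([decision(w, b, x) for x in X], y)
    return w, b, A, B

def self_train(X_lab, y_lab, X_unl, threshold=0.9, max_rounds=5):
    """Move confidently scored unlabeled points into the labeled set, then refit."""
    X_lab, y_lab, X_unl = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        w, b, A, B = fit_model(X_lab, y_lab)
        remaining = []
        for x in X_unl:
            p = sigmoid(-(A * decision(w, b, x) + B))  # estimated P(repaid | x)
            if p >= threshold:
                X_lab.append(x)
                y_lab.append(1)
            elif p <= 1.0 - threshold:
                X_lab.append(x)
                y_lab.append(-1)
            else:
                remaining.append(x)
        if len(remaining) == len(X_unl):  # nothing was confident enough: stop
            break
        X_unl = remaining
    w, b, A, B = fit_model(X_lab, y_lab)  # final fit on labeled + pseudo-labeled data
    return w, b, A, B, X_lab, y_lab, X_unl

# Synthetic toy data (hypothetical): +1 = repaid, -1 = defaulted.
rng = random.Random(42)

def cloud(cx, cy, n):
    return [(rng.gauss(cx, 0.7), rng.gauss(cy, 0.7)) for _ in range(n)]

X_accepted = cloud(2, 2, 20) + cloud(-2, -2, 20)
y_accepted = [1] * 20 + [-1] * 20
X_rejected = cloud(1, 1, 15) + cloud(-3, -3, 15)  # shifted relative to accepted

w, b, A, B, X_all, y_all, X_left = self_train(X_accepted, y_accepted, X_rejected)
print(len(X_all) - len(X_accepted), "rejected applicants pseudo-labeled;",
      len(X_left), "left unlabeled")
```

The threshold controls the usual self-training trade-off: a higher value adds fewer but more reliable pseudo-labels per round, while a lower value labels more rejected applicants at the risk of reinforcing early mistakes.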

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maldonado, S., Paredes, G. (2010). A Semi-supervised Approach for Reject Inference in Credit Scoring Using SVMs. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2010. Lecture Notes in Computer Science, vol 6171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14400-4_43

  • DOI: https://doi.org/10.1007/978-3-642-14400-4_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14399-1

  • Online ISBN: 978-3-642-14400-4

  • eBook Packages: Computer Science, Computer Science (R0)