A Semi-supervised Approach for Reject Inference in Credit Scoring Using SVMs

Maldonado, Sebastián; Paredes, Gonzalo

doi:10.1007/978-3-642-14400-4_43

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6171)

Included in the following conference series:

Industrial Conference on Data Mining

Abstract

This paper presents a novel semi-supervised approach that builds a linear predictor with Support Vector Machines (SVMs) and incorporates information on rejected loans, assuming that the labeled data (accepted applicants) and the unlabeled data (rejected applicants) are not drawn from the same distribution. We use a self-training algorithm to predict how likely a rejected applicant would have been to repay had the applicant received credit, and we introduce a modification to the self-training algorithm based on Platt’s probabilistic output for SVMs. Experiments on two toy data sets, one well-known benchmark credit scoring data set, and data from a project performed for a Chilean financial institution show that our approach achieves the best classification performance compared with well-known reject inference alternatives and with another state-of-the-art semi-supervised method for SVMs, the Transductive SVM.
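
The abstract gives only the outline of the method. As a rough illustration, here is a minimal Python sketch of generic self-training with Platt-calibrated SVM scores for reject inference. It assumes +1 = repaid and -1 = defaulted, trains the linear SVM with the Pegasos sub-gradient method (a swapped-in solver chosen so the sketch needs only the standard library), fits Platt's sigmoid P(y = +1 | f) = 1 / (1 + exp(A·f + B)) to the SVM scores, and pseudo-labels a rejected applicant only when that probability clears a fixed confidence threshold. All function names, the 0.75 threshold and the other defaults are hypothetical; this is not the authors' modified algorithm, whose details the abstract does not give.

```python
# Hypothetical sketch, not the authors' algorithm: the Pegasos solver, the
# confidence threshold and all names here are illustrative assumptions.
import math
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos sub-gradient solver for a linear SVM, labels y in {-1, +1}.
    A constant 1.0 is appended to each x, so the last weight is a bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward t * x
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    """Signed SVM score f(x) = w . [x, 1]."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def platt_prob(f, A, B):
    """Platt's sigmoid P(y = +1 | f) = 1 / (1 + exp(A*f + B)), overflow-safe."""
    z = A * f + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(scores, y, iters=1000):
    """Fit A, B by gradient descent on the cross-entropy, using Platt's
    smoothed targets (N+ + 1)/(N+ + 2) and 1/(N- + 2)."""
    n_pos = sum(1 for t in y if t > 0)
    n_neg = len(y) - n_pos
    targets = [(n_pos + 1) / (n_pos + 2) if t > 0 else 1 / (n_neg + 2) for t in y]
    A, B = 0.0, math.log((n_neg + 1) / (n_pos + 1))
    # conservative step: the loss curvature is at most (mean f^2 + 1) / 4
    lr = 1.0 / (sum(f * f for f in scores) / len(scores) + 1.0)
    for _ in range(iters):
        gA = gB = 0.0
        for f, tgt in zip(scores, targets):
            r = tgt - platt_prob(f, A, B)  # d(loss)/dz for z = A*f + B
            gA += r * f
            gB += r
        A -= lr * gA / len(scores)
        B -= lr * gB / len(scores)
    return A, B


def self_train(X_acc, y_acc, X_rej, threshold=0.75, max_rounds=10):
    """Each round: retrain the SVM, calibrate it with Platt's sigmoid, and
    pseudo-label rejected applicants with P(repay) >= threshold
    (as +1) or <= 1 - threshold (as -1)."""
    X, y, pool = list(X_acc), list(y_acc), list(X_rej)
    for _ in range(max_rounds):
        w = train_linear_svm(X, y)
        A, B = fit_platt([decision(w, x) for x in X], y)
        remaining = []
        for x in pool:
            p = platt_prob(decision(w, x), A, B)
            if p >= threshold or p <= 1.0 - threshold:
                X.append(x)
                y.append(1 if p >= threshold else -1)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # nothing confident enough: stop
            break
        pool = remaining
    w = train_linear_svm(X, y)  # final model on accepted + pseudo-labelled
    A, B = fit_platt([decision(w, x) for x in X], y)
    return w, (A, B), pool


if __name__ == "__main__":
    accepted = [(2.0, 1.5), (1.5, 2.5), (2.5, 2.0),
                (-2.0, -1.5), (-1.5, -2.5), (-2.5, -2.0)]
    labels = [1, 1, 1, -1, -1, -1]
    rejected = [(1.0, 1.2), (-1.2, -0.8), (0.1, -0.1)]
    w, (A, B), unlabeled = self_train(accepted, labels, rejected)
    print("weights:", [round(v, 3) for v in w])
    print("still unlabeled:", unlabeled)
```

For brevity the sketch fits the sigmoid on the same scores the SVM was trained on; Platt recommends a hold-out set or cross-validation for this step, because training-set scores tend to be overconfident. Raising `threshold` makes the loop more conservative: fewer rejected applicants receive pseudo-labels, and ambiguous cases such as `(0.1, -0.1)` are more likely to stay unlabeled.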

References

Agrawala, A.K.: Learning with a probabilistic teacher. IEEE Transactions on Information Theory 16, 373–379 (1970)
Berger, A.N., Frame, W.S., Miller, N.H.: Credit scoring and the availability, price, and risk of small business credit. Journal of Money, Credit and Banking 37(2), 191–222 (2005)
Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the 11th Annual Conference on Computational Learning Theory, pp. 92–100 (1998)
Castelli, V., Cover, T.M.: On the exponential value of labeled samples. Pattern Recognition Letters 16, 105–111 (1995)
Chapelle, O., Zien, A.: Semi-supervised classification by low density separation. In: Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS 2005) (2005)
Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. MIT Press, Cambridge (2005)
Chen, G., Åstebro, T.: A Maximum Likelihood Approach for Reject Inference in Credit Scoring. Rotman School of Management Working Paper No. 07-05 (2006)
Chye, K.H., Chin, T.W., Peng, G.C.: Credit scoring using data mining techniques. Singapore Management Review 26(2), 25(23) (2004)
Collobert, R., Weston, J., Bottou, L.: Trading convexity for scalability. In: ICML 2006, 23rd International Conference on Machine Learning, Pittsburgh, USA (2006)
Culp, M., Michailidis, G.: An iterative algorithm for extending learners to a semisupervised setting. In: The 2007 Joint Statistical Meetings (2007)
Haffari, G., Sarkar, A.: Analysis of semi-supervised learning with the Yarowsky algorithm. In: 23rd Conference on Uncertainty in Artificial Intelligence (2007)
Hartley, H.O., Rao, J.N.K.: Classification and estimation in analysis of variance problems. Review of the International Statistical Institute 36, 141–147 (1968)
Hettich, S., Bay, S.D.: The UCI KDD Archive. University of California, Department of Information and Computer Science, Irvine, CA (1999), http://kdd.ics.uci.edu
Joachims, T.: Transductive Inference for Text Classification using Support Vector Machines. In: International Conference on Machine Learning, pp. 200–209 (1999)
Johnson, R., Zhang, T.: Two-view feature generation model for semi-supervised learning. In: The 24th International Conference on Machine Learning, pp. 25–27 (2007)
Maeireizo, B., Litman, D., Hwa, R.: Co-training for predicting emotions with spoken dialogue data. In: The Companion Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, ACL (2004)
Maldonado, S., Weber, R.: A wrapper method for feature selection using Support Vector Machines. Information Sciences 179(13), 2208–2217 (2009)
Martens, D., Baesens, B., Van Gestel, T., Vanthienen, J.: Comprehensible credit scoring models using rule extraction from Support Vector Machines. European Journal of Operational Research 183(3), 1466–1476 (2007)
Nigam, K., Ghani, R.: Analyzing the effectiveness and applicability of co-training. In: Ninth International Conference on Information and Knowledge Management, pp. 86–93 (2000)
Platt, J.: Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. In: Advances in Large Margin Classifiers, pp. 61–74. MIT Press, Cambridge (1999)
Scudder, H.J.: Probability of error of some adaptive pattern-recognition machines. IEEE Transactions on Information Theory 11, 363–371 (1965)
Siddiqi, N.: Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring, 1st edn. Wiley & Sons, Chichester (2005)
Thomas, L.C.: A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers. International Journal of Forecasting 16(2), 149–162 (2002)
Valiant, L.G.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)
Vapnik, V.: Statistical Learning Theory. John Wiley & Sons, New York (1998)
Xu, J.-M., Fumera, G., Roli, F., Zhou, Z.-H.: Training SpamAssassin with active semi-supervised learning. In: Proceedings of the 6th Conference on Email and Anti-Spam (CEAS 2009), Mountain View, CA (2009)
Zhu, X.: Semi-Supervised Learning Literature Survey. Computer Sciences TR 1530, University of Wisconsin, Madison (2007)

Author information

Authors and Affiliations

Department of Industrial Engineering, University of Chile,
Sebastián Maldonado & Gonzalo Paredes

Editor information

Editors and Affiliations

Institut für Bildverarbeitung und angewandte Informatik, Körnerstr. 10, 04107, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Maldonado, S., Paredes, G. (2010). A Semi-supervised Approach for Reject Inference in Credit Scoring Using SVMs. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2010. Lecture Notes in Computer Science, vol 6171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14400-4_43

DOI: https://doi.org/10.1007/978-3-642-14400-4_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14399-1
Online ISBN: 978-3-642-14400-4
eBook Packages: Computer Science, Computer Science (R0)
