Abstract
This paper presents a novel semi-supervised approach that determines a linear predictor using Support Vector Machines (SVMs) and incorporates information on rejected loans, assuming that the labeled data (accepted applicants) and unlabeled data (rejected applicants) are not drawn from the same distribution. We use a self-training algorithm to predict how likely a rejected applicant would have been to repay had the applicant received credit. We introduce a modification to the self-training algorithm based on Platt's probabilistic output for SVMs. Experiments with two toy data sets, one well-known benchmark credit scoring data set, and one project performed for a Chilean financial institution demonstrate that our approach achieves the best classification performance among the methods compared, which include well-known reject inference alternatives and another state-of-the-art semi-supervised method for SVMs (Transductive SVM).
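The abstract does not spell out the modified algorithm, so the following is only a minimal sketch of generic self-training with a linear SVM and a Platt-style sigmoid used as the confidence measure. It is not the authors' method. All function names, the 0.9 confidence threshold, the learning rates, and the label convention (+1 for repaid, -1 for defaulted) are assumptions made for illustration, and the sigmoid is fitted by plain gradient descent rather than by Platt's original fitting procedure.

```python
import math

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # this point violates the margin
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def decision(w, b, x):
    """Signed SVM score f(x) = w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def platt_prob(A, B, f):
    """Platt-style sigmoid: P(y=+1 | f) = 1 / (1 + exp(A*f + B))."""
    z = max(-30.0, min(30.0, A * f + B))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(z))

def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit A and B by gradient descent on the log loss (a simplification)."""
    A, B = -1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        gA = gB = 0.0
        for f, yi in zip(scores, labels):
            t = 1.0 if yi > 0 else 0.0
            p = platt_prob(A, B, f)
            gA += (t - p) * f / n  # d(log loss)/dA
            gB += (t - p) / n      # d(log loss)/dB
        A -= lr * gA
        B -= lr * gB
    return A, B

def self_train(X_lab, y_lab, X_unl, threshold=0.9, max_rounds=10):
    """Pseudo-label confident unlabeled points and retrain until none qualify."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        w, b = train_linear_svm(X_lab, y_lab)
        A, B = fit_platt([decision(w, b, x) for x in X_lab], y_lab)
        keep = []
        for x in pool:
            p = platt_prob(A, B, decision(w, b, x))
            if p >= threshold:
                X_lab.append(x)
                y_lab.append(1)
            elif p <= 1.0 - threshold:
                X_lab.append(x)
                y_lab.append(-1)
            else:
                keep.append(x)  # not confident enough, stays unlabeled
        if len(keep) == len(pool):
            break  # no new pseudo-labels in this round
        pool = keep
    w, b = train_linear_svm(X_lab, y_lab)  # final fit on all (pseudo-)labels
    return w, b, pool
```

In this sketch, `self_train(X_accepted, y_accepted, X_rejected)` returns the final weights, the bias, and the rejected applicants that never reached the confidence threshold. The threshold controls how aggressively rejected applicants are pseudo-labeled, which matters when accepted and rejected applicants come from different distributions.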
References
Agrawala, A.K.: Learning with a probabilistic teacher. IEEE Transactions on Information Theory 16, 373–379 (1970)
Berger, A.N., Frame, W.S., Miller, N.H.: Credit scoring and the availability, price, and risk of small business credit. Journal of Money, Credit and Banking 37(2), 191–222 (2005)
Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the 11th Annual Conference on Computational Learning Theory, pp. 92–100 (1998)
Castelli, V., Cover, T.M.: On the exponential value of labeled samples. Pattern Recognition Letters 16, 105–111 (1995)
Chapelle, O., Zien, A.: Semi-supervised classification by low density separation. In: Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS 2005) (2005)
Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. MIT Press, Cambridge (2005)
Chen, G., Astebro, T.: A Maximum Likelihood Approach for Reject Inference in Credit scoring. Rotman School of Management Working Paper No. 07-05 (2006)
Chye, K.H., Chin, T.W., Peng, G.C.: Credit scoring using data mining techniques. Singapore Management Review 26(2), 25(23) (2004)
Collobert, R., Weston, J., Bottou, L.: Trading convexity for scalability. In: ICML 2006, 23rd International Conference on Machine Learning, Pittsburgh, USA (2006)
Culp, M., Michailidis, G.: An iterative algorithm for extending learners to a semisupervised setting. In: The 2007 Joint Statistical Meetings (2007)
Haffari, G., Sarkar, A.: Analysis of semi-supervised learning with the Yarowsky algorithm. In: 23rd Conference on Uncertainty in Artificial Intelligence (2007)
Hartley, H.O., Rao, J.N.K.: Classification and estimation in analysis of variance problems. Review of the International Statistical Institute 36, 141–147 (1968)
Hettich, S., Bay, S.D.: The UCI KDD Archive. University of California, Department of Information and Computer Science, Irvine, CA (1999), http://kdd.ics.uci.edu
Joachims, T.: Transductive Inference for Text Classification using Support Vector Machines. In: International Conference on Machine Learning, pp. 200–209 (1999)
Johnson, R., Zhang, T.: Two-view feature generation model for semi-supervised learning. In: The 24th International Conference on Machine Learning, pp. 25–27 (2007)
Maeireizo, B., Litman, D., Hwa, R.: Co-training for predicting emotions with spoken dialogue data. In: The Companion Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, ACL (2004)
Maldonado, S., Weber, R.: A wrapper method for feature selection using Support Vector Machines. Information Sciences 179(13), 2208–2217 (2009)
Martens, D., Baesens, B., Van Gestel, T., Vanthienen, J.: Comprehensible credit scoring models using rule extraction from Support Vector Machines. European Journal of Operational Research 183(3), 1466–1476 (2007)
Nigam, K., Ghani, R.: Analyzing the effectiveness and applicability of co-training. In: Ninth International Conference on Information and Knowledge Management, pp. 86–93 (2000)
Platt, J.: Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. In: Advances in Large Margin Classifiers, pp. 61–74. MIT Press, Cambridge (1999)
Scudder, H.J.: Probability of error of some adaptive pattern-recognition machines. IEEE Transactions on Information Theory 11, 363–371 (1965)
Siddiqi, N.: Credit Risk Scorecards, Developing and Implementing Intelligent Credit scoring, 1st edn. Wiley & Sons, Chichester (2005)
Thomas, L.C.: A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers. International Journal of Forecasting 16(2), 149–162 (2002)
Valiant, L.G.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)
Vapnik, V.: Statistical Learning Theory. John Wiley & Sons, New York (1998)
Xu, J.-M., Fumera, G., Roli, F., Zhou, Z.-H.: Training SpamAssassin with active semi-supervised learning. In: Proceedings of the 6th Conference on Email and Anti-Spam (CEAS 2009), Mountain View, CA (2009)
Zhu, X.: Semi-Supervised Learning Literature Survey. Computer Sciences TR 1530, University of Wisconsin, Madison (2007)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maldonado, S., Paredes, G. (2010). A Semi-supervised Approach for Reject Inference in Credit Scoring Using SVMs. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2010. Lecture Notes in Computer Science, vol 6171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14400-4_43
DOI: https://doi.org/10.1007/978-3-642-14400-4_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14399-1
Online ISBN: 978-3-642-14400-4
eBook Packages: Computer Science, Computer Science (R0)