Learning without Default: A Study of One-Class Classification and the Low-Default Portfolio Problem

Kennedy, Kenneth; Mac Namee, Brian; Delany, Sarah Jane

doi:10.1007/978-3-642-17080-5_20

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6206)

Abstract

This paper asks at what level of class imbalance one-class classifiers outperform two-class classifiers in credit scoring problems in which class imbalance, referred to as the low-default portfolio problem, is a serious issue. The question is answered by comparing the performance of a variety of one-class and two-class classifiers on a selection of credit scoring datasets as the class imbalance is manipulated. We also include random oversampling, as this is one of the most common approaches to addressing class imbalance. This study analyses the suitability and performance of recognised two-class classifiers and one-class classifiers. Based on our study, we conclude that the performance of the two-class classifiers deteriorates proportionally to the level of class imbalance. The two-class classifiers outperform the one-class classifiers at class imbalance levels down as far as 15% (i.e., the ratio of minority class to majority class is 15:85). The one-class classifiers, whose performance remains unchanged throughout, are preferred when the minority class constitutes approximately 2% or less of the data. For imbalance levels between 2% and 15%, the results are not as conclusive. These results show that one-class classifiers could potentially be used as a solution to the low-default portfolio problem experienced in the credit scoring domain.
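The abstract mentions two data manipulations: setting the class imbalance to a chosen level and random oversampling. The abstract does not describe the exact procedure used in the paper, so the following is only a minimal sketch under assumptions: the function names `set_minority_fraction` and `random_oversample` are hypothetical, the data are synthetic placeholders rather than a real credit scoring dataset, and the imbalance level is taken to mean the minority share of all rows (so 15% corresponds to 15:85).

```python
import random


def set_minority_fraction(majority, minority, fraction, rng):
    """Keep all majority rows and subsample minority rows (without replacement)
    so that the minority class makes up roughly `fraction` of the result."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    # Solve k / (k + len(majority)) = fraction for k, then round to a whole row count.
    k = round(fraction * len(majority) / (1.0 - fraction))
    if k > len(minority):
        raise ValueError("not enough minority rows for the requested fraction")
    return list(majority), rng.sample(list(minority), k)


def random_oversample(majority, minority, rng):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    if not minority:
        raise ValueError("cannot oversample an empty minority class")
    shortfall = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(shortfall)]
    return list(majority), list(minority) + extra


# Synthetic placeholder rows (assumption): 850 non-defaulters, 150 defaulters.
rng = random.Random(0)
good = [("good", i) for i in range(850)]
bad = [("bad", i) for i in range(150)]

# Reduce the minority share to about 2%, then rebalance by random oversampling.
maj, mino = set_minority_fraction(good, bad, 0.02, rng)
maj_bal, min_bal = random_oversample(maj, mino, rng)
print(len(maj), len(mino), len(min_bal))
```

In a setup like this, a two-class classifier would be trained on either the imbalanced or the oversampled data, while a one-class classifier would be trained on the majority rows alone, which is one way to read why its performance would not depend on the imbalance level.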

Author information

Authors and Affiliations

School of Computing, Dublin Institute of Technology, Dublin, Ireland
Kenneth Kennedy & Brian Mac Namee
Digital Media Centre, Dublin Institute of Technology, Dublin, Ireland
Sarah Jane Delany

Editor information

Editors and Affiliations

Lero, International Science Centre, University of Limerick, Limerick, Ireland
Lorcan Coyle
CSIRO Tasmanian ICT centre, GPO Box 1538, 7001, Hobart, Tasmania, Australia
Jill Freyne

About this paper

Cite this paper

Kennedy, K., Mac Namee, B., Delany, S.J. (2010). Learning without Default: A Study of One-Class Classification and the Low-Default Portfolio Problem. In: Coyle, L., Freyne, J. (eds) Artificial Intelligence and Cognitive Science. AICS 2009. Lecture Notes in Computer Science, vol 6206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17080-5_20

DOI: https://doi.org/10.1007/978-3-642-17080-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17079-9
Online ISBN: 978-3-642-17080-5
eBook Packages: Computer Science, Computer Science (R0)
