Learning without Default: A Study of One-Class Classification and the Low-Default Portfolio Problem

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6206)

Abstract

This paper asks at what level of class imbalance one-class classifiers outperform two-class classifiers in credit scoring problems where class imbalance, known as the low-default portfolio problem, is a serious issue. We answer the question by comparing the performance of a variety of recognised one-class and two-class classifiers on a selection of credit scoring datasets as the class imbalance is manipulated. We also include random oversampling, as it is one of the most common approaches to addressing class imbalance. We conclude that the performance of the two-class classifiers deteriorates in proportion to the level of class imbalance. The two-class classifiers outperform the one-class classifiers when the minority class makes up 15% or more of the data (i.e. a minority-to-majority ratio of 15:85). The one-class classifiers, whose performance remains unchanged throughout, are preferred when the minority class constitutes approximately 2% or less of the data. Between 2% and 15% the results are less conclusive. These results suggest that one-class classifiers could be used as a solution to the low-default portfolio problem in the credit scoring domain.
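
The two data manipulations named in the abstract, setting the minority class to a chosen share of the data and random oversampling, can be sketched as follows. This is a minimal illustration under assumptions and not the authors' procedure: the function names `set_minority_fraction` and `random_oversample` are hypothetical, the sketch keeps every majority example and subsamples only the minority class, and oversampling is assumed to duplicate minority examples until the classes are equal in size.

```python
import random


def set_minority_fraction(majority, minority, fraction, seed=0):
    """Subsample the minority class so it makes up `fraction` of the result.

    Assumption: all majority examples are kept and minority examples are
    drawn without replacement. The paper may manipulate imbalance differently.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    # Solve n_min / (n_min + n_maj) = fraction for n_min.
    n_min = round(fraction * len(majority) / (1 - fraction))
    if n_min > len(minority):
        raise ValueError("not enough minority examples for this fraction")
    return list(majority), rng.sample(list(minority), n_min)


def random_oversample(majority, minority, seed=0):
    """Duplicate randomly chosen minority examples until the classes match in size."""
    if not minority:
        raise ValueError("cannot oversample an empty minority class")
    rng = random.Random(seed)
    minority = list(minority)
    # range() is empty when the minority class is already at least as large.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), minority + extra


if __name__ == "__main__":
    # Toy data: 850 non-defaulters and 300 defaulters, identified by index only.
    good = [("good", i) for i in range(850)]
    bad = [("bad", i) for i in range(300)]

    maj, mino = set_minority_fraction(good, bad, 0.15)
    print(len(maj), len(mino))  # a 15:85 minority-to-majority split

    maj_bal, min_bal = random_oversample(maj, mino)
    print(len(maj_bal), len(min_bal))  # classes equal in size after oversampling
```

With a 15% target the sketch reproduces the 15:85 ratio quoted in the abstract; lowering `fraction` towards 0.02 gives the regime in which the one-class classifiers are reported to be preferred.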

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kennedy, K., Mac Namee, B., Delany, S.J. (2010). Learning without Default: A Study of One-Class Classification and the Low-Default Portfolio Problem. In: Coyle, L., Freyne, J. (eds) Artificial Intelligence and Cognitive Science. AICS 2009. Lecture Notes in Computer Science, vol 6206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17080-5_20

  • DOI: https://doi.org/10.1007/978-3-642-17080-5_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17079-9

  • Online ISBN: 978-3-642-17080-5

  • eBook Packages: Computer Science, Computer Science (R0)
