Academic Obsessions and Classification Realities: Ignoring Practicalities in Supervised Classification

Hand, David J.

doi:10.1007/978-3-642-17103-1_21

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organisation (STUDIES CLASS)

Abstract

Supervised classification methods have been the focus of a vast amount of research in recent decades, within a variety of intellectual disciplines, including statistics, machine learning, pattern recognition, and data mining. Highly sophisticated methods have been developed, using the full power of recent advances in computation. Many of these methods would have been simply inconceivable to earlier generations. However, most of these advances have largely taken place within the context of the classical supervised classification paradigm of data analysis. That is, a classification rule is constructed based on a given ‘design sample’ of data, with known and well-defined classes, and this rule is then used to classify future objects. This paper argues that this paradigm is often, perhaps typically, an over-idealisation of the practical realities of supervised classification problems. Furthermore, it is also argued that the sequential nature of the statistical modelling process means that the large gains in predictive accuracy are achieved early in the modelling process. Putting these two facts together leads to the suspicion that the apparent superiority of the highly sophisticated methods is often illusory: simple methods are often equally effective or even superior in classifying new data points.

Author information

Authors and Affiliations

Imperial College, England
David J. Hand

Editor information

Editors and Affiliations

Institute of Statistics and Decision Sciences, Duke University, 27708, Durham, NC, USA
David Banks, Leanna House
Department of Mathematics, Illinois Institute of Technology, 10 West 32nd Street, 60616-3793, Chicago, IL, USA
Frederick R. McMorris
Faculty of Management, Rutgers University, 180 University Avenue, 07102-1895, Newark, NJ, USA
Phipps Arabie
Institute of Decision Theory, University of Karlsruhe, Kaiserstr. 12, 76128, Karlsruhe, Germany
Wolfgang Gaul

About this paper

Cite this paper

Hand, D.J. (2004). Academic Obsessions and Classification Realities: Ignoring Practicalities in Supervised Classification. In: Banks, D., McMorris, F.R., Arabie, P., Gaul, W. (eds) Classification, Clustering, and Data Mining Applications. Studies in Classification, Data Analysis, and Knowledge Organisation. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17103-1_21

DOI: https://doi.org/10.1007/978-3-642-17103-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22014-5
Online ISBN: 978-3-642-17103-1
eBook Packages: Springer Book Archive
