Abstract
We propose using systematic simulation studies, rather than real-world benchmark datasets alone, to better understand the behaviour, strengths, and weaknesses of machine learning algorithms. Simulated datasets allow much tighter control over, and a clearer understanding of, the nature of the learning problem than empirical benchmark datasets do.
To demonstrate the value of our proposed research methodology, we describe in this paper the results of our studies of the problem of learning multiple classes. We derived the following hypothesis: “Learning classification functions using decision tree learners can be helped by providing additional subclass labels.” For example, when learning a two-class problem such as “car is OK/car needs service”, it can be helpful to provide a finer-grained classification in the training data, such as “car OK”, “faulty brakes”, “faulty engine”, “faulty lights”, etc.
This hypothesis was corroborated using a number of ‘real-world’ multi-class datasets from the UCI Machine Learning Repository. Our empirical studies demonstrate the usefulness of the proposed research methodology of using artificial datasets as an important methodological complement to real-world datasets.
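The subclass idea described above can be sketched with a standard decision tree learner. The synthetic “car diagnosis” data, class names, and cluster layout below are illustrative assumptions, not the paper’s actual experimental setup: one tree is trained directly on the two-class labels, and a second tree is trained on fine-grained subclass labels whose predictions are then mapped back to the two classes.

```python
# Illustrative sketch of training with subclass labels (hypothetical data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Four hypothetical subclasses, each clustered in its own region of a
# 2-D feature space: "ok" plus three fault types.
centers = {"ok": (0, 0), "brakes": (4, 0), "engine": (0, 4), "lights": (4, 4)}
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centers.values()])
sub = np.repeat(list(centers), 50)                     # fine-grained subclass labels
coarse = np.where(sub == "ok", "OK", "needs service")  # two-class labels

# Variant 1: learn the two-class problem directly.
direct = DecisionTreeClassifier(random_state=0).fit(X, coarse)

# Variant 2: learn the subclasses, then map predictions back to the two classes.
fine = DecisionTreeClassifier(random_state=0).fit(X, sub)
mapped = np.where(fine.predict(X) == "ok", "OK", "needs service")
```

Comparing the two variants on held-out data is the kind of controlled experiment the abstract argues for: with simulated data, the number, separation, and shape of the subclasses can be varied systematically.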
© 2001 Springer-Verlag Berlin Heidelberg
Hoffmann, A., Kwok, R., Compton, P. (2001). Using Subclasses to Improve Classification Learning. In: De Raedt, L., Flach, P. (eds) Machine Learning: ECML 2001. ECML 2001. Lecture Notes in Computer Science, vol 2167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44795-4_18
Print ISBN: 978-3-540-42536-6
Online ISBN: 978-3-540-44795-5