Abstract
In this paper, we present a new view of multiclass classification and introduce the constraint classification problem, a generalization that captures many flavors of multiclass classification. We provide the first optimal, distribution-independent bounds for many multiclass learning algorithms, including winner-take-all (WTA). Based on our view, we present a learning algorithm that learns via a single linear classifier in high dimension. In addition to the distribution-independent bounds, we provide a simple margin-based analysis improving generalization bounds for linear multiclass support vector machines.
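The abstract does not spell out how a multiclass problem becomes a single linear classifier in high dimension. The sketch below illustrates one standard way to do this, a Kesler-style embedding, and is an assumption made here rather than a reproduction of the paper's algorithm. Each example x with label y among k classes yields one constraint "class y scores above class j" per competing class j, and each constraint is encoded as a positive example in a (k * d)-dimensional space. The function names (`embed`, `expand`, `train_perceptron`, `predict`), the perceptron update, and the toy data are all hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def embed(x: Sequence[float], i: int, j: int, k: int) -> Vector:
    """Encode the constraint 'class i beats class j' on input x.

    The result has x in block i and -x in block j of a (k * d)-vector,
    so that w . z == w_i . x - w_j . x for w = (w_0, ..., w_{k-1}).
    """
    d = len(x)
    z = [0.0] * (k * d)
    for t, v in enumerate(x):
        z[i * d + t] = v
        z[j * d + t] = -v
    return z


def expand(x: Sequence[float], label: int, k: int) -> List[Vector]:
    """One positive high-dimensional example per competing class."""
    return [embed(x, label, j, k) for j in range(k) if j != label]


def train_perceptron(
    data: Sequence[Tuple[Sequence[float], int]], k: int, epochs: int = 100
) -> Vector:
    """Learn a single weight vector in dimension k * d with a perceptron."""
    d = len(data[0][0])
    w = [0.0] * (k * d)
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            for z in expand(x, y, k):
                if dot(w, z) <= 0:  # constraint violated: additive update
                    w = [wi + zi for wi, zi in zip(w, z)]
                    mistakes += 1
        if mistakes == 0:  # every constraint satisfied
            break
    return w


def predict(w: Sequence[float], x: Sequence[float], k: int) -> int:
    """Winner-take-all: return the class whose block of w scores highest."""
    d = len(x)
    scores = [dot(w[c * d:(c + 1) * d], x) for c in range(k)]
    return max(range(k), key=scores.__getitem__)


# Toy data (made up for illustration): three linearly separable classes in 2-D.
data = [
    ((1.0, 0.0), 0), ((0.0, 1.0), 1), ((-1.0, -1.0), 2),
    ((2.0, 0.5), 0), ((0.5, 2.0), 1), ((-2.0, -1.0), 2),
]
w = train_perceptron(data, k=3)
predictions = [predict(w, x, 3) for x, _ in data]
```

Splitting the learned vector back into k blocks of length d recovers one weight vector per class, so prediction is ordinary winner-take-all even though training only ever sees a binary, single-classifier problem.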
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Har-Peled, S., Roth, D., Zimak, D. (2002). Constraint Classification: A New Approach to Multiclass Classification. In: Cesa-Bianchi, N., Numao, M., Reischuk, R. (eds) Algorithmic Learning Theory. ALT 2002. Lecture Notes in Computer Science, vol 2533. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36169-3_29
DOI: https://doi.org/10.1007/3-540-36169-3_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00170-6
Online ISBN: 978-3-540-36169-5
eBook Packages: Springer Book Archive