Abstract
Excessive information is known to degrade the classification performance of many machine learning algorithms. Attribute-efficient learning algorithms tolerate irrelevant attributes without their performance degrading substantially. Valiant’s projection learning combines such algorithms in a way that preserves this desirable property. The archetypal attribute-efficient learning algorithm Winnow and, especially, combinations of Winnow have proved empirically successful in domains with many attributes. However, projection learning as proposed by Valiant has not previously been evaluated empirically. We study how projection learning compares with using Winnow on its own and with an extended attribute set, and we also compare projection learning with decision tree learning and Naïve Bayes on UCI data sets. Projection learning systematically improves the classification accuracy of Winnow, but its cost in time and space can be high. Balanced Winnow appears to be a better choice than the basic algorithm for learning the projection hypotheses; however, it is not well suited for learning the second-level (projective disjunction) hypothesis. As an on-line approach, projection learning does not fall far behind batch algorithms such as decision tree learning and Naïve Bayes in classification accuracy on the UCI data sets we used.
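The building block underlying the abstract is Littlestone’s Winnow, a mistake-driven linear-threshold learner whose multiplicative weight updates make it attribute-efficient: its mistake bound grows only logarithmically in the number of irrelevant attributes. A minimal sketch of basic Winnow over Boolean attributes might look as follows; the class name and the choice of promotion factor `alpha = 2` with threshold `theta = n` are illustrative conventions, not details taken from this paper.

```python
class Winnow:
    """Sketch of basic Winnow for Boolean attribute vectors.

    Weights start at 1; the threshold is the number of attributes.
    On a false negative, active weights are promoted (multiplied by
    alpha); on a false positive, they are demoted (divided by alpha).
    """

    def __init__(self, n_attributes, alpha=2.0):
        self.alpha = alpha
        self.theta = float(n_attributes)
        self.w = [1.0] * n_attributes

    def predict(self, x):
        # Predict 1 iff the weighted sum of active attributes
        # reaches the threshold theta.
        s = sum(wi for wi, xi in zip(self.w, x) if xi)
        return 1 if s >= self.theta else 0

    def update(self, x, y):
        # Mistake-driven on-line update: weights change only
        # when the prediction is wrong.
        if self.predict(x) == y:
            return
        factor = self.alpha if y == 1 else 1.0 / self.alpha
        for i, xi in enumerate(x):
            if xi:
                self.w[i] *= factor
```

In projection learning, several such attribute-efficient learners are trained on projections of the attribute space and their outputs are combined by a second-level (projective disjunction) hypothesis, so the attribute-efficiency of the base learner carries over to the combination.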
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Elomaa, T., Lindgren, J. (2002). Experiments with Projection Learning. In: Lange, S., Satoh, K., Smith, C.H. (eds) Discovery Science. DS 2002. Lecture Notes in Computer Science, vol 2534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36182-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00188-1
Online ISBN: 978-3-540-36182-4