We study the phenomenon of cognitive learning from an algorithmic standpoint. How does the brain effectively learn concepts from a small number of examples, even though each example contains a huge amount of information? We provide a novel algorithmic analysis via a model of robust concept learning (closely related to “margin classifiers”), and show that a relatively small number of examples is sufficient to learn rich concept classes. The new algorithms have several advantages: they are faster, conceptually simpler, and resistant to low levels of noise. For example, a robust half-space can be learned in linear time using only a constant number of training examples, regardless of the number of attributes. A general (algorithmic) consequence of the model, that “more robust concepts are easier to learn”, is supported by a multitude of psychological studies.
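The half-space claim can be illustrated with a small sketch. This is not the paper's algorithm or its bounds. It is a minimal illustration that assumes a Gaussian random projection followed by the classic perceptron update. The helper names (`robust_sample`, `random_projection`, `perceptron`) and the values of `n`, `k`, `m`, and `margin` are hypothetical and chosen only for the example.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def robust_sample(w, margin, rng):
    """Draw a unit vector x and a label y with y * (w . x) >= margin."""
    g = [rng.gauss(0.0, 1.0) for _ in range(len(w))]
    along = dot(g, w)
    # u is a unit vector orthogonal to the hidden normal w.
    u = unit([gi - along * wi for gi, wi in zip(g, w)])
    a = rng.uniform(margin, 1.0)  # distance of x from the hyperplane
    y = rng.choice([-1, 1])
    b = math.sqrt(1.0 - a * a)
    x = [y * a * wi + b * ui for wi, ui in zip(w, u)]
    return x, y


def random_projection(n, k, rng):
    """k x n matrix with i.i.d. N(0, 1/k) entries (an assumed choice)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(k)]


def project(R, x):
    return [dot(row, x) for row in R]


def perceptron(points, labels, max_epochs=1000):
    """Classic perceptron through the origin; stops after a clean pass."""
    v = [0.0] * len(points[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * dot(v, x) <= 0:
                v = [vi + y * xi for vi, xi in zip(v, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return v


def accuracy(v, points, labels):
    correct = sum(1 for x, y in zip(points, labels) if y * dot(v, x) > 0)
    return correct / len(points)


rng = random.Random(0)
n, k, m, margin = 200, 60, 40, 0.5  # illustrative values, not from the paper
w = unit([rng.gauss(0.0, 1.0) for _ in range(n)])  # hidden half-space normal
train = [robust_sample(w, margin, rng) for _ in range(m)]
R = random_projection(n, k, rng)
low_dim = [project(R, x) for x, _ in train]
labels = [y for _, y in train]
v = perceptron(low_dim, labels)
train_accuracy = accuracy(v, low_dim, labels)
print(f"training accuracy in {k} dimensions: {train_accuracy:.2f}")
```

The point of the sketch is that learning happens entirely in the `k`-dimensional projected space, so the cost of the perceptron step does not depend on the original number of attributes `n`. The margin is what makes the projection safe.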
Editor: Shai Ben-David
A preliminary version of this paper appeared in the Proceedings of the Symposium on the Foundations of Computer Science, 1999.
Arriaga, R.I., Vempala, S. An algorithmic theory of learning: Robust concepts and random projection. Mach Learn 63, 161–182 (2006). https://doi.org/10.1007/s10994-006-6265-7
- Random projection
- Robust concepts