An algorithmic theory of learning: Robust concepts and random projection

Abstract

We study the phenomenon of cognitive learning from an algorithmic standpoint. How does the brain effectively learn concepts from a small number of examples despite the fact that each example contains a huge amount of information? We provide a novel algorithmic analysis via a model of robust concept learning (closely related to “margin classifiers”), and show that a relatively small number of examples are sufficient to learn rich concept classes. The new algorithms have several advantages—they are faster, conceptually simpler, and resistant to low levels of noise. For example, a robust half-space can be learned in linear time using only a constant number of training examples, regardless of the number of attributes. A general (algorithmic) consequence of the model, that “more robust concepts are easier to learn”, is supported by a multitude of psychological studies.
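
To make the "project then learn" idea concrete, here is a minimal sketch (not the authors' code): it plants a synthetic half-space with margin ell in n dimensions, projects the examples down to k dimensions with a Gaussian random matrix, and runs the ordinary perceptron in the projected space. The dimensions, the margin value, and the data generator are illustrative assumptions for the demo.

```python
# Illustrative sketch of learning a robust (large-margin) half-space via
# random projection; all dimensions and the synthetic data are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, k, m, ell = 10_000, 500, 500, 0.25   # ambient dim, projected dim, #examples, margin

# Hidden target half-space through the origin.
w_true = rng.standard_normal(n)
w_true /= np.linalg.norm(w_true)

# Synthetic unit-length examples with a planted margin of about `ell`:
# strip each point's component along w_true, then add back y_i * ell.
X = rng.standard_normal((m, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.choice([-1.0, 1.0], size=m)
X = X - np.outer(X @ w_true, w_true) + np.outer(y * ell, w_true)
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Random projection with i.i.d. N(0, 1/k) entries: inner products (hence margins)
# are approximately preserved, so k only needs to scale with (1/ell^2) * log(m),
# independently of the original dimension n.
R = rng.standard_normal((n, k)) / np.sqrt(k)
Xp = X @ R

# Ordinary perceptron in the projected space: with margin about ell on unit
# vectors it converges after roughly (1/ell)^2 updates.
w = np.zeros(k)
for epoch in range(100):
    mistakes = 0
    for xi, yi in zip(Xp, y):
        if yi * (xi @ w) <= 0:
            w += yi * xi
            mistakes += 1
    if mistakes == 0:
        break

acc = np.mean(np.sign(Xp @ w) == y)
print(f"projected to {k} dims; training accuracy {acc:.3f} after {epoch + 1} epochs")
```

The point of the sketch is that the projected dimension k depends only on the margin and the number of examples, not on n, so both the sample complexity and the running time of the half-space learner become essentially independent of the ambient dimension.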

References

  1. Achlioptas, D. (2001). Database friendly random projections. In Proc. Principles of Database Systems (PODS) (pp. 274–281).

  2. Agmon, S. (1954). The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6(3), 382–392.

  3. Arriaga, R. I., & Vempala, S. (1999). An algorithmic theory of learning: Robust concepts and random projection. In Proc. of the 40th IEEE Foundations of Computer Science.

  4. Balcan, M. F., Blum, A., & Vempala, S. (2004). On kernels, margins and low-dimensional mappings. In Proc. of Algorithmic Learning Theory.

  5. Bartlett, P., & Shawe-Taylor, J. (1998). Generalization performance of support vector machines and other pattern classifiers. In B. Schölkopf, C. Burges, & A. J. Smola (Eds.), Advances in kernel methods—Support vector learning. MIT Press.

  6. Baum, E. B. (1990). On learning a union of half spaces. Journal of Complexity, 6(1), 67–101.

  7. Ben-David, S., Eiron, N., & Simon, H. (2004). Limitations of learning via embeddings in Euclidean half spaces. Journal of Machine Learning Research, 3, 441–461.

  8. Blum, A., Frieze, A., Kannan, R., & Vempala, S. (1996). A polynomial-time algorithm for learning noisy linear threshold functions. In Proc. of the 37th IEEE Foundations of Computer Science.

  9. Blum, A., & Kannan, R. (1993). Learning an intersection of k halfspaces over a uniform distribution. In Proc. of the 34th IEEE Symposium on the Foundations of Computer Science.

  10. Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1989). Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4), 929–965.

  11. Bylander, T. (1994). Learning linear threshold functions in the presence of classification noise. In Proc. of the 7th Workshop on Computational Learning Theory.

  12. Cohen, E. (1997). Learning noisy perceptrons by a perceptron in polynomial time. In Proc. of the 38th IEEE Foundations of Computer Science.

  13. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.

  14. Dasgupta, S., & Gupta, A. (1999). An elementary proof of the Johnson-Lindenstrauss lemma. Technical Report, U.C. Berkeley.

  15. Feller, W. (1957). An introduction to probability theory and its applications. John Wiley and Sons, Inc.

  16. Frankl, P., & Maehara, H. (1988). The Johnson-Lindenstrauss lemma and the sphericity of some graphs. Journal of Combinatorial Theory, Series B, 44, 355–362.

  17. Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277–296.

  18. Garg, A., Har-Peled, S., & Roth, D. (2002). On generalization bounds, projection profile, and margin distribution. In Proc. ICML (pp. 171–178).

  19. Garg, A., & Roth, D. (2003). Margin distribution and learning. In Proc. ICML (pp. 210–217).

  20. Glass, A. L., Holyoak, K. J., & Santa, J. L. (1979). The structure of categories. In Cognition. Addison-Wesley.

  21. Grötschel, M., Lovász, L., & Schrijver, A. (1988). Geometric algorithms and combinatorial optimization. Springer.

  22. Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13–30.

  23. Indyk, P., & Motwani, R. (1998). Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of ACM STOC.

  24. Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26, 189–206.

  25. Kearns, M.J., & Schapire, R.E. (1994). Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3), 464–497.

  26. Kearns, M.J., & Vazirani, U. (1994). Introduction to computational learning theory. MIT Press.

  27. Kleinberg, J. (1997). Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the 29th ACM Symposium on Theory of Computing.

  28. Klivans, A., & Servedio, R. (2004). Learning intersections of halfspaces with a margin. In Proceedings of the 17th Workshop on Computational Learning Theory.

  29. Knowlton, B. (1999). What can neuropsychology tell us about category learning? Trends in Cognitive Sciences, 3, 123–124.

  30. Komatsu, L.K. (1992). Recent views on conceptual structure. Psychological Bulletin, 112(3).

  31. Linial, N., London, E., & Rabinovich, Y. (1994). The geometry of graphs and some of its algorithmic applications. In Proc. of 35th IEEE Foundations of Computer Science.

  32. Littlestone, N. (1987). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285–318.

  33. Littlestone, N. (1991). Redundant noisy attributes, attribute errors, and linear threshold learning using Winnow. In Proc. of the 4th Workshop on Computational Learning Theory.

  34. Mandler, J. M. (2003). Conceptual categorization (Chapter 5). In D. H. Rakinson & L. M. Oakes (Eds.), Early category and concept development: Making sense of the blooming, buzzing confusion. Oxford University Press.

  35. Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. MIT Press.

  36. Nosofsky, R., & Zaki, S. (1999). Math modeling, neuropsychology, and category learning: Response to B. Knowlton (1999). Trends in Cognitive Sciences, 3, 125–126.

  37. Rakinson, D. H., & Oakes, L. M. (eds.), (2003). Early category and concept development: Making sense of the blooming, buzzing confusion. Oxford University Press.

  38. Reed, S. K. (1982). Categorization. In Cognition: Theory and applications. Brooks/Cole.

  39. Reed, S. K., & Friedman, M. P. (1973). Perceptual vs. conceptual categorization. Memory and Cognition, 1.

  40. Rosch, E. (1978). Principles of categorization. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization. Hillsdale, NJ: Erlbaum.

  41. Rosch, E. H., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8.

  42. Rosenblatt, F. (1962). Principles of neurodynamics. Spartan Books.

  43. Schapire, R. E., Freund, Y., Bartlett, P. L., & Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26(5), 1651–1686.

  44. Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134–1142.

  45. Valiant, L. G. (1998). A neuroidal architecture for cognitive computation. In Proc. of ICALP.

  46. Vapnik, V. N. (1995). The nature of statistical learning theory. Springer.

  47. Vapnik, V. N., & Chervonenkis, A. Ya. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16(2), 264–280.

  48. Vempala, S. (1997). A random sampling based algorithm for learning the intersection of half-spaces. In Proc. of the 38th IEEE Foundations of Computer Science.

  49. Vempala, S. (2004). The random projection method. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 65. American Mathematical Society.

Author information

Corresponding author

Correspondence to Santosh Vempala.

Additional information

Editor: Shai Ben-David

A preliminary version of this paper appeared in the Proc. of the Symposium on the Foundations of Computer Science, 1999.

About this article

Cite this article

Arriaga, R.I., Vempala, S. An algorithmic theory of learning: Robust concepts and random projection. Mach Learn 63, 161–182 (2006). https://doi.org/10.1007/s10994-006-6265-7

Keywords

  • Learning
  • Cognition
  • Random projection
  • Robust concepts