
A View of Computational Learning Theory

  • Leslie G. Valiant
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 195)

Abstract

The distribution-free, or "pac" (probably approximately correct), approach to machine learning is described. The motivations, basic definitions, and some of the more important results of this theory are summarized.
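For orientation, the central definition of the theory can be sketched as follows; this paraphrases the standard formulation of the pac model rather than quoting the chapter itself. A concept class $C$ over a domain $X$ is pac-learnable if there is an algorithm that, for every target concept $c \in C$, every probability distribution $D$ on $X$, and every $\epsilon, \delta \in (0,1)$, when given random examples drawn from $D$ and labeled by $c$, outputs a hypothesis $h$ such that

$$\Pr_{x \sim D}\bigl[\,h(x) \neq c(x)\,\bigr] \;\le\; \epsilon \quad \text{with probability at least } 1 - \delta,$$

using a number of examples and an amount of computation polynomial in $1/\epsilon$, $1/\delta$, and the relevant size parameters. The "distribution-free" qualifier refers to the requirement that this guarantee hold simultaneously for all distributions $D$.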




Copyright information

© Kluwer Academic Publishers 1993

Authors and Affiliations

  • Leslie G. Valiant
    1. Harvard University and NEC Research Institute, USA
