The Probably Approximately Correct (PAC) and Other Learning Models

  • David Haussler
  • Manfred Warmuth
Chapter
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 195)

Abstract

This paper surveys some recent theoretical results on the efficiency of machine learning algorithms. The main tool described is the notion of Probably Approximately Correct (PAC) learning, introduced by Valiant. We define this learning model and then look at some of the results obtained in it. We then consider some criticisms of the PAC model and the extensions proposed to address these criticisms. Finally, we look briefly at other models recently proposed in computational learning theory.
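To make the PAC definition concrete, below is a minimal sketch (not taken from the chapter) of a PAC learner for monotone conjunctions over n Boolean attributes, in the spirit of Valiant's elimination algorithm. The sample-size formula m ≥ (n/ε)(ln n + ln(1/δ)) is the standard bound for this class, and the function names (`pac_learn_conjunction`, `draw_example`) are illustrative assumptions rather than notation from the paper.

```python
import math
import random

def pac_learn_conjunction(draw_example, n, epsilon, delta):
    """Minimal sketch of a PAC learner for monotone conjunctions.

    draw_example() must return (x, label), where x is a tuple of n bits drawn
    i.i.d. from the unknown distribution and label is given by the unknown
    target conjunction.  Elimination rule: start with all n variables and drop
    any variable that is 0 in some positive example.  The sample size below
    suffices for error at most epsilon with probability at least 1 - delta.
    """
    m = math.ceil((n / epsilon) * (math.log(n) + math.log(1.0 / delta)))
    hypothesis = set(range(n))          # variable indices still in the conjunction
    for _ in range(m):
        x, label = draw_example()
        if label:                       # only positive examples eliminate variables
            hypothesis -= {i for i in hypothesis if x[i] == 0}
    return hypothesis                   # predict positive iff all kept bits are 1

# Illustrative use: target is x0 AND x2 under the uniform distribution on {0,1}^5.
def draw_example():
    x = tuple(random.randint(0, 1) for _ in range(5))
    return x, int(x[0] == 1 and x[2] == 1)

print(pac_learn_conjunction(draw_example, n=5, epsilon=0.1, delta=0.05))
```

With high probability the returned set contains the target variables {0, 2} and excludes most others; in PAC terms, the hypothesis has error at most epsilon with probability at least 1 - delta over the random sample.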

References

  1. Amsterdam, J. (1988). The Valiant learning model: Extensions and assessment. Master’s thesis, MIT Department of Electrical Engineering and Computer Science.
  2. Angluin, D. (1987). Learning regular sets from queries and counterexamples. Information and Computation, 75:87–106.
  3. Angluin, D. (1988). Queries and concept learning. Machine Learning, 2:319–342.
  4. Angluin, D., Frazier, M., and Pitt, L. (1990a). Learning conjunctions of Horn clauses. In 31st Annual IEEE Symposium on Foundations of Computer Science, pages 186–192.
  5. Angluin, D., Hellerstein, L., and Karpinski, M. (1990b). Learning read-once formulas with queries. J. ACM. To appear.
  6. Angluin, D. and Kharitonov, M. (1991). Why won’t membership queries help? In Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, pages 444–454, New Orleans. ACM.
  7. Angluin, D. and Laird, P. (1988). Learning from noisy examples. Machine Learning, 2(4):343–370.
  8. Baum, E. (1990). When are k-nearest neighbor and back propagation accurate for feasible sized sets of examples? In Snowbird Conference on Neural Networks for Computing. Unpublished manuscript.
  9. Benedek, G. M. and Itai, A. (1988). Learnability by fixed distributions. In Proc. 1988 Workshop on Computational Learning Theory, pages 80–90, San Mateo, CA. Morgan Kaufmann.
  10. Bergadano, F. and Saitta, L. (1989). On the error probability of Boolean concept descriptions. In Proceedings of the 1989 European Working Session on Learning, pages 25–35.
  11. Blum, A. and Rivest, R. L. (1988). Training a three-neuron neural net is NP-complete. In Proceedings of the 1988 Workshop on Computational Learning Theory, pages 9–18, San Mateo, CA. Morgan Kaufmann.
  12. Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. (1987). Occam’s razor. Information Processing Letters, 24:377–380.
  13. Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. (1989). Learnability and the Vapnik-Chervonenkis dimension. Journal of the Association for Computing Machinery, 36(4):929–965.
  14. Buntine, W. (1990). A Theory of Learning Classification Rules. PhD thesis, University of Technology, Sydney. Forthcoming.
  15. Cover, T. M. (1965). Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. on Electronic Computers, EC-14:326–334.
  16. Duda, R. O. and Hart, P. E. (1973). Pattern Classification and Scene Analysis. Wiley.
  17. Fulk, M. and Case, J., editors (1990). Proceedings of the 1990 Workshop on Computational Learning Theory. Morgan Kaufmann, San Mateo, CA.
  18. Goldreich, O., Goldwasser, S., and Micali, S. (1986). How to construct random functions. J. ACM, 33(4):792–807.
  19. Hampson, S. E. and Volper, D. J. (1986). Linear function neurons: Structure and training. Biol. Cybern., 53:203–217.
  20. Haussler, D. (1988). Quantifying inductive bias: AI learning algorithms and Valiant’s learning framework. Artificial Intelligence, 36:177–221.
  21. Haussler, D. (1989). Learning conjunctive concepts in structural domains. Machine Learning, 4:7–40.
  22. Haussler, D. (1990). Probably approximately correct learning. In Proc. of the 8th National Conference on Artificial Intelligence, pages 1101–1108. Morgan Kaufmann.
  23. Haussler, D. (1991). Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation. To appear.
  24. Haussler, D., Kearns, M., Littlestone, N., and Warmuth, M. K. (1991a). Equivalence of models for polynomial learnability. Information and Computation. To appear.
  25. Haussler, D., Kearns, M., and Schapire, R. (1991b). Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension. In Proceedings of the Fourth Workshop on Computational Learning Theory.
  26. Haussler, D., Littlestone, N., and Warmuth, M. (1990). Predicting {0,1}-functions on randomly drawn points. Technical Report UCSC-CRL-90-54, University of California Santa Cruz, Computer Research Laboratory.
  27. Haussler, D. and Pitt, L., editors (1988). Proceedings of the 1988 Workshop on Computational Learning Theory. Morgan Kaufmann, San Mateo, CA.
  28. Helmbold, D. and Long, P. (1991). Tracking drifting concepts using random examples. In Proceedings of the 1991 Workshop on Computational Learning Theory, pages 13–23, San Mateo, CA. Morgan Kaufmann.
  29. Helmbold, D., Sloan, R., and Warmuth, M. K. (1990). Learning nested differences of intersection closed concept classes. Machine Learning, 5:165–196.
  30. Kearns, M. and Li, M. (1988). Learning in the presence of malicious errors. In 20th ACM Symposium on Theory of Computing, pages 267–279, Chicago.
  31. Kearns, M., Li, M., Pitt, L., and Valiant, L. (1987a). On the learnability of Boolean formulae. In 19th ACM Symposium on Theory of Computing, pages 285–295, New York.
  32. Kearns, M., Li, M., Pitt, L., and Valiant, L. G. (1987b). On the learnability of Boolean formulae. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing, New York. ACM.
  33. Kearns, M. and Schapire, R. (1990). Efficient distribution-free learning of probabilistic concepts. In 31st Annual IEEE Symposium on Foundations of Computer Science, pages 382–391.
  34. Kearns, M. and Valiant, L. (1989a). Cryptographic limitations on learning Boolean formulae and finite automata. In 21st ACM Symposium on Theory of Computing, pages 433–444, Seattle, WA.
  35. Kearns, M. and Valiant, L. G. (1989b). Cryptographic limitations on learning Boolean formulae and finite automata. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, pages 433–444, New York. ACM.
  36. Levin, L. A. (1987). One-way functions and pseudorandom generators. Combinatorica, 7(4):357–363.
  37. Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285–318.
  38. Littlestone, N. (1989a). From on-line to batch learning. In Proceedings of the 2nd Workshop on Computational Learning Theory, pages 269–284. Morgan Kaufmann.
  39. Littlestone, N. (1989b). Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, University of California Santa Cruz.
  40. Littlestone, N., Long, P., and Warmuth, M. (1991). On-line learning of linear functions. Technical Report UCSC-CRL-91-29, UC Santa Cruz. An extended abstract appears in Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, New Orleans, Louisiana, May 1991, pages 465–475.
  41. Littlestone, N. and Warmuth, M. (1991). The weighted majority algorithm. Technical Report UCSC-CRL-91-28, UC Santa Cruz. A preliminary version appeared in the Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science, October 1989, pages 256–261.
  42. Long, P. and Warmuth, M. K. (1991). Composite geometric concepts and polynomial learnability. Information and Computation. To appear.
  43. Mitchell, T. (1980). The need for biases in learning generalizations. Technical Report CBM-TR-117, Rutgers University, New Brunswick, NJ.
  44. Natarajan, B. K. (1989). On learning sets and functions. Machine Learning, 4(1).
  45. Opper, M. and Haussler, D. (1991). Calculation of the learning curve of Bayes optimal classification algorithm for learning a perceptron with noise. In Computational Learning Theory: Proceedings of the Fourth Annual Workshop. Morgan Kaufmann.
  46. Pitt, L. (1989). Inductive inference, DFAs, and computational complexity. Technical Report UIUCDCS-R-89-1530, University of Illinois at Urbana-Champaign.
  47. Pitt, L. and Valiant, L. (1988). Computational limitations on learning from examples. J. ACM, 35(4):965–984.
  48. Pitt, L. and Warmuth, M. K. (1990). Prediction preserving reducibility. J. Comp. Sys. Sci., 41(3):430–467. Special issue for the Third Annual Conference on Structure in Complexity Theory (Washington, DC, June 1988).
  49. Rivest, R., Haussler, D., and Warmuth, M., editors (1989). Proceedings of the 1989 Workshop on Computational Learning Theory. Morgan Kaufmann, San Mateo, CA.
  50. Rivest, R. L. (1987). Learning decision lists. Machine Learning, 2:229–246.
  51. Rumelhart, D. (1990). Personal communication.
  52. Sarrett, W. and Pazzani, M. (1989). Average case analysis of empirical and explanation-based learning algorithms. Technical Report 89-35, UC Irvine. To appear in Machine Learning.
  53. Shawe-Taylor, J., Anthony, M., and Biggs, N. (1989). Bounding sample size with the Vapnik-Chervonenkis dimension. Technical Report CSD-TR-618, University of London, Surrey, England.
  54. Tesauro, G. and Cohn, D. (1991). Can neural networks do better than the Vapnik-Chervonenkis bounds? In Lippmann, R., Moody, J., and Touretzky, D., editors, Advances in Neural Information Processing, Vol. 3, pages 911–917. Morgan Kaufmann.
  55. Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11):1134–1142.
  56. Valiant, L. G. (1985). Learning disjunctions of conjunctions. In Proc. 9th IJCAI, volume 1, pages 560–566, Los Angeles.
  57. Valiant, L. G. and Warmuth, M., editors (1991). Proceedings of the 1991 Workshop on Computational Learning Theory. Morgan Kaufmann, San Mateo, CA.
  58. Vapnik, V. N. (1982). Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York.

Copyright information

© Kluwer Academic Publishers 1993

Authors and Affiliations

  • David Haussler (1)
  • Manfred Warmuth (1)

  1. Baskin Center for Computer Engineering and Information Sciences, University of California, Santa Cruz
