Foundations of Knowledge Acquisition pp 291-312
The Probably Approximately Correct (PAC) and Other Learning Models
Abstract
This paper surveys some recent theoretical results on the efficiency of machine learning algorithms. The main tool described is the notion of Probably Approximately Correct (PAC) learning, introduced by Valiant. We define this learning model and then look at some of the results obtained in it. We then consider some criticisms of the PAC model and the extensions proposed to address these criticisms. Finally, we look briefly at other models recently proposed in computational learning theory.
Copyright information
© Kluwer Academic Publishers 1993