
When are k-nearest neighbor and back propagation accurate for feasible sized sets of examples?

  • Conference paper (Part I: Invited Papers)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 412)

Abstract

We first review in pedagogical fashion previous results which gave lower and upper bounds on the number of examples needed for training feedforward neural networks when valid generalization is desired. Experimental tests of generalization versus number of examples are then presented for random target networks and examples drawn from a uniform distribution. The experimental results are roughly consistent with the following heuristic: if a database of M examples is loaded onto a W-weight net (for M ≫ W), one expects to make a fraction ɛ = W/M of errors in classifying future examples drawn from the same distribution. This is consistent with our previous bounds but, if reliable, strengthens them in that: (1) the bounds had large numerical constants and log factors, all of which are set equal to one in the heuristic; (2) previous lower bounds on the number of examples needed were valid only in a distribution-independent context, whereas the experiments were conducted for a uniform distribution; and (3) the previous lower bound was valid only for nets with one hidden layer. These experiments also seem to indicate that networks with two hidden layers have Vapnik-Chervonenkis dimension roughly equal to their total number of weights.
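
As an illustrative calculation (not taken from the paper): by this heuristic, a net with W = 10,000 weights trained on M = 100,000 examples should misclassify roughly ɛ = W/M = 0.1 of future examples, while reaching ɛ = 0.01 would require on the order of M = W/ɛ = 1,000,000 examples.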

We then consider the convergence of the k-nearest neighbor algorithm to a classifier making a fraction ɛ of errors when examples are drawn from the uniform distribution on Sⁿ, the unit sphere in n dimensions, and classified according to a simple target function. We prove that if the target function is a single half space, then for k appropriately chosen (k ≈ n/ɛ² ln(ɛ⁻¹)), k-nearest neighbor yields an ɛ-accurate classifier using a database of M = O(n/ɛ² ln(ɛ⁻¹)) classified examples. However, when the target function is a union of two half spaces, k-nearest neighbor requires a number of examples exponential in n to achieve high accuracy.
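
The half-space setting is easy to simulate. The following is a minimal Python sketch (not code from the paper) that estimates the k-nearest-neighbor error rate empirically; the helper names sample_sphere and knn_error, and the parameter values n = 10, M = 2000, k = 25, are illustrative assumptions only:

```python
# Minimal simulation sketch: k-NN accuracy for a half-space target on the
# unit sphere in n dimensions. All parameter choices are assumptions made
# for illustration, not values taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(m, n):
    """Draw m points uniformly from the unit sphere in R^n."""
    x = rng.standard_normal((m, n))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def knn_error(n=10, M=2000, k=25, n_test=1000):
    w = sample_sphere(1, n)[0]            # normal vector of the target half space
    X = sample_sphere(M, n)               # database of M classified examples
    y = (X @ w > 0)                       # labels from the half-space target
    T = sample_sphere(n_test, n)          # fresh test points
    d = T @ X.T                           # dot products; on the unit sphere,
                                          # larger dot product = nearer neighbor
    idx = np.argsort(-d, axis=1)[:, :k]   # indices of the k nearest examples
    votes = y[idx].mean(axis=1) > 0.5     # majority vote over the k neighbors
    truth = (T @ w > 0)
    return np.mean(votes != truth)        # fraction of misclassified test points

print(knn_error())   # typically a small error at these sizes; varies with seed
```

Consistent with the result stated above, one would expect the measured error for the single half-space target to fall as M grows with k scaled accordingly, whereas for a union of two half spaces no comparable improvement is expected at feasible M once n is large.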

References

  • ABU-MOSTAFA, Y. S., PSALTIS, D. (1987), Optical neural computers, Scientific American, v. 256, no. 3, pp. 88–95.

  • ANGLUIN, D., VALIANT, L. G. (1979), Fast probabilistic algorithms for Hamiltonian circuits and matchings, Journal of Computer and System Sciences, v. 18, pp. 155–193.

  • BAUM, E. B. (1988), On the capabilities of multilayer perceptrons, Journal of Complexity, v. 4, pp. 193–215.

  • BAUM, E. B. (1989a), On learning a union of half spaces, Journal of Complexity, v. 5, no. 4.

  • BAUM, E. B. (1989b), The perceptron algorithm is fast for non-malicious distributions, submitted for publication.

  • BAUM, E. B. (1989c), A proposal for more powerful learning algorithms, Neural Computation, v. 1, no. 2.

  • BAUM, E. B., HAUSSLER, D. (1989), What size net gives valid generalization?, Neural Computation, v. 1, pp. 151–160.

  • BLUM, A., RIVEST, R. L. (1988), Training a 3-node neural network is NP-complete, pp. 494–501 in Advances in Neural Information Processing Systems 1, ed. D. S. Touretzky, Morgan Kaufmann, San Mateo, CA.

  • BLUMER, A., EHRENFEUCHT, A., HAUSSLER, D., WARMUTH, M. (1987), Learnability and the Vapnik-Chervonenkis dimension, University of California, Santa Cruz, Technical Report UCSC-CRL-87-20; also J. ACM, to appear.

  • COVER, T. M. (1965), Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, IEEE Trans. Elec. Comput., EC-14, pp. 326–334.

  • COVER, T. M. (1968), Rates of convergence of nearest neighbor decision procedures, Proc. First Annual Hawaii Conference on Systems Theory, pp. 413–415.

  • COVER, T. M., HART, P. E. (1967), Nearest neighbor pattern classification, IEEE Trans. Info. Theory, IT-13, pp. 21–27.

  • DENKER, J. S., GARDNER, W. R., GRAF, H. P., HENDERSON, D., HOWARD, R. E., HUBBARD, W., JACKEL, L. D., BAIRD, H. S., GUYON, I. (1988), Neural network recognizer for hand-written zip code digits, in Neural Information Processing Systems 1, ed. D. Touretzky, Morgan Kaufmann, San Mateo, CA, pp. 323–331.

  • DUDA, R. O., HART, P. E. (1973), Pattern Classification and Scene Analysis, John Wiley and Sons, NY.

  • EHRENFEUCHT, A., HAUSSLER, D., KEARNS, M., VALIANT, L. (1988), A general lower bound on the number of examples needed for learning, pp. 139–154 in Proceedings of the 1988 Workshop on Computational Learning Theory, eds. D. Haussler and L. Pitt, Morgan Kaufmann, San Mateo, CA.

  • FRIEDMAN, J. H., BENTLEY, J. L., FINKEL, R. A. (1977), An algorithm for finding best matches in logarithmic expected time, ACM Trans. on Mathematical Software, v. 3, no. 3, pp. 200–226.

  • HAUSSLER, D. (1989), Generalizing the PAC model for neural nets and other learning applications, University of California, Santa Cruz, Technical Report UCSC-CRL-89-30.

  • JUDD, S. (1988), On the complexity of loading shallow networks, Journal of Complexity, v. 4, pp. 177–192.

  • PITT, L., VALIANT, L. G. (1986), Computational limits on learning from examples, Harvard University Technical Report TR-05-86.

  • RIDGEWAY, W. C. III (1962), An adaptive logic system with generalizing properties, Technical Report 1556-1, Solid State Electronics Lab, Stanford University.

  • RIVEST, R., HAUSSLER, D., WARMUTH, M. K. (1989), Proceedings of the Second Annual Workshop on Computational Learning Theory, Morgan Kaufmann, San Mateo, CA.

  • VALIANT, L. G. (1984), A theory of the learnable, Comm. ACM, v. 27, no. 11, pp. 1134–1142.

  • WALTZ, D. L. (1988), The prospects for building truly intelligent machines, Daedalus, issued as v. 117, no. 1 of Proc. National Academy of Arts and Sciences, pp. 191–212.

Editor information

Luis B. Almeida, Christian J. Wellekens

Copyright information

© 1990 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Baum, E.B. (1990). When are k-nearest neighbor and back propagation accurate for feasible sized sets of examples? In: Almeida, L.B., Wellekens, C.J. (eds) Neural Networks. EURASIP 1990. Lecture Notes in Computer Science, vol 412. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-52255-7_24

  • DOI: https://doi.org/10.1007/3-540-52255-7_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-52255-3

  • Online ISBN: 978-3-540-46939-1

  • eBook Packages: Springer Book Archive
