
When are k-nearest neighbor and back propagation accurate for feasible sized sets of examples?

  • Conference paper (Part I: Invited Papers)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 412)

Abstract

We first review in pedagogical fashion previous results which gave lower and upper bounds on the number of examples needed for training feedforward neural networks when valid generalization is desired. Experimental tests of generalization versus number of examples are then presented for random target networks and examples drawn from a uniform distribution. The experimental results are roughly consistent with the following heuristic: if a database of M examples is loaded onto a W-weight net (for M ≫ W), one expects to make a fraction ɛ = W/M of errors in classifying future examples drawn from the same distribution. This is consistent with our previous bounds but, if reliable, strengthens them in that: (1) the bounds had large numerical constants and log factors, all of which are set equal to one in the heuristic; (2) previous lower bounds on the number of examples needed were valid only in a distribution-independent context, whereas the experiments were conducted for a uniform distribution; and (3) the previous lower bound was valid only for nets with one hidden layer. These experiments also seem to indicate that networks with two hidden layers have Vapnik-Chervonenkis dimension roughly equal to their total number of weights.
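
As an illustrative calculation (not taken from the paper): by this heuristic, a net with W = 10,000 weights trained on M = 100,000 examples should misclassify roughly ɛ = W/M = 0.1 of future examples, while reaching ɛ = 0.01 would require on the order of M = W/ɛ = 1,000,000 examples.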

We then consider the convergence of the k-nearest neighbor algorithm to a classifier making a fraction ɛ of errors when examples are drawn from the uniform distribution on Sⁿ, the unit sphere in n dimensions, and classified according to a simple target function. We prove that if the target function is a single half space, then for k appropriately chosen (k ≈ n/ɛ² ln(ɛ⁻¹)), k-nearest neighbor yields an ɛ-accurate classifier using a database of M = O(n/ɛ² ln(ɛ⁻¹)) classified examples. However, when the target function is a union of two half spaces, k-nearest neighbor requires a number of examples exponential in n to achieve high accuracy.
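
The half-space setting is easy to simulate. The following is a minimal Python sketch (not code from the paper) that estimates the k-nearest-neighbor error rate empirically; the helper names sample_sphere and knn_error, and the parameter values n = 10, M = 2000, k = 25, are illustrative assumptions only:

```python
# Minimal simulation sketch: k-NN accuracy for a half-space target on the
# unit sphere in n dimensions. All parameter choices are assumptions made
# for illustration, not values taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(m, n):
    """Draw m points uniformly from the unit sphere in R^n."""
    x = rng.standard_normal((m, n))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def knn_error(n=10, M=2000, k=25, n_test=1000):
    w = sample_sphere(1, n)[0]            # normal vector of the target half space
    X = sample_sphere(M, n)               # database of M classified examples
    y = (X @ w > 0)                       # labels from the half-space target
    T = sample_sphere(n_test, n)          # fresh test points
    d = T @ X.T                           # dot products; on the unit sphere,
                                          # larger dot product = nearer neighbor
    idx = np.argsort(-d, axis=1)[:, :k]   # indices of the k nearest examples
    votes = y[idx].mean(axis=1) > 0.5     # majority vote over the k neighbors
    truth = (T @ w > 0)
    return np.mean(votes != truth)        # fraction of misclassified test points

print(knn_error())   # typically a small error at these sizes; varies with seed
```

Consistent with the result stated above, one would expect the measured error for the single half-space target to fall as M grows with k scaled accordingly, whereas for a union of two half spaces no comparable improvement is expected at feasible M once n is large.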

References

  • ABU-MOSTAFA, Y. S., PSALTIS, D. (1987), Optical neural computers, Scientific American, v. 256, no. 3, pp. 88–95.

  • ANGLUIN, D., VALIANT, L. G. (1979), Fast probabilistic algorithms for Hamiltonian circuits and matchings, Journal of Computer and System Sciences, v. 18, pp. 155–193.

  • BAUM, E. B. (1988), On the capabilities of multilayer perceptrons, Journal of Complexity, v. 4, pp. 193–215.

  • BAUM, E. B. (1989a), On learning a union of half spaces, Journal of Complexity, v. 5, no. 4.

  • BAUM, E. B. (1989b), The perceptron algorithm is fast for non-malicious distributions, submitted for publication.

  • BAUM, E. B. (1989c), A proposal for more powerful learning algorithms, Neural Computation, v. 1, no. 2.

  • BAUM, E. B., HAUSSLER, D. (1989), What size net gives valid generalization?, Neural Computation, v. 1, pp. 151–160.

  • BLUM, A., RIVEST, R. L. (1988), Training a 3-node neural network is NP-complete, pp. 494–501 in Advances in Neural Information Processing Systems 1, ed. D. S. Touretzky, Morgan Kaufmann, San Mateo, CA.

  • BLUMER, A., EHRENFEUCHT, A., HAUSSLER, D., WARMUTH, M. (1987), Learnability and the Vapnik-Chervonenkis dimension, University of California, Santa Cruz, Technical Report UCSC-CRL-87-20; also J. ACM, to appear.

  • COVER, T. M. (1965), Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, IEEE Trans. Elec. Comput., EC-14, pp. 326–334.

  • COVER, T. M. (1968), Rates of convergence of nearest neighbor decision procedures, Proc. First Annual Hawaii Conference on Systems Theory, pp. 413–415.

  • COVER, T. M., HART, P. E. (1967), Nearest neighbor pattern classification, IEEE Trans. Info. Theory, IT-13, pp. 21–27.

  • DENKER, J. S., GARDNER, W. R., GRAF, H. P., HENDERSON, D., HOWARD, R. E., HUBBARD, W., JACKEL, L. D., BAIRD, H. S., GUYON, I. (1988), Neural network recognizer for hand-written zip code digits, in Neural Information Processing Systems 1, ed. D. Touretzky, Morgan Kaufmann, San Mateo, CA, pp. 323–331.

  • DUDA, R. O., HART, P. E. (1973), Pattern Classification and Scene Analysis, John Wiley and Sons, NY.

  • EHRENFEUCHT, A., HAUSSLER, D., KEARNS, M., VALIANT, L. (1988), A general lower bound on the number of examples needed for learning, pp. 139–154 in Proceedings of the 1988 Workshop on Computational Learning Theory, eds. D. Haussler and L. Pitt, Morgan Kaufmann, San Mateo, CA.

  • FRIEDMAN, J. H., BENTLEY, J. L., FINKEL, R. A. (1977), An algorithm for finding best matches in logarithmic expected time, ACM Trans. on Mathematical Software, v. 3, no. 3, pp. 200–226.

  • HAUSSLER, D. (1989), Generalizing the PAC model for neural nets and other learning applications, University of California, Santa Cruz, Technical Report UCSC-CRL-89-30.

  • JUDD, S. (1988), On the complexity of loading shallow networks, Journal of Complexity, v. 4, pp. 177–192.

  • PITT, L., VALIANT, L. G. (1986), Computational limits on learning from examples, Harvard University Technical Report TR-05-86.

  • RIDGEWAY, W. C. III (1962), An adaptive logic system with generalizing properties, Technical Report 1556-1, Solid State Electronics Lab, Stanford University.

  • RIVEST, R., HAUSSLER, D., WARMUTH, M. K. (1989), Proceedings of the Second Annual Workshop on Computational Learning Theory, Morgan Kaufmann, San Mateo, CA.

  • VALIANT, L. G. (1984), A theory of the learnable, Comm. ACM, v. 27, no. 11, pp. 1134–1142.

  • WALTZ, D. L. (1988), The prospects for building truly intelligent machines, Daedalus, issued as v. 117, no. 1 of Proc. National Academy of Arts and Sciences, pp. 191–212.

Editor information

Luis B. Almeida, Christian J. Wellekens

Copyright information

© 1990 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Baum, E.B. (1990). When are k-nearest neighbor and back propagation accurate for feasible sized sets of examples? In: Almeida, L.B., Wellekens, C.J. (eds) Neural Networks. EURASIP 1990. Lecture Notes in Computer Science, vol 412. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-52255-7_24

  • DOI: https://doi.org/10.1007/3-540-52255-7_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-52255-3

  • Online ISBN: 978-3-540-46939-1

  • eBook Packages: Springer Book Archive
