# Speculate-correct error bounds for *k*-nearest neighbor classifiers


## Abstract

We introduce the speculate-correct method to derive error bounds for local classifiers. Using it, we show that *k*-nearest neighbor classifiers, in spite of their famously fractured decision boundaries, have exponential error bounds with \(O\left( \sqrt{(k + \ln n)/n} \right)\) range around an estimate of generalization error for *n* in-sample examples.
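The stated rate can be illustrated numerically. The sketch below is a hypothetical helper (not from the paper) that evaluates \(\sqrt{(k + \ln n)/n}\) up to an unspecified constant factor `c`, showing how the bound range shrinks as the sample size *n* grows for fixed *k*:

```python
import math

def bound_range(k: int, n: int, c: float = 1.0) -> float:
    """Illustrative only: the O(sqrt((k + ln n)/n)) range around the
    generalization-error estimate, up to a constant factor c.
    The paper's actual constants are not reproduced here."""
    return c * math.sqrt((k + math.log(n)) / n)

# For fixed k, the range narrows as n grows:
for n in (1_000, 10_000, 100_000):
    print(f"n = {n:>7}: range ~ {bound_range(k=5, n=n):.4f}")
```

Note that for fixed *k* the rate approaches the familiar \(O(\sqrt{\ln n / n})\) behavior of uniform-convergence bounds, while growing only as \(\sqrt{k}\) in the neighborhood size.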

## Keywords

Nearest neighbors, Error bounds, Generalization

## Notes

### Acknowledgements

We thank the anonymous referees for their detailed and extremely helpful corrections on the main results and advice on testing and presentation.


## Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019