Machine Learning, Volume 108, Issue 12, pp. 2087–2111

Speculate-correct error bounds for k-nearest neighbor classifiers

  • Eric Bax
  • Lingjie Weng
  • Xu Tian

Abstract

We introduce the speculate-correct method to derive error bounds for local classifiers. Using it, we show that k-nearest neighbor classifiers, in spite of their famously fractured decision boundaries, have exponential error bounds with \(O\left(\sqrt{(k + \ln n)/n}\right)\) range around an estimate of generalization error for n in-sample examples.
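The bound's width scales as \(\sqrt{(k + \ln n)/n}\), so it narrows roughly as \(1/\sqrt{n}\) and widens with k. Below is a minimal Python sketch of that scaling; the function name and the placeholder constant c (standing in for the factor hidden inside the O(·), which the abstract does not specify) are illustrative assumptions, not taken from the paper.

import math

# Width of the error-bound range around the generalization-error
# estimate, up to the unspecified constant c: the abstract gives
# the rate O(sqrt((k + ln n) / n)), not the constant factor.
def bound_width(n: int, k: int, c: float = 1.0) -> float:
    return c * math.sqrt((k + math.log(n)) / n)

# Tabulate the width for a few (k, n) pairs.
for k in (1, 5, 25):
    for n in (1_000, 10_000, 100_000):
        print(f"k={k:2d}  n={n:7d}  width ~ {bound_width(n, k):.4f}")

For example, with k = 5 and n = 100,000 the width term is \(\sqrt{(5 + \ln 100{,}000)/100{,}000} \approx 0.013\), so up to the hidden constant the bound pins generalization error to within about one percentage point of the estimate.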

Keywords

Nearest neighbors · Error bounds · Generalization

Acknowledgements

We thank the anonymous referees for their detailed and extremely helpful corrections to the main results and for their advice on testing and presentation.


Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Verizon, Playa Vista, USA
  2. LinkedIn, Mountain View, USA
  3. Sorin Capital Management, Stamford, USA
