On-Line Regression Competitive with Reproducing Kernel Hilbert Spaces

  • Vladimir Vovk
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3959)


We consider the problem of on-line prediction of real-valued labels, assumed bounded in absolute value by a known constant, of new objects from known labeled objects. The prediction algorithm’s performance is measured by the squared deviation of the predictions from the actual labels. No stochastic assumptions are made about the way the labels and objects are generated. Instead, we are given a benchmark class of prediction rules some of which are hoped to produce good predictions. We show that for a wide range of infinite-dimensional benchmark classes one can construct a prediction algorithm whose cumulative loss over the first N examples does not exceed the cumulative loss of any prediction rule in the class plus \(O(\sqrt{N})\); the main differences from the known results are that we do not impose any upper bound on the norm of the considered prediction rules and that we achieve an optimal leading term in the excess loss of our algorithm. If the benchmark class is “universal” (dense in the class of continuous functions on each compact set), this provides an on-line non-stochastic analogue for universally consistent prediction in non-parametric statistics. We use two proof techniques: one is based on the Aggregating Algorithm and the other on the recently developed method of defensive forecasting.


Gradient Descent Prediction Algorithm Prediction Rule Reproduce Kernel Hilbert Space Cumulative Loss 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Abramowitz, M., Stegun, I.A. (eds.): Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series. US Government Printing Oce, Washington, DC vol. 55 (1964) Republished many times by Dover, New York, starting from 1965Google Scholar
  2. 2.
    Adams, R.A., Fournier, J.J.F.: Sobolev Spaces, 2nd edn. Pure and Applied Mathematics, vol. 140. Academic Press, Amsterdam (2003)zbMATHGoogle Scholar
  3. 3.
    Aronszajn, N.: Theory of reproducing kernels. Transactions of the American Mathematical Society 68, 337–404 (1950)zbMATHCrossRefMathSciNetGoogle Scholar
  4. 4.
    Auer, P., Cesa-Bianchi, N., Gentile, C.: Adaptive and self-condent on-line learning algorithms. Journal of Computer and System Sciences 64, 48–75 (2002)zbMATHCrossRefMathSciNetGoogle Scholar
  5. 5.
    Cesa-Bianchi, N., Long, P.M., Warmuth, M.K.: Worst-case quadratic loss bounds for on-line prediction of linear functions by gradient descent. IEEE Transactions on Neural Networks 7, 604–619 (1996)CrossRefGoogle Scholar
  6. 6.
    Devroye, L., Gyȯrfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition. Applications of Mathematics, vol. 31. Springer, New York (1996)Google Scholar
  7. 7.
    Dudley, R.M.: Real Analysis and Probability Cambridge University Press, Cambridge (2002); Originally published in 1989Google Scholar
  8. 8.
    Engelking, R.: General Topology, 2nd edn. Sigma Series in Pure Mathematics, vol. 6. Heldermann, Berlin (1989); First edition: 1977 (Państwowe Wydawnictwo Naukowe, Warsaw)Google Scholar
  9. 9.
    Foster, D.P.: Prediction in the worst case. Annals of Statistics 19, 1084–1090 (1991)zbMATHCrossRefMathSciNetGoogle Scholar
  10. 10.
    Foster, D.P., Vohra, R.V.: Asymptotic calibration. Biometrika 85, 379–390 (1998)zbMATHCrossRefMathSciNetGoogle Scholar
  11. 11.
    Fudenberg, D., Levine, D.K.: Conditional universal consistency. Games and Economic Behavior 29, 104–130 (1999)zbMATHCrossRefMathSciNetGoogle Scholar
  12. 12.
    Gács, P.: Uniform test of algorithmic randomness over a general space. Theoretical Computer Science 341, 91–137 (2005)zbMATHCrossRefMathSciNetGoogle Scholar
  13. 13.
    Gammerman, A., Kalnishkan, Y., Vovk, V.: On-line prediction with kernels and the Complexity Approximation Principle. In: Chickering, M., Halpern, J. (eds.) Proceedings of the Twentieth Annual Conference on Uncertainty in Artificial Intelligence, Arlington, VA pp. 170–176. AUAI Press (2004)Google Scholar
  14. 14.
    Gyȯrfi, L., Kohler, M., Krzyz̈ak, A., Walk, H.: A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics. Springer, New York (2002)Google Scholar
  15. 15.
    Haussler, D., Kivinen, J., Warmuth, M.K.: Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory 44, 1906–1925 (1998)zbMATHCrossRefMathSciNetGoogle Scholar
  16. 16.
    Kakade, S.M., Foster, D.P.: Deterministic calibration and Nash equilibrium. In: Shawe-Taylor, J., Singer, Y. (eds.) COLT 2004. LNCS (LNAI), vol. 3120, pp. 33–48. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  17. 17.
    Kimber, D., Long, P.M.: On-line learning of smooth functions of a single variable. Theoretical Computer Science 148, 141–156 (1995)zbMATHCrossRefMathSciNetGoogle Scholar
  18. 18.
    Kivinen, J., Warmuth, M.K.: Exponential Gradient versus Gradient Descent for linear predictors. Information and Computation 132, 1–63 (1997)zbMATHCrossRefMathSciNetGoogle Scholar
  19. 19.
    Lehrer, E.: Any inspection is manipulable. Econometrica 69, 1333–1347 (2001)zbMATHCrossRefMathSciNetGoogle Scholar
  20. 20.
    Levin, L.A.: Uniform tests of randomness. Soviet Mathematics Doklady 17, 337–340 (1976)zbMATHGoogle Scholar
  21. 21.
    Long, P.M.: Improved bounds about on-line learning of smooth functions of a single variable. Theoretical Computer Science 241, 25–35 (2000)zbMATHCrossRefMathSciNetGoogle Scholar
  22. 22.
    Sandroni, A.: The reproducible properties of correct forecasts. International Journal of Game Theory 32, 151–159 (2003)zbMATHCrossRefMathSciNetGoogle Scholar
  23. 23.
    Sandroni, A., Smorodinsky, R., Vohra, R.V.: Calibration with many checking rules. Mathematics of Operations Research 28, 141–153 (2003)zbMATHCrossRefMathSciNetGoogle Scholar
  24. 24.
    Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT Press, Cambridge (2002)Google Scholar
  25. 25.
    Shafer, G., Vovk, V.: Probability and Finance: It’s Only a Game! Wiley, New York (2001)CrossRefGoogle Scholar
  26. 26.
    Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)Google Scholar
  27. 27.
    Steinwart, I.: On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research 2, 67–93 (2001)CrossRefMathSciNetGoogle Scholar
  28. 28.
    Stone, C.J.: Consistent nonparametric regression (with discussion). Annals of Statistics 5, 595–645 (1977)zbMATHCrossRefMathSciNetGoogle Scholar
  29. 29.
    Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)zbMATHGoogle Scholar
  30. 30.
    Vovk, V.: Aggregating strategies. In: Fulk, M., Case, J. (eds.) Proceedings of the Third Annual Workshop on Computational Learning Theory, pp. 371–383. Morgan Kaufmann, San Mateo (1990)Google Scholar
  31. 31.
    Vovk, V.: Competitive on-line statistics. International Statistical Review 69, 213–248 (2001)zbMATHCrossRefGoogle Scholar
  32. 32.
    Vovk, V.: Defensive prediction with expert advice. In: Jain, S., Simon, H.U., Tomita, E. (eds.) ALT 2005. LNCS (LNAI), vol. 3734, pp. 444–458. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  33. 33.
    Vovk, V.: Non-asymptotic calibration and resolution. In: Jain, S., Simon, H.U., Tomita, E. (eds.) ALT 2005. LNCS (LNAI), vol. 3734, pp. 429–443. Springer, Heidelberg (2005) A version of this paper can ve downbloaded from the e-Print archive, January 2006CrossRefGoogle Scholar
  34. 34.
    Vovk, V.: Competiting with wild prediction rules. Technical Report arXiv:cs.LG/0512059 (version 2), e-Print archive (January 2006)Google Scholar
  35. 35.
    Vovk, V.: On-line regression competitive with reproducing kernel Hilbert spaces. Technical Report arXiv:cs.LG/0511058 (version 2), e-Print archive (January 2006)Google Scholar
  36. 36.
    Vovk, V., Shafer, G.: Good randomized sequential probability forecasting is always possible. Journal of the Royal Statistical Society B 67, 747–763 (2005)zbMATHCrossRefMathSciNetGoogle Scholar
  37. 37.
    Vovk, V., Takemura, A., Shafer, G.: Defensive forecasting. In: Cowell, R.G., Ghahramani, Z. (eds.) Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics. Society for Artificial Intelligence and Statistics, pp. 365–372 (2005), Available electronically, at
  38. 38.
    Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Fawcett, T., Mishra, N. (eds.) Proceedings of the Twentieth International Conference on Machine Learning, pp. 768–775. AAAI Press, Menlo Park (2003)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Vladimir Vovk
    • 1
  1. 1.Computer Learning Research Centre, Department of Computer Science, Royal HollowayUniversity of LondonEgham, SurreyUK

Personalised recommendations