
Universal Prediction

  • N. Cesa-Bianchi
Part of the International Centre for Mechanical Sciences book series (CISM, volume 434)

Abstract

Consider the problem of forecasting the elements of a data sequence \(y_1, y_2, \dots\), where the prediction \(\hat{p}_t\) for the t-th element \(y_t\) may only be based on the past data \(y_s\), \(s < t\), and not on future elements.
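For concreteness, the following is a minimal sketch of this sequential prediction protocol in the expert-advice setting, using an exponentially weighted average forecaster; it is an illustration, not code from the chapter, and the function name, the learning rate `eta`, and the assumption of a [0, 1]-bounded loss are choices made here for the example.

```python
import math

def exp_weighted_forecaster(expert_preds, outcomes, loss, eta=0.5):
    """Sequential prediction with expert advice (illustrative sketch).

    expert_preds: list of T rounds, each a list of N expert predictions
    outcomes:     list of T true outcomes y_t
    loss:         loss function loss(prediction, outcome), assumed in [0, 1]
    eta:          learning rate for the exponential weights (illustrative value)
    Returns the forecaster's predictions and its cumulative loss.
    """
    n_experts = len(expert_preds[0])
    weights = [1.0] * n_experts          # uniform initial weights
    predictions, total_loss = [], 0.0

    for preds_t, y_t in zip(expert_preds, outcomes):
        # Predict with the weighted average of the experts' advice,
        # using only information available before y_t is revealed.
        w_sum = sum(weights)
        p_t = sum(w * p for w, p in zip(weights, preds_t)) / w_sum
        predictions.append(p_t)
        total_loss += loss(p_t, y_t)

        # After observing y_t, exponentially down-weight each expert
        # in proportion to the loss it incurred on this round.
        weights = [w * math.exp(-eta * loss(p, y_t))
                   for w, p in zip(weights, preds_t)]

    return predictions, total_loss

# Example: two constant ("static") experts and the absolute loss on a binary sequence.
if __name__ == "__main__":
    experts = [[0.0, 1.0]] * 6           # expert 0 always predicts 0, expert 1 always predicts 1
    ys = [1, 1, 0, 1, 1, 1]
    preds, cum_loss = exp_weighted_forecaster(
        experts, ys, loss=lambda p, y: abs(p - y))
    print(preds, cum_loss)
```

The key point the sketch illustrates is the information constraint of the protocol: the weights used to form \(\hat{p}_t\) depend only on outcomes observed before round t.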

Keywords

Loss Function, Investment Strategy, Expert Advice, Trading Period, Static Expert



Copyright information

© Springer-Verlag Wien 2002

Authors and Affiliations

  • N. Cesa-Bianchi
  1. Università di Milano, Milano, Italy
