Periodica Mathematica Hungarica, Volume 61, Issue 1–2, pp. 55–65

UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem



Abstract

In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by const · \( \frac{K \log T}{\Delta} \), where Δ measures the distance between a suboptimal arm and the optimal arm, for the modified UCB algorithm we show an upper bound on the regret of const · \( \frac{K \log(T\Delta^2)}{\Delta} \).
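For concreteness, the baseline UCB1 index policy of [4], which attains the const · K log(T)/Δ bound quoted above, can be sketched in a few lines of Python. This is a minimal illustration of the original algorithm, not of the modified algorithm analyzed in the paper; the arm interface (each arms[i]() call draws a reward in [0, 1]) and all names are our own.

    import math
    import random

    def ucb1(arms, T):
        # UCB1 [4]: pull each arm once, then always pull the arm maximizing
        # its empirical mean plus the confidence radius sqrt(2 ln t / n_i).
        K = len(arms)
        counts = [0] * K    # n_i: number of pulls of arm i so far
        sums = [0.0] * K    # cumulative reward collected from arm i
        for t in range(1, T + 1):
            if t <= K:
                i = t - 1   # initialization: play every arm once
            else:
                i = max(range(K), key=lambda j: sums[j] / counts[j]
                        + math.sqrt(2.0 * math.log(t) / counts[j]))
            counts[i] += 1
            sums[i] += arms[i]()  # observe a stochastic reward in [0, 1]
        return counts

    # Toy run: two Bernoulli arms with means 0.5 and 0.6, so Delta = 0.1.
    random.seed(0)
    arms = [lambda p=p: float(random.random() < p) for p in (0.5, 0.6)]
    print(ucb1(arms, 10000))  # pulls should concentrate on the better arm

The paper's modification of this index policy then improves the log T factor in the regret bound to log(TΔ²).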

Key words and phrases

multi-armed bandit problem, regret

Mathematics subject classification numbers

68T05, 62M05, 91A60


References

[1] Rajeev Agrawal, Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Adv. in Appl. Probab., 27 (1995), 1054–1078.
[2] Jean-Yves Audibert and Sébastien Bubeck, Minimax policies for adversarial and stochastic bandits, Proceedings of the 22nd Annual Conference on Learning Theory (COLT 2009), 2009, 217–226.
[3] Jean-Yves Audibert, Rémi Munos and Csaba Szepesvári, Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, Theor. Comput. Sci., 410 (2009), 1876–1902.
[4] Peter Auer, Nicolò Cesa-Bianchi and Paul Fischer, Finite-time analysis of the multi-armed bandit problem, Mach. Learn., 47 (2002), 235–256.
[5] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund and Robert E. Schapire, The nonstochastic multiarmed bandit problem, SIAM J. Comput., 32 (2002), 48–77.
[6] Eyal Even-Dar, Shie Mannor and Yishay Mansour, Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems, J. Mach. Learn. Res., 7 (2006), 1079–1105.
[7] Wassily Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc., 58 (1963), 13–30.
[8] Robert D. Kleinberg, Nearly tight bounds for the continuum-armed bandit problem, Advances in Neural Information Processing Systems 17, MIT Press, 2005, 697–704.
[9] Tze Leung Lai and Herbert Robbins, Asymptotically efficient adaptive allocation rules, Adv. in Appl. Math., 6 (1985), 4–22.
[10] Shie Mannor and John N. Tsitsiklis, The sample complexity of exploration in the multi-armed bandit problem, J. Mach. Learn. Res., 5 (2004), 623–648.
[11] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2010

Authors and Affiliations

Lehrstuhl für Informationstechnologie, Montanuniversität Leoben, Leoben, Austria
