
Successive Reduction of Arms in Multi-Armed Bandits

  • Conference paper
  • In: Research and Development in Intelligent Systems XXVIII (SGAI 2011)

Abstract

The relevance of the multi-armed bandit problem has risen in recent years with the need for online optimization techniques in Internet systems, such as online advertisement and news article recommendation. At the same time, these applications reveal that state-of-the-art solution schemes do not scale well with the number of bandit arms. In this paper, we present two Successive Reduction (SR) strategies: 1) Successive Reduction Hoeffding (SRH) and 2) Successive Reduction Order Statistics (SRO). Both use an Order Statistics based Thompson Sampling method for arm selection and then successively eliminate bandit arms from consideration based on a confidence threshold. While SRH uses Hoeffding bounds for elimination, SRO measures confidence by the probability that an arm is superior to the currently selected arm. A computationally efficient scheme for the pairwise calculation of the latter probability is also presented. With SR strategies, sampling resources and arm pulls are not wasted on arms that are unlikely to be optimal. To demonstrate the scalability of the proposed schemes, we compare them with two state-of-the-art approaches, pure Thompson Sampling and UCB-Tuned. The empirical results are conclusive: the performance advantage of the proposed SRO scheme increases consistently with the number of bandit arms, while the SRH scheme performs similarly to pure Thompson Sampling. We therefore believe that SR algorithms open the way to improved performance in Internet-based online optimization and to the tackling of larger problems.
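The Successive Reduction idea described in the abstract can be made concrete with a small sketch. The following Python code is a minimal illustration under assumed Bernoulli rewards with Beta posteriors, not the authors' exact algorithm: all function and parameter names (`sr_bandit`, `prob_superior`, `delta`, `check_every`) are hypothetical, the elimination test is a generic Hoeffding-bound rule in the spirit of SRH, and the pairwise superiority probability that SRO uses is estimated here by plain Monte Carlo rather than by the paper's efficient closed-form scheme.

```python
import numpy as np

rng = np.random.default_rng(0)


def sr_bandit(true_means, horizon, delta=1e-3, check_every=100):
    """Successive-Reduction sketch (hypothetical names, not the paper's
    exact algorithm): Thompson Sampling over the surviving arms, with
    periodic SRH-style elimination via Hoeffding bounds."""
    k = len(true_means)
    wins = np.ones(k)     # Beta posterior alpha (uniform Beta(1,1) prior)
    losses = np.ones(k)   # Beta posterior beta
    pulls = np.zeros(k)
    active = np.arange(k)

    for t in range(1, horizon + 1):
        # Thompson Sampling: one posterior draw per surviving arm;
        # pull the arm whose draw is the top order statistic.
        draws = rng.beta(wins[active], losses[active])
        i = active[np.argmax(draws)]
        reward = rng.random() < true_means[i]   # Bernoulli reward
        wins[i] += reward
        losses[i] += 1 - reward
        pulls[i] += 1

        # SRH-style elimination: discard an arm once its Hoeffding upper
        # confidence bound falls below the best lower confidence bound.
        if t % check_every == 0 and len(active) > 1:
            means = wins[active] / (wins[active] + losses[active])
            radius = np.sqrt(np.log(2.0 / delta)
                             / (2.0 * np.maximum(pulls[active], 1.0)))
            best_lower = np.max(means - radius)
            active = active[means + radius >= best_lower]

    return active, wins, losses


def prob_superior(a1, b1, a2, b2, n_mc=100_000):
    """Monte Carlo estimate of Pr[p1 > p2] for Beta(a1,b1) vs Beta(a2,b2);
    a simple stand-in for the paper's efficient pairwise scheme."""
    return float(np.mean(rng.beta(a1, b1, n_mc) > rng.beta(a2, b2, n_mc)))


if __name__ == "__main__":
    surviving, wins, losses = sr_bandit([0.10, 0.20, 0.48, 0.50],
                                        horizon=20_000)
    print("surviving arms:", surviving)
    print("Pr[arm 3 superior to arm 2]:",
          prob_superior(wins[3], losses[3], wins[2], losses[2]))
```

Eliminated arms receive no further posterior draws or pulls, which is the resource saving the abstract describes; an SRO-style variant would replace the Hoeffding test with a threshold on the superiority probability sketched in `prob_superior`.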




Author information

Correspondence to Neha Gupta.



Copyright information

© 2011 Springer-Verlag London Limited

About this paper

Cite this paper

Gupta, N., Granmo, O.-C., Agrawala, A. (2011). Successive Reduction of Arms in Multi-Armed Bandits. In: Bramer, M., Petridis, M., Nolle, L. (eds) Research and Development in Intelligent Systems XXVIII. SGAI 2011. Springer, London. https://doi.org/10.1007/978-1-4471-2318-7_13


  • DOI: https://doi.org/10.1007/978-1-4471-2318-7_13


  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-2317-0

  • Online ISBN: 978-1-4471-2318-7

  • eBook Packages: Computer Science; Computer Science (R0)
