
Successive Reduction of Arms in Multi-Armed Bandits

  • Conference paper
  • In: Research and Development in Intelligent Systems XXVIII (SGAI 2011)

Abstract

The relevance of the multi-armed bandit problem has risen in recent years with the need for online optimization techniques in Internet systems, such as online advertisement and news article recommendation. At the same time, these applications reveal that state-of-the-art solution schemes do not scale well with the number of bandit arms. In this paper, we present two Successive Reduction (SR) strategies: 1) Successive Reduction Hoeffding (SRH) and 2) Successive Reduction Order Statistics (SRO). Both use an Order Statistics based Thompson Sampling method for arm selection and then successively eliminate bandit arms from consideration based on a confidence threshold. While SRH uses Hoeffding bounds for elimination, SRO measures confidence by the probability that an arm is superior to the currently selected arm. A computationally efficient scheme for the pairwise calculation of the latter probability is also presented. With SR strategies, sampling resources and arm pulls are not wasted on arms that are unlikely to be optimal. To demonstrate the scalability of the proposed schemes, we compare them with two state-of-the-art approaches, pure Thompson Sampling and UCB-Tuned. The empirical results are conclusive: the performance advantage of the proposed SRO scheme increases consistently with the number of bandit arms, while the SRH scheme performs similarly to pure Thompson Sampling. We therefore believe that SR algorithms open the way to improved performance in Internet-based online optimization and to the tackling of larger problems.
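The Successive Reduction idea described in the abstract can be made concrete with a small sketch. The following Python code is a minimal illustration under assumed Bernoulli rewards with Beta posteriors, not the authors' exact algorithm: all function and parameter names (`sr_bandit`, `prob_superior`, `delta`, `check_every`) are hypothetical, the elimination test is a generic Hoeffding-bound rule in the spirit of SRH, and the pairwise superiority probability that SRO uses is estimated here by plain Monte Carlo rather than by the paper's efficient closed-form scheme.

```python
import numpy as np

rng = np.random.default_rng(0)


def sr_bandit(true_means, horizon, delta=1e-3, check_every=100):
    """Successive-Reduction sketch (hypothetical names, not the paper's
    exact algorithm): Thompson Sampling over the surviving arms, with
    periodic SRH-style elimination via Hoeffding bounds."""
    k = len(true_means)
    wins = np.ones(k)     # Beta posterior alpha (uniform Beta(1,1) prior)
    losses = np.ones(k)   # Beta posterior beta
    pulls = np.zeros(k)
    active = np.arange(k)

    for t in range(1, horizon + 1):
        # Thompson Sampling: one posterior draw per surviving arm;
        # pull the arm whose draw is the top order statistic.
        draws = rng.beta(wins[active], losses[active])
        i = active[np.argmax(draws)]
        reward = rng.random() < true_means[i]   # Bernoulli reward
        wins[i] += reward
        losses[i] += 1 - reward
        pulls[i] += 1

        # SRH-style elimination: discard an arm once its Hoeffding upper
        # confidence bound falls below the best lower confidence bound.
        if t % check_every == 0 and len(active) > 1:
            means = wins[active] / (wins[active] + losses[active])
            radius = np.sqrt(np.log(2.0 / delta)
                             / (2.0 * np.maximum(pulls[active], 1.0)))
            best_lower = np.max(means - radius)
            active = active[means + radius >= best_lower]

    return active, wins, losses


def prob_superior(a1, b1, a2, b2, n_mc=100_000):
    """Monte Carlo estimate of Pr[p1 > p2] for Beta(a1,b1) vs Beta(a2,b2);
    a simple stand-in for the paper's efficient pairwise scheme."""
    return float(np.mean(rng.beta(a1, b1, n_mc) > rng.beta(a2, b2, n_mc)))


if __name__ == "__main__":
    surviving, wins, losses = sr_bandit([0.10, 0.20, 0.48, 0.50],
                                        horizon=20_000)
    print("surviving arms:", surviving)
    print("Pr[arm 3 superior to arm 2]:",
          prob_superior(wins[3], losses[3], wins[2], losses[2]))
```

Eliminated arms receive no further posterior draws or pulls, which is the resource saving the abstract describes; an SRO-style variant would replace the Hoeffding test with a threshold on the superiority probability sketched in `prob_superior`.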




Author information

Correspondence to Neha Gupta.



Copyright information

© 2011 Springer-Verlag London Limited

About this paper

Cite this paper

Gupta, N., Granmo, O.-C., Agrawala, A. (2011). Successive Reduction of Arms in Multi-Armed Bandits. In: Bramer, M., Petridis, M., Nolle, L. (eds) Research and Development in Intelligent Systems XXVIII. SGAI 2011. Springer, London. https://doi.org/10.1007/978-1-4471-2318-7_13


  • DOI: https://doi.org/10.1007/978-1-4471-2318-7_13


  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-2317-0

  • Online ISBN: 978-1-4471-2318-7

  • eBook Packages: Computer Science; Computer Science (R0)
