Thompson Sampling for Bayesian Bandits with Resets

  • Conference paper
Algorithmic Decision Theory (ADT 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8176)

Included in the conference series: International Conference on Algorithmic Decision Theory

Abstract

Multi-armed bandit problems are challenging sequential decision problems that have been widely studied because they provide a mathematical framework abstracting many decision problems in fields such as machine learning, logistics, industrial optimization, and the management of clinical trials. In this paper we address a non-stationary environment in which the expected rewards evolve dynamically; we consider a particular type of drift, which we call resets, in which the arm qualities are re-initialized from time to time. Using simulations, we compare different arm-selection strategies, focusing on a Bayesian method based on Thompson sampling, a simple yet effective technique for trading off exploration and exploitation.
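
To make the selection mechanism concrete, below is a minimal Python sketch of Thompson sampling for Bernoulli bandits under periodic resets. It is an illustration built on assumptions, not the strategy evaluated in the paper: arms are assumed Bernoulli with Beta(1,1) priors, resets occur at a fixed and known period, and the learner simply discards its posteriors whenever a reset occurs. The names BernoulliThompsonSampler and simulate are hypothetical.

import random

class BernoulliThompsonSampler:
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.reset_posteriors()

    def reset_posteriors(self):
        # Beta(1, 1) is the uniform prior over each arm's unknown success rate.
        self.alpha = [1] * self.n_arms   # 1 + observed successes
        self.beta = [1] * self.n_arms    # 1 + observed failures

    def select_arm(self):
        # Draw one plausible success rate per arm from its Beta posterior and
        # play the arm with the best draw; the randomness of the draws is what
        # balances exploration against exploitation.
        draws = [random.betavariate(self.alpha[i], self.beta[i])
                 for i in range(self.n_arms)]
        return max(range(self.n_arms), key=draws.__getitem__)

    def update(self, arm, reward):
        # Conjugate Beta-Bernoulli posterior update for a 0/1 reward.
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

def simulate(n_arms=5, horizon=10000, reset_period=1000, seed=0):
    # Toy environment: every reset_period steps the true arm qualities are
    # re-drawn uniformly at random, and (an assumption for illustration) the
    # learner observes the reset and re-initializes its posteriors.
    random.seed(seed)
    sampler = BernoulliThompsonSampler(n_arms)
    probs = [random.random() for _ in range(n_arms)]
    total = 0
    for t in range(horizon):
        if t > 0 and t % reset_period == 0:
            probs = [random.random() for _ in range(n_arms)]
            sampler.reset_posteriors()
        arm = sampler.select_arm()
        reward = 1 if random.random() < probs[arm] else 0
        sampler.update(arm, reward)
        total += reward
    return total / horizon

if __name__ == "__main__":
    print("average reward per step:", round(simulate(), 3))

Re-initializing the Beta parameters is the bluntest way to forget stale evidence; when resets are not directly observable, gradual forgetting (for instance, discounting the posterior counts over time) is a common alternative.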





Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Viappiani, P. (2013). Thompson Sampling for Bayesian Bandits with Resets. In: Perny, P., Pirlot, M., Tsoukiàs, A. (eds) Algorithmic Decision Theory. ADT 2013. Lecture Notes in Computer Science, vol 8176. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41575-3_31

  • DOI: https://doi.org/10.1007/978-3-642-41575-3_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41574-6

  • Online ISBN: 978-3-642-41575-3

  • eBook Packages: Computer Science, Computer Science (R0)
