Thompson Sampling for Bayesian Bandits with Resets

  • Conference paper
Algorithmic Decision Theory (ADT 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8176)

Included in the conference series: International Conference on Algorithmic Decision Theory

Abstract

Multi-armed bandit problems are challenging sequential decision problems that have been widely studied because they provide a mathematical framework abstracting many decision problems in fields such as machine learning, logistics, industrial optimization, and the management of clinical trials. In this paper we address a non-stationary environment in which the expected rewards evolve dynamically; we consider a particular type of drift, which we call resets, in which the arm qualities are re-initialized from time to time. Using simulations, we compare different arm-selection strategies, focusing on a Bayesian method based on Thompson sampling, a simple yet effective technique for trading off exploration and exploitation.
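
To make the selection mechanism concrete, below is a minimal Python sketch of Thompson sampling for Bernoulli bandits under periodic resets. It is an illustration built on assumptions, not the strategy evaluated in the paper: arms are assumed Bernoulli with Beta(1,1) priors, resets occur at a fixed and known period, and the learner simply discards its posteriors whenever a reset occurs. The names BernoulliThompsonSampler and simulate are hypothetical.

import random

class BernoulliThompsonSampler:
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.reset_posteriors()

    def reset_posteriors(self):
        # Beta(1, 1) is the uniform prior over each arm's unknown success rate.
        self.alpha = [1] * self.n_arms   # 1 + observed successes
        self.beta = [1] * self.n_arms    # 1 + observed failures

    def select_arm(self):
        # Draw one plausible success rate per arm from its Beta posterior and
        # play the arm with the best draw; the randomness of the draws is what
        # balances exploration against exploitation.
        draws = [random.betavariate(self.alpha[i], self.beta[i])
                 for i in range(self.n_arms)]
        return max(range(self.n_arms), key=draws.__getitem__)

    def update(self, arm, reward):
        # Conjugate Beta-Bernoulli posterior update for a 0/1 reward.
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

def simulate(n_arms=5, horizon=10000, reset_period=1000, seed=0):
    # Toy environment: every reset_period steps the true arm qualities are
    # re-drawn uniformly at random, and (an assumption for illustration) the
    # learner observes the reset and re-initializes its posteriors.
    random.seed(seed)
    sampler = BernoulliThompsonSampler(n_arms)
    probs = [random.random() for _ in range(n_arms)]
    total = 0
    for t in range(horizon):
        if t > 0 and t % reset_period == 0:
            probs = [random.random() for _ in range(n_arms)]
            sampler.reset_posteriors()
        arm = sampler.select_arm()
        reward = 1 if random.random() < probs[arm] else 0
        sampler.update(arm, reward)
        total += reward
    return total / horizon

if __name__ == "__main__":
    print("average reward per step:", round(simulate(), 3))

Re-initializing the Beta parameters is the bluntest way to forget stale evidence; when resets are not directly observable, gradual forgetting (for instance, discounting the posterior counts over time) is a common alternative.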





Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Viappiani, P. (2013). Thompson Sampling for Bayesian Bandits with Resets. In: Perny, P., Pirlot, M., Tsoukiàs, A. (eds) Algorithmic Decision Theory. ADT 2013. Lecture Notes in Computer Science, vol 8176. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41575-3_31

  • DOI: https://doi.org/10.1007/978-3-642-41575-3_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41574-6

  • Online ISBN: 978-3-642-41575-3

  • eBook Packages: Computer Science, Computer Science (R0)
