Abstract
Multi-armed bandit problems are challenging sequential decision problems that have been widely studied, as they constitute a mathematical framework abstracting many different decision problems in fields such as machine learning, logistics, industrial optimization, and the management of clinical trials. In this paper we address a non-stationary environment whose expected rewards evolve dynamically, considering a particular type of drift, which we call resets, in which the arm qualities are re-initialized from time to time. We compare different arm selection strategies in simulation, focusing on a Bayesian method based on Thompson sampling (a simple yet effective technique for trading off exploration and exploitation).
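The setting described above can be illustrated with a minimal simulation sketch. This is not the paper's exact formulation: the reset model (arm qualities re-drawn uniformly at fixed intervals, with posteriors re-initialized at each reset) and all parameter names are assumptions made for illustration.

```python
import random


def thompson_sampling_with_resets(true_probs, horizon, reset_interval=0, seed=0):
    """Simulate Thompson sampling on Bernoulli arms whose success
    probabilities are re-initialized ("reset") every reset_interval steps.
    Returns the total reward collected over the horizon.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) (uniform) prior on each arm's success probability.
    alpha = [1] * k
    beta = [1] * k
    probs = list(true_probs)
    total_reward = 0
    for t in range(horizon):
        if reset_interval and t > 0 and t % reset_interval == 0:
            # Environment reset: arm qualities are re-drawn; a simple
            # response is to also re-initialize the posteriors.
            probs = [rng.random() for _ in range(k)]
            alpha = [1] * k
            beta = [1] * k
        # Thompson sampling: draw one sample per arm from its posterior
        # and play the arm with the largest sampled value.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward
```

With `reset_interval=0` the environment is stationary and the sketch reduces to standard Thompson sampling with Beta-Bernoulli posteriors; a positive `reset_interval` crudely models the reset drift studied in the paper.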
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Viappiani, P. (2013). Thompson Sampling for Bayesian Bandits with Resets. In: Perny, P., Pirlot, M., Tsoukiàs, A. (eds) Algorithmic Decision Theory. ADT 2013. Lecture Notes in Computer Science, vol 8176. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41575-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41574-6
Online ISBN: 978-3-642-41575-3
eBook Packages: Computer Science (R0)