Abstract
The multi-armed bandit problem is a statistical decision model of an agent who tries to optimize decisions while simultaneously improving the information on which those decisions rest. This classic problem has received much attention in economics because it concisely models the tradeoff between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff).
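The tradeoff can be made concrete with a small simulation. The sketch below uses an epsilon-greedy rule, a simple heuristic chosen here only for illustration and not an optimal policy. It assumes Bernoulli arms; the function name `epsilon_greedy` and its parameters (`success_probs`, `rounds`, `epsilon`, `seed`) are hypothetical and do not come from the entry.

```python
import random


def epsilon_greedy(success_probs, rounds, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy rule (illustrative only).

    success_probs holds the true payoff probability of each arm; the agent
    never reads it directly and learns only from realized rewards.
    Returns (number of pulls per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    pulls = [0] * n_arms
    mean_payoff = [0.0] * n_arms  # the agent's running estimate for each arm
    total_reward = 0

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Exploration: try an arm at random to improve information.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: play the arm currently believed to be best
            # (ties go to the lowest-numbered arm).
            arm = max(range(n_arms), key=lambda a: mean_payoff[a])

        reward = 1 if rng.random() < success_probs[arm] else 0
        pulls[arm] += 1
        # Incremental update of the sample mean for the arm just played.
        mean_payoff[arm] += (reward - mean_payoff[arm]) / pulls[arm]
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = epsilon_greedy([0.0, 1.0], rounds=1000, epsilon=0.1)
    print("pulls per arm:", pulls, "total reward:", total)
```

With true probabilities `[0.0, 1.0]`, the agent starts on the first arm, because all estimates are initially tied at zero, and switches to the second arm only after an exploratory draw reveals its payoff. Setting `epsilon=0` removes exploration entirely, so the agent never leaves the arm it starts on.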
© 2018 Macmillan Publishers Ltd.
About this entry
Cite this entry
Bergemann, D., Välimäki, J. (2018). Bandit Problems. In: The New Palgrave Dictionary of Economics. Palgrave Macmillan, London. https://doi.org/10.1057/978-1-349-95189-5_2386
DOI: https://doi.org/10.1057/978-1-349-95189-5_2386
Publisher Name: Palgrave Macmillan, London
Print ISBN: 978-1-349-95188-8
Online ISBN: 978-1-349-95189-5
eBook Packages: Economics and Finance; Reference Module Humanities and Social Sciences; Reference Module Business, Economics and Social Sciences