Bandit Problems

Bergemann, Dirk; Välimäki, Juuso

doi:10.1057/978-1-349-95189-5_2386

Reference work entry
First Online: 01 January 2018

Abstract

The multi-armed bandit problem is a statistical decision model of an agent trying to optimize his decisions while improving his information at the same time. This classic problem has received much attention in economics as it concisely models the tradeoff between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff).
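The tradeoff described above can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the entry: it assumes Bernoulli payoffs and uses a simple epsilon-greedy rule (a heuristic chosen here for brevity, not an optimal policy). The names `epsilon_greedy`, `true_means`, `horizon`, and `epsilon` are hypothetical.

```python
import random


def epsilon_greedy(true_means, horizon, epsilon, seed=0):
    """Play a Bernoulli bandit for `horizon` rounds with an epsilon-greedy rule.

    Assumption for illustration: arm i pays 1 with probability true_means[i]
    and 0 otherwise. The agent does not observe true_means directly.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms        # number of times each arm has been played
    estimates = [0.0] * n_arms  # running average payoff observed per arm
    total_reward = 0

    for _ in range(horizon):
        if rng.random() < epsilon:
            # Exploration: try an arm at random to improve information.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: play the arm currently believed to be best
            # (ties are broken in favour of the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1 if rng.random() < true_means[arm] else 0
        pulls[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return pulls, estimates, total_reward


if __name__ == "__main__":
    # Pure exploitation never leaves the first arm it tries.
    print(epsilon_greedy([0.0, 1.0], horizon=1000, epsilon=0.0))
    # A little exploration lets the agent discover the better arm.
    print(epsilon_greedy([0.0, 1.0], horizon=1000, epsilon=0.1))
```

With `epsilon=0.0` the agent only exploits, so in this example it keeps playing the first arm and never learns that the second arm is better. A small positive `epsilon` gives up some immediate payoff in exchange for information about the other arm.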

Cite this entry

Bergemann, D., Välimäki, J. (2018). Bandit Problems. In: The New Palgrave Dictionary of Economics. Palgrave Macmillan, London. https://doi.org/10.1057/978-1-349-95189-5_2386

DOI: https://doi.org/10.1057/978-1-349-95189-5_2386
Published: 15 February 2018
Publisher Name: Palgrave Macmillan, London
Print ISBN: 978-1-349-95188-8
Online ISBN: 978-1-349-95189-5
eBook Packages: Economics and Finance; Reference Module Humanities and Social Sciences; Reference Module Business, Economics and Social Sciences
