Bandit Problems

Abstract

The multi-armed bandit problem is a statistical decision model of an agent who tries to optimize his decisions while simultaneously improving his information. This classic problem has received much attention in economics because it concisely models the tradeoff between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff).
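The tradeoff can be made concrete with a small one-armed Bayesian bandit. This is a minimal sketch under assumptions that are not taken from the entry: one safe arm with a known per-period payoff, one risky arm that pays 1 or 0 with an unknown success probability described by a Beta(a, b) belief, a discount factor, and a finite horizon solved by backward induction. The parameter values (`SAFE_PAYOFF`, `DISCOUNT`) and all function names are hypothetical and chosen only for illustration.

```python
from functools import lru_cache

# Hypothetical parameters, chosen only for illustration.
SAFE_PAYOFF = 0.55  # known per-period payoff of the safe arm
DISCOUNT = 0.9      # discount factor applied to next period's value


@lru_cache(maxsize=None)
def value(a: int, b: int, periods: int) -> float:
    """Optimal expected discounted payoff with `periods` periods left,
    given a Beta(a, b) belief about the risky arm's success probability."""
    if periods == 0:
        return 0.0
    return max(safe_value(a, b, periods), risky_value(a, b, periods))


def safe_value(a: int, b: int, periods: int) -> float:
    # Safe arm: pays a known amount and reveals nothing about the risky arm,
    # so the belief (a, b) carries over unchanged.
    return SAFE_PAYOFF + DISCOUNT * value(a, b, periods - 1)


def risky_value(a: int, b: int, periods: int) -> float:
    # Risky arm: pays 1 with posterior-mean probability a / (a + b), and the
    # observed outcome updates the Beta belief (success: a + 1, failure: b + 1).
    mean = a / (a + b)
    success = 1.0 + DISCOUNT * value(a + 1, b, periods - 1)
    failure = DISCOUNT * value(a, b + 1, periods - 1)
    return mean * success + (1.0 - mean) * failure


def best_arm(a: int, b: int, periods: int) -> str:
    # Ties are broken in favour of the safe arm.
    return "risky" if risky_value(a, b, periods) > safe_value(a, b, periods) else "safe"


if __name__ == "__main__":
    # Uniform prior Beta(1, 1): the risky arm's expected payoff is 0.5 < 0.55.
    for horizon in (1, 2, 10):
        print(horizon, best_arm(1, 1, horizon), round(value(1, 1, horizon), 4))
```

Under the uniform prior the risky arm's expected payoff (0.5) is below the safe payoff (0.55), so with one period left the sketch plays the safe arm, which is pure exploitation. With two periods left it prefers the risky arm: a success raises the belief to 2/3, making the risky arm worth playing again, and that information is worth more than the 0.05 expected loss today. The memoized recursion visits each (a, b, periods) state once, so it stays cheap for moderate horizons.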

Copyright information

© 2018 Macmillan Publishers Ltd.

About this entry

Cite this entry

Bergemann, D., Välimäki, J. (2018). Bandit Problems. In: The New Palgrave Dictionary of Economics. Palgrave Macmillan, London. https://doi.org/10.1057/978-1-349-95189-5_2386
