The New Palgrave Dictionary of Economics

2018 Edition
Editors: Macmillan Publishers Ltd

Bandit Problems

  • Dirk Bergemann
  • Juuso Välimäki
Reference work entry


The multi-armed bandit problem is a statistical decision model of an agent trying to optimize his decisions while improving his information at the same time. This classic problem has received much attention in economics as it concisely models the tradeoff between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff).
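
As an illustration of the tradeoff described above, the following is a minimal sketch, not taken from the entry. It simulates an agent facing arms with Bernoulli payoffs and uses a simple epsilon-greedy heuristic: with probability `epsilon` the agent explores a random arm, and otherwise it exploits the arm with the highest estimated mean payoff. This heuristic is chosen only for brevity and is not the Gittins index policy studied in the literature cited below. The function name, the parameter names and the success probabilities are all assumptions made for the example.

```python
import random


def run_epsilon_greedy(success_probs, horizon, epsilon, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli bandit (illustrative only).

    success_probs: hypothetical true success probability of each arm,
                   unknown to the agent.
    horizon:       number of periods to play.
    epsilon:       probability of exploring a uniformly random arm.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    pulls = [0] * n_arms      # how often each arm has been played
    means = [0.0] * n_arms    # running estimate of each arm's mean payoff
    total_reward = 0

    for _ in range(horizon):
        if rng.random() < epsilon:
            # Exploration: try an arm regardless of current beliefs.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: play the arm currently believed to be best
            # (ties are broken in favour of the lowest index).
            arm = max(range(n_arms), key=lambda a: means[a])

        reward = 1 if rng.random() < success_probs[arm] else 0
        pulls[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        means[arm] += (reward - means[arm]) / pulls[arm]
        total_reward += reward

    return pulls, means, total_reward


if __name__ == "__main__":
    # Assumed example: arm 1 is better, but the agent does not know this.
    for eps in (0.0, 0.1):
        pulls, means, total = run_epsilon_greedy([0.3, 0.7], 1000, eps)
        print(f"epsilon={eps}: pulls={pulls}, total reward={total}")
```

With `epsilon=0` the agent in this sketch never explores. If its first arm pays nothing, it can stay with that arm forever, which is the kind of incomplete learning that makes the tradeoff economically interesting.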


Keywords

Asset pricing; Bandit problems; Branching bandit problem; Continuous-time models; Corporate finance; Descending price auction; Experimentation; Free-rider problem; Gittins index th; Informational efficiency; Learning; Liquidity; Markov equilibria; Matching markets; Moral hazard; Noise trader; Probability distribution; Product differentiation; Regime switch

JEL Classifications

C72 C73 D43 D81 D82 D83 D92 G24 G31 


  1. Banks, J., and R. Sundaram. 1992. Denumerable-armed bandits. Econometrica 60: 1071–1096.
  2. Banks, J., and R. Sundaram. 1994. Switching costs and the Gittins index. Econometrica 62: 687–694.
  3. Bergemann, D., and U. Hege. 1998. Dynamic venture capital financing, learning and moral hazard. Journal of Banking and Finance 22: 703–735.
  4. Bergemann, D., and U. Hege. 2005. The financing of innovation: Learning and stopping. RAND Journal of Economics 36: 719–752.
  5. Bergemann, D., and J. Välimäki. 1996. Learning and strategic pricing. Econometrica 64: 1125–1149.
  6. Bergemann, D., and J. Välimäki. 2000. Experimentation in markets. Review of Economic Studies 67: 213–234.
  7. Bergemann, D., and J. Välimäki. 2001. Stationary multi choice bandit problems. Journal of Economic Dynamics and Control 25: 1585–1594.
  8. Bergemann, D., and J. Välimäki. 2006. Dynamic price competition. Journal of Economic Theory 127: 232–263.
  9. Berry, D., and B. Fristedt. 1985. Bandit problems. London: Chapman and Hall.
  10. Bolton, P., and C. Harris. 1999. Strategic experimentation. Econometrica 67: 349–374.
  11. Felli, L., and C. Harris. 1996. Job matching, learning and firm-specific human capital. Journal of Political Economy 104: 838–868.
  12. Gittins, J. 1989. Allocation indices for multi-armed bandits. London: Wiley.
  13. Gittins, J., and D. Jones. 1974. A dynamic allocation index for the sequential allocation of experiments. In Progress in statistics, ed. J. Gani. Amsterdam: North-Holland.
  14. Hong, H., and S. Rady. 2002. Strategic trading and learning about liquidity. Journal of Financial Markets 5: 419–450.
  15. Jovanovic, B. 1979. Job search and the theory of turnover. Journal of Political Economy 87: 972–990.
  16. Karatzas, I. 1984. Gittins indices in the dynamic allocation problem for diffusion processes. Annals of Probability 12: 173–192.
  17. Karoui, N., and I. Karatzas. 1997. Synchronization and optimality for multi-armed bandit problems in continuous time. Computational and Applied Mathematics 16: 117–152.
  18. Keller, G., and S. Rady. 1999. Optimal experimentation in a changing environment. Review of Economic Studies 66: 475–507.
  19. Keller, G., S. Rady, and M. Cripps. 2005. Strategic experimentation with exponential bandits. Econometrica 73: 39–68.
  20. McLennan, A. 1984. Price dispersion and incomplete learning in the long run. Journal of Economic Dynamics and Control 7: 331–347.
  21. Miller, R. 1984. Job matching and occupational choice. Journal of Political Economy 92: 1086–1120.
  22. Robbins, H. 1952. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58: 527–535.
  23. Roberts, K., and M. Weitzman. 1981. Funding criteria for research, development and exploration of projects. Econometrica 49: 1261–1288.
  24. Rothschild, M. 1974. A two-armed bandit theory of market pricing. Journal of Economic Theory 9: 185–202.
  25. Rustichini, A., and A. Wolinsky. 1995. Learning about variable demand in the long run. Journal of Economic Dynamics and Control 19: 1283–1292.
  26. Varaiya, P., J. Walrand, and C. Buyukkoc. 1985. Extensions of the multiarmed bandit problem: The discounted case. IEEE Transactions on Automatic Control AC-30: 426–439.
  27. Weber, R. 1992. On the Gittins index for multi-armed bandits. Annals of Applied Probability 2: 1024–1033.
  28. Weitzman, M. 1979. Optimal search for the best alternative. Econometrica 47: 641–654.
  29. Whittle, P. 1981. Arm-acquiring bandits. Annals of Probability 9: 284–292.
  30. Whittle, P. 1982. Optimization over time. Vol. 1. Chichester: Wiley.

Copyright information

© Macmillan Publishers Ltd. 2018
