The multi-armed bandit problem is a statistical decision model of an agent who must optimize current decisions while simultaneously improving the information on which future decisions will be based. This classic problem has received much attention in economics because it concisely captures the tradeoff between exploration (sampling each arm to learn which is best) and exploitation (playing the arm currently believed to yield the highest payoff).
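The exploration–exploitation tradeoff can be made concrete with a small simulation. The sketch below uses an epsilon-greedy rule on a Bernoulli bandit; this is one simple illustrative heuristic, not the Gittins-index policy discussed in the literature cited below, and the arm payoff probabilities and parameter values are hypothetical.

```python
import random

def simulate_epsilon_greedy(true_means, epsilon=0.1, horizon=10_000, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy rule: with
    probability epsilon pull a random arm (explore), otherwise pull
    the arm with the highest empirical mean reward (exploit)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # number of pulls per arm
    means = [0.0] * n_arms     # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        # Bernoulli payoff with the arm's (unknown to the agent) success rate
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]     # running mean
        total_reward += reward
    return total_reward, counts

# Hypothetical three-arm bandit: the agent does not know arm 2 is best,
# but exploration lets the empirical means identify it over time.
reward, pulls = simulate_epsilon_greedy([0.3, 0.5, 0.7])
print(reward, pulls)
```

With a long horizon, the best arm accumulates the bulk of the pulls even though a fixed fraction of plays is sacrificed to exploration; this is the cost of learning that the bandit model formalizes.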
Keywords: Asset pricing; Bandit problems; Branching bandit problem; Continuous-time models; Corporate finance; Descending price auction; Experimentation; Free-rider problem; Gittins index theorem; Informational efficiency; Learning; Liquidity; Markov equilibria; Matching markets; Moral hazard; Noise trader; Probability distribution; Product differentiation; Regime switch
JEL Classifications: C72; C73; D43; D81; D82; D83; D92; G24; G31
- Bergemann, D., and U. Hege. 2005. The financing of innovation: Learning and stopping. RAND Journal of Economics 36: 719–752.
- Gittins, J. 1989. Multi-armed bandit allocation indices. Chichester: Wiley.
- Gittins, J., and D. Jones. 1974. A dynamic allocation index for the sequential allocation of experiments. In Progress in statistics, ed. J. Gani. Amsterdam: North-Holland.
- El Karoui, N., and I. Karatzas. 1997. Synchronization and optimality for multi-armed bandit problems in continuous time. Computational and Applied Mathematics 16: 117–152.
- Whittle, P. 1982. Optimization over time. Vol. 1. Chichester: Wiley.