Learning in a Game of Strategic Experimentation with Three-Armed Exponential Bandits
The present article provides additional results for the two-player game of strategic experimentation with three-armed exponential bandits analyzed in Klein (Games Econ Behav 82:636–657, 2013). Players face replica bandits, each with one safe arm and two risky arms that are known to be of opposite types; it is initially unknown, however, which risky arm is good and which is bad. A good risky arm yields lump sums at exponentially distributed times when pulled; a bad risky arm never yields any payoff. In this article, I give a necessary and sufficient condition for the state of the world to be learned eventually with probability 1 in any Markov perfect equilibrium in which at least one player’s value function is continuously differentiable. Furthermore, I provide closed-form expressions for the players’ value function in a symmetric Markov perfect equilibrium for low and intermediate stakes.
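As a minimal illustration of the learning dynamics the abstract describes, the following sketch simulates pulling a risky arm and updating the belief that it is good by Bayes' rule. This is my own illustrative code, not the paper's model specification: the function names, the rate parameter `lam`, and the simulation routine are assumptions; the paper works in continuous time with flow payoffs, which is only approximated here. Because the two risky arms are known to be of opposite types, the belief that arm 1 is good pins down the belief about arm 2, and a single lump sum fully reveals the state (a bad arm never pays).

```python
import math
import random


def posterior_no_success(p, lam, t):
    """Belief that the pulled risky arm is good after pulling it
    for duration t without observing a lump sum.  Standard Bayes
    rule for exponential bandits: a good arm produces no arrival
    in [0, t] with probability exp(-lam * t)."""
    num = p * math.exp(-lam * t)
    return num / (num + (1.0 - p))


def simulate_pull(arm_is_good, lam, t, rng):
    """Pull a risky arm for duration t and return the number of
    lump sums received.  A bad arm never pays; a good arm pays at
    exponentially distributed inter-arrival times with rate lam,
    so any positive count reveals the arm (and hence the state)."""
    if not arm_is_good:
        return 0
    count, elapsed = 0, rng.expovariate(lam)
    while elapsed <= t:
        count += 1
        elapsed += rng.expovariate(lam)
    return count
```

Note that in the opposite-types setting, `1 - posterior_no_success(p, lam, t)` is simultaneously the updated belief that the *other* risky arm is good, which is the channel through which one player's fruitless experimentation encourages experimentation on the other arm.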