Incentivizing Exploration with Heterogeneous Value of Money
Recently, Frazier et al. proposed a natural model for crowdsourced exploration of different a priori unknown options: a principal is interested in the long-term welfare of a population of agents who arrive one by one in a multi-armed bandit setting. However, each agent is myopic, so in order to incentivize him to explore options with better long-term prospects, the principal must offer the agent money. Frazier et al. showed that a simple class of policies called time-expanded are optimal in the worst case, and characterized their budget-reward tradeoff. The previous work assumed that all agents are equally and uniformly susceptible to financial incentives. In reality, agents may have different utility for money. We therefore extend the model of Frazier et al. to allow agents that have heterogeneous and non-linear utilities for money. The principal is informed of the agent’s tradeoff via a signal that could be more or less informative.
Our main result is to show that a convex program can be used to derive a signal-dependent time-expanded policy which achieves the best possible Lagrangian reward in the worst case. The worst-case guarantee is matched by so-called “Diamonds in the Rough” instances; the proof that the guarantees match is based on showing that two different convex programs have the same optimal solution for these specific instances.
KeywordsMulti-armed bandit problems Mechanism design Incentives
- 1.Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed banditproblem. In: Proceedings of the 36th IEEE Symposium on Foundations of Computer Science, pp. 322–331 (1995)Google Scholar
- 4.Frazier, P., Kempe, D., Kleinberg, J., Kleinberg, R.: Incentivizing exploration. In: Proceedings of the 16th ACM Conference on Economics and Computation, pp. 5–22 (2014)Google Scholar
- 7.Gittins, J.C., Jones, D.M.: A dynamic allocation index for the sequential design of experiments. In: Gani, J. (ed.) Progress in Statistics, pp. 241–266 (1974)Google Scholar
- 8.Ho, C.J., Slivkins, A., Vaughan, J.W.: Adaptive contract design for crowdsourcing markets: bandit algorithms for repeated principal-agent problems. In: Proceedings of the 16th ACM Conf. on Economics and Computation, pp. 359–376 (2014)Google Scholar
- 9.Katehakis, M.N., Veinott Jr., A.F.: The multi-armed bandit problem: decomposition and computation. Math. Oper. Res. 12(2), 262–268 (1987)Google Scholar
- 10.Kremer, I., Mansour, Y., Perry, M.: Implementing the “wisdom of the crowd”. In: Proceedings of the 15th ACM Conf. on Electronic Commerce, pp. 605–606 (2013)Google Scholar
- 12.Mansour, Y., Slivkins, A., Syrgkanis, V.: Bayesian incentive-compatible bandit exploration. In: Proceedings of the 17th ACM Conference on Economics and Computation, pp. 565–582 (2015)Google Scholar
- 14.Singla, A., Krause, A.: Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. In: 22nd International World Wide Web Conference, pp. 1167–1178 (2013)Google Scholar
Open Access This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.