Incentivizing Exploration with Heterogeneous Value of Money

  • Li Han
  • David Kempe
  • Ruixin Qiang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9470)


Recently, Frazier et al. proposed a natural model for crowdsourced exploration of different a priori unknown options: a principal is interested in the long-term welfare of a population of agents who arrive one by one in a multi-armed bandit setting. However, each agent is myopic, so in order to incentivize him to explore options with better long-term prospects, the principal must offer the agent money. Frazier et al. showed that a simple class of policies called time-expanded are optimal in the worst case, and characterized their budget-reward tradeoff. The previous work assumed that all agents are equally and uniformly susceptible to financial incentives. In reality, agents may have different utility for money. We therefore extend the model of Frazier et al. to allow agents that have heterogeneous and non-linear utilities for money. The principal is informed of the agent’s tradeoff via a signal that could be more or less informative.

Our main result is to show that a convex program can be used to derive a signal-dependent time-expanded policy which achieves the best possible Lagrangian reward in the worst case. The worst-case guarantee is matched by so-called “Diamonds in the Rough” instances; the proof that the guarantees match is based on showing that two different convex programs have the same optimal solution for these specific instances.
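The incentive mechanism described in the abstract can be made concrete with a toy sketch. It is only an illustration under simplifying assumptions and is not the policy analyzed in the paper: the agent's utility for money is taken to be linear with a single conversion rate `money_value` (the paper allows non-linear utilities), the principal is assumed to know that rate exactly, and ties are assumed to break toward the arm with the larger payment. The function names `myopic_choice` and `min_payment_for_arm` are hypothetical.

```python
from typing import Sequence


def myopic_choice(expected_rewards: Sequence[float],
                  payments: Sequence[float],
                  money_value: float) -> int:
    """Return the arm a myopic agent pulls when offered per-arm payments.

    Assumption: utility is immediate expected reward plus money_value times
    the payment. Ties are broken toward the arm with the larger payment.
    """
    utilities = [r + money_value * p
                 for r, p in zip(expected_rewards, payments)]
    return max(range(len(utilities)),
               key=lambda i: (utilities[i], payments[i]))


def min_payment_for_arm(expected_rewards: Sequence[float],
                        target: int,
                        money_value: float) -> float:
    """Smallest payment that makes the target arm weakly preferred.

    Assumption: the principal knows money_value exactly, which corresponds
    to a fully informative signal.
    """
    if money_value <= 0:
        raise ValueError("money_value must be positive")
    gap = max(expected_rewards) - expected_rewards[target]
    return gap / money_value


if __name__ == "__main__":
    rewards = [0.5, 0.75, 0.25]  # immediate expected rewards, made up
    target = 2                   # arm the principal wants explored
    for money_value in (2.0, 0.5):
        pay = min_payment_for_arm(rewards, target, money_value)
        offer = [pay if i == target else 0.0 for i in range(len(rewards))]
        print(money_value, pay, myopic_choice(rewards, offer, money_value))
```

In this sketch an agent who values money highly is cheap to redirect, while an agent who values it little needs a larger payment for the same arm, which is the heterogeneity the extended model is meant to capture.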


Multi-armed bandit problems · Mechanism design · Incentives


  1. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. In: Proceedings of the 36th IEEE Symposium on Foundations of Computer Science, pp. 322–331 (1995)
  2. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2003)
  3. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
  4. Frazier, P., Kempe, D., Kleinberg, J., Kleinberg, R.: Incentivizing exploration. In: Proceedings of the 16th ACM Conference on Economics and Computation, pp. 5–22 (2014)
  5. Gittins, J.C.: Multi-Armed Bandit Allocation Indices. Wiley, New York (1989)
  6. Gittins, J.C., Glazebrook, K.D., Weber, R.: Multi-Armed Bandit Allocation Indices, 2nd edn. Wiley, New York (2011)
  7. Gittins, J.C., Jones, D.M.: A dynamic allocation index for the sequential design of experiments. In: Gani, J. (ed.) Progress in Statistics, pp. 241–266 (1974)
  8. Ho, C.J., Slivkins, A., Vaughan, J.W.: Adaptive contract design for crowdsourcing markets: bandit algorithms for repeated principal-agent problems. In: Proceedings of the 16th ACM Conference on Economics and Computation, pp. 359–376 (2014)
  9. Katehakis, M.N., Veinott Jr., A.F.: The multi-armed bandit problem: decomposition and computation. Math. Oper. Res. 12(2), 262–268 (1987)
  10. Kremer, I., Mansour, Y., Perry, M.: Implementing the “wisdom of the crowd”. In: Proceedings of the 15th ACM Conference on Electronic Commerce, pp. 605–606 (2013)
  11. Lai, T.L., Robbins, H.E.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1), 4–22 (1985)
  12. Mansour, Y., Slivkins, A., Syrgkanis, V.: Bayesian incentive-compatible bandit exploration. In: Proceedings of the 17th ACM Conference on Economics and Computation, pp. 565–582 (2015)
  13. Robbins, H.E.: Some aspects of the sequential design of experiments. Bull. Am. Math. Soc. 58, 527–535 (1952)
  14. Singla, A., Krause, A.: Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. In: 22nd International World Wide Web Conference, pp. 1167–1178 (2013)
  15. Slivkins, A., Wortman Vaughan, J.: Online decision making in crowdsourcing markets: theoretical challenges (position paper). ACM SIGecom Exch. 12(2), 4–23 (2013)
  16. Spence, M.: Job market signaling. Q. J. Econ. 87, 355–374 (1973)
  17. Whittle, P.: Multi-armed bandits and the Gittins index. J. Roy. Stat. Soc. Ser. B (Methodol.) 42(2), 143–149 (1980)

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Open Access This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  1. University of Southern California, Los Angeles, USA
