
Incentivising Exploration and Recommendations for Contextual Bandits with Payments

  • Conference paper
  • In: Multi-Agent Systems and Agreement Technologies (EUMAS 2020, AT 2020)

Abstract

We propose a contextual bandit-based model to capture the learning and social welfare goals of a web platform in the presence of myopic users. By using payments to incentivize these agents to explore different items/recommendations, we show how the platform can learn the inherent attributes of items and achieve sublinear regret while maximizing cumulative social welfare. We also derive theoretical bounds on the platform's cumulative cost of incentivization. Unlike previous works in this domain, we consider contexts to be completely adversarial, and the behavior of the adversary is unknown to the platform. Our approach can improve various engagement metrics of users on e-commerce stores, recommendation engines and matching platforms.
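
As a concrete, purely hypothetical illustration of this interaction, the sketch below simulates a platform running a LinUCB-style linear contextual bandit over myopic agents: each agent would greedily pick the arm with the highest estimated mean reward, and the platform pays the estimated utility gap whenever it wants a different arm explored. The estimator, the parameters K, d and alpha, and the payment rule are assumptions made for illustration, not the paper's algorithm.

```python
import numpy as np

# Hypothetical sketch: a platform runs a LinUCB-style linear contextual bandit
# and pays myopic agents to follow its exploratory recommendation. The
# estimator, the parameters (K, d, alpha) and the payment rule are
# illustrative assumptions, not the paper's exact algorithm.

K, d, T, alpha = 5, 4, 1000, 1.0         # arms, context dim, horizon, UCB width
rng = np.random.default_rng(0)
theta_true = rng.normal(size=(K, d))     # unknown item attributes

A = [np.eye(d) for _ in range(K)]        # per-arm regularized design matrices
b = [np.zeros(d) for _ in range(K)]      # per-arm reward-weighted context sums
total_payment = 0.0

for t in range(T):
    x = rng.normal(size=d)               # context (adversarial in the paper;
                                         # simulated randomly here)
    theta_hat = [np.linalg.solve(A[k], b[k]) for k in range(K)]
    means = np.array([theta_hat[k] @ x for k in range(K)])
    ucbs = np.array([means[k] + alpha * np.sqrt(x @ np.linalg.solve(A[k], x))
                     for k in range(K)])

    myopic = int(np.argmax(means))       # arm a myopic agent would choose
    target = int(np.argmax(ucbs))        # arm the platform wants explored

    if target != myopic:
        # Pay the estimated utility gap so the agent is willing to switch.
        total_payment += means[myopic] - means[target]

    reward = theta_true[target] @ x + rng.normal(scale=0.1)
    A[target] += np.outer(x, x)          # update statistics for the pulled arm
    b[target] += reward * x

print(f"cumulative incentive payments: {total_payment:.2f}")
```

Payments accrue only in rounds where the exploratory recommendation differs from the greedy one, which is the intuition behind bounding the cumulative cost of incentivization.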


Notes

  1. In a typical explore-then-commit learning strategy, there is an initial pure exploration phase, by the end of which the learner commits to a single best action until the end of the horizon T [12]. A minimal sketch of this strategy follows.
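
The sketch below simulates explore-then-commit for a K-armed Gaussian bandit; the per-arm exploration length m and the unit noise variance are illustrative assumptions, not the tuned choices analyzed in [12].

```python
import numpy as np

# Explore-then-commit for a K-armed Gaussian bandit (illustrative sketch).
# The exploration length m per arm and unit noise are demo assumptions.
def explore_then_commit(means, T, m, seed=0):
    rng = np.random.default_rng(seed)
    K = len(means)
    pulls = np.zeros(K)
    sums = np.zeros(K)
    best = None
    total = 0.0
    for t in range(T):
        if t < K * m:
            arm = t % K                  # pure exploration: round-robin pulls
        else:
            if best is None:             # commit once to the empirical best arm
                best = int(np.argmax(sums / pulls))
            arm = best
        r = rng.normal(means[arm], 1.0)  # noisy reward for the chosen arm
        pulls[arm] += 1
        sums[arm] += r
        total += r
    return total

# Example: three arms with mean rewards 0.1, 0.5, 0.9 over horizon T = 10000.
print(explore_then_commit([0.1, 0.5, 0.9], T=10_000, m=50))
```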

References

  1. Bastani, H., Bayati, M., Khosravi, K.: Mostly exploration-free algorithms for contextual bandits. arXiv preprint arXiv:1704.09011 (2017)

  2. Bietti, A., Agarwal, A., Langford, J.: A contextual bandit bake-off. arXiv preprint arXiv:1802.04064 (2018)

  3. Chen, B., Frazier, P., Kempe, D.: Incentivizing exploration by heterogeneous users. In: Conference On Learning Theory, pp. 798–818 (2018)


  4. Cohen, L., Mansour, Y.: An optimal algorithm for Bayesian incentive-compatible exploration. arXiv preprint arXiv:1810.10304 (2018)

  5. van Dantzig, S., Geleijnse, G., van Halteren, A.T.: Toward a persuasive mobile application to reduce sedentary behavior. Pers. Ubiquit. Comput. 17(6), 1237–1246 (2013)


  6. Frazier, P., Kempe, D., Kleinberg, J., Kleinberg, R.: Incentivizing exploration. In: Proceedings of the Fifteenth ACM Conference on Economics and Computation, pp. 5–22. ACM (2014)


  7. Han, L., Kempe, D., Qiang, R.: Incentivizing exploration with heterogeneous value of money. In: Markakis, E., Schäfer, G. (eds.) WINE 2015. LNCS, vol. 9470, pp. 370–383. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48995-6_27


  8. Immorlica, N., Mao, J., Slivkins, A., Wu, Z.S.: Incentivizing exploration with unbiased histories. arXiv preprint arXiv:1811.06026 (2018)

  9. Immorlica, N., Mao, J., Slivkins, A., Wu, Z.S.: Bayesian exploration with heterogeneous agents. In: The World Wide Web Conference, pp. 751–761. ACM (2019)


  10. Kannan, S., et al.: Fairness incentives for myopic agents. In: Proceedings of the 2017 ACM Conference on Economics and Computation, pp. 369–386. ACM (2017)


  11. Kannan, S., Morgenstern, J.H., Roth, A., Waggoner, B., Wu, Z.S.: A smoothed analysis of the greedy algorithm for the linear contextual bandit problem. Adv. Neural Inf. Process. Syst. 31, 2227–2236 (2018)


  12. Langford, J., Zhang, T.: The epoch-greedy algorithm for contextual multi-armed bandits. In: Proceedings of the 20th International Conference on Neural Information Processing Systems, pp. 817–824. Citeseer (2007)


  13. Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2020)


  14. Li, L., Chu, W., Langford, J., Schapire, R.E.: A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th International Conference on World Wide Web, pp. 661–670. ACM (2010)


  15. Mansour, Y., Slivkins, A., Syrgkanis, V.: Bayesian incentive-compatible bandit exploration. In: Proceedings of the Sixteenth ACM Conference on Economics and Computation, pp. 565–582. ACM (2015)


  16. Mansour, Y., Slivkins, A., Syrgkanis, V., Wu, Z.S.: Bayesian exploration: incentivizing exploration in Bayesian games. arXiv preprint arXiv:1602.07570 (2016)

  17. Riquelme, C., Tucker, G., Snoek, J.: Deep Bayesian bandits showdown: an empirical comparison of Bayesian deep networks for Thompson sampling. In: International Conference on Learning Representations, ICLR (2018)


  18. Wang, S., Huang, L.: Multi-armed bandits with compensation. In: Advances in Neural Information Processing Systems, pp. 5114–5122 (2018)



Author information

Corresponding author

Correspondence to Priyank Agrawal.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Agrawal, P., Tulabandhula, T. (2020). Incentivising Exploration and Recommendations for Contextual Bandits with Payments. In: Bassiliades, N., Chalkiadakis, G., de Jonge, D. (eds.) Multi-Agent Systems and Agreement Technologies (EUMAS/AT 2020). Lecture Notes in Computer Science, vol. 12520. Springer, Cham. https://doi.org/10.1007/978-3-030-66412-1_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-66412-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66411-4

  • Online ISBN: 978-3-030-66412-1

  • eBook Packages: Computer Science, Computer Science (R0)
