Abstract
We propose a contextual-bandit-based model to capture the learning and social welfare goals of a web platform in the presence of myopic users. By using payments to incentivize these agents to explore different items/recommendations, we show how the platform can learn the inherent attributes of items and achieve sublinear regret while maximizing cumulative social welfare. We also derive theoretical bounds on the cumulative cost of incentivization to the platform. Unlike previous works in this domain, we consider contexts to be completely adversarial, and the behavior of the adversary is unknown to the platform. Our approach can improve various engagement metrics of users on e-commerce stores, recommendation engines and matching platforms.
Notes
1. In a typical explore-then-commit learning strategy, there is an initial pure exploration phase, by the end of which the learner commits to a single best action until the end of the horizon T [12].
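To make the note concrete, here is a minimal sketch of an explore-then-commit strategy for a stochastic multi-armed bandit (an illustration of the generic strategy described above, not the authors' incentivized-exploration algorithm; the Bernoulli arms, horizon, and per-arm exploration budget are assumptions for the example):

```python
import random

def explore_then_commit(arm_means, horizon, explore_rounds_per_arm, rng):
    """Pull each arm a fixed number of times (pure exploration), then
    commit to the empirically best arm for the remaining rounds.
    Rewards are Bernoulli draws with the given per-arm means.
    Returns (committed arm index, total reward collected)."""
    k = len(arm_means)
    pulls = [0] * k
    sums = [0.0] * k
    total = 0.0
    t = 0
    # Pure exploration phase: round-robin over all arms.
    for arm in range(k):
        for _ in range(explore_rounds_per_arm):
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            pulls[arm] += 1
            sums[arm] += reward
            total += reward
            t += 1
    # Commit phase: play the empirically best arm until the horizon T.
    best = max(range(k), key=lambda a: sums[a] / pulls[a])
    for _ in range(horizon - t):
        total += 1.0 if rng.random() < arm_means[best] else 0.0
    return best, total
```

With enough exploration rounds per arm, the committed arm is the true best arm with high probability; the length of the exploration phase trades off exploration cost against the risk of committing to a suboptimal arm.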
References
Bastani, H., Bayati, M., Khosravi, K.: Mostly exploration-free algorithms for contextual bandits. arXiv preprint arXiv:1704.09011 (2017)
Bietti, A., Agarwal, A., Langford, J.: A contextual bandit bake-off. arXiv preprint arXiv:1802.04064 (2018)
Chen, B., Frazier, P., Kempe, D.: Incentivizing exploration by heterogeneous users. In: Conference On Learning Theory, pp. 798–818 (2018)
Cohen, L., Mansour, Y.: Optimal algorithm for Bayesian incentive-compatible exploration. arXiv preprint arXiv:1810.10304 (2018)
Dantzig, S., Geleijnse, G., Halteren, A.T.: Toward a persuasive mobile application to reduce sedentary behavior. Pers. Ubiquit. Comput. 17(6), 1237–1246 (2013)
Frazier, P., Kempe, D., Kleinberg, J., Kleinberg, R.: Incentivizing exploration. In: Proceedings of the Fifteenth ACM Conference on Economics and Computation, pp. 5–22. ACM (2014)
Han, L., Kempe, D., Qiang, R.: Incentivizing exploration with heterogeneous value of money. In: Markakis, E., Schäfer, G. (eds.) WINE 2015. LNCS, vol. 9470, pp. 370–383. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48995-6_27
Immorlica, N., Mao, J., Slivkins, A., Wu, Z.S.: Incentivizing exploration with unbiased histories. arXiv preprint arXiv:1811.06026 (2018)
Immorlica, N., Mao, J., Slivkins, A., Wu, Z.S.: Bayesian exploration with heterogeneous agents. In: The World Wide Web Conference, pp. 751–761. ACM (2019)
Kannan, S., et al.: Fairness incentives for myopic agents. In: Proceedings of the 2017 ACM Conference on Economics and Computation, pp. 369–386. ACM (2017)
Kannan, S., Morgenstern, J.H., Roth, A., Waggoner, B., Wu, Z.S.: A smoothed analysis of the greedy algorithm for the linear contextual bandit problem. Adv. Neural Inf. Process. Syst. 31, 2227–2236 (2018)
Langford, J., Zhang, T.: The epoch-greedy algorithm for contextual multi-armed bandits. In: Proceedings of the 20th International Conference on Neural Information Processing Systems, pp. 817–824. Citeseer (2007)
Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2020)
Li, L., Chu, W., Langford, J., Schapire, R.E.: A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th International Conference on World Wide Web, pp. 661–670. ACM (2010)
Mansour, Y., Slivkins, A., Syrgkanis, V.: Bayesian incentive-compatible bandit exploration. In: Proceedings of the Sixteenth ACM Conference on Economics and Computation, pp. 565–582. ACM (2015)
Mansour, Y., Slivkins, A., Syrgkanis, V., Wu, Z.S.: Bayesian exploration: incentivizing exploration in Bayesian games. arXiv preprint arXiv:1602.07570 (2016)
Riquelme, C., Tucker, G., Snoek, J.: Deep Bayesian bandits showdown: an empirical comparison of Bayesian deep networks for Thompson sampling. In: International Conference on Learning Representations, ICLR (2018)
Wang, S., Huang, L.: Multi-armed bandits with compensation. In: Advances in Neural Information Processing Systems, pp. 5114–5122 (2018)
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Agrawal, P., Tulabandhula, T. (2020). Incentivising exploration and recommendations for contextual bandits with payments. In: Bassiliades, N., Chalkiadakis, G., de Jonge, D. (eds.) Multi-Agent Systems and Agreement Technologies. EUMAS/AT 2020. Lecture Notes in Computer Science, vol. 12520. Springer, Cham. https://doi.org/10.1007/978-3-030-66412-1_11
Print ISBN: 978-3-030-66411-4
Online ISBN: 978-3-030-66412-1