Abstract
We propose a contextual-bandit-based model to capture the learning and social welfare goals of a web platform in the presence of myopic users. By using payments to incentivize these agents to explore different items/recommendations, we show how the platform can learn the inherent attributes of items and achieve sublinear regret while maximizing cumulative social welfare. We also derive theoretical bounds on the cumulative cost of incentivization to the platform. Unlike previous works in this domain, we consider contexts to be completely adversarial, and the behavior of the adversary is unknown to the platform. Our approach can improve various engagement metrics of users on e-commerce stores, recommendation engines and matching platforms.
Notes
1. In a typical explore-then-commit learning strategy, there is an initial pure exploration phase, by the end of which the learner commits to a single best action until the end of the horizon T [12].
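To make the note concrete, here is a minimal sketch of an explore-then-commit strategy for a stochastic multi-armed bandit (an illustration of the generic strategy described above, not the authors' incentivized-exploration algorithm; the Bernoulli arms, horizon, and per-arm exploration budget are assumptions for the example):

```python
import random

def explore_then_commit(arm_means, horizon, explore_rounds_per_arm, rng):
    """Pull each arm a fixed number of times (pure exploration), then
    commit to the empirically best arm for the remaining rounds.
    Rewards are Bernoulli draws with the given per-arm means.
    Returns (committed arm index, total reward collected)."""
    k = len(arm_means)
    pulls = [0] * k
    sums = [0.0] * k
    total = 0.0
    t = 0
    # Pure exploration phase: round-robin over all arms.
    for arm in range(k):
        for _ in range(explore_rounds_per_arm):
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            pulls[arm] += 1
            sums[arm] += reward
            total += reward
            t += 1
    # Commit phase: play the empirically best arm until the horizon T.
    best = max(range(k), key=lambda a: sums[a] / pulls[a])
    for _ in range(horizon - t):
        total += 1.0 if rng.random() < arm_means[best] else 0.0
    return best, total
```

With enough exploration rounds per arm, the committed arm is the true best arm with high probability; the length of the exploration phase trades off exploration cost against the risk of committing to a suboptimal arm.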
References
Bastani, H., Bayati, M., Khosravi, K.: Mostly exploration-free algorithms for contextual bandits. arXiv preprint arXiv:1704.09011 (2017)
Bietti, A., Agarwal, A., Langford, J.: A contextual bandit bake-off. arXiv preprint arXiv:1802.04064 (2018)
Chen, B., Frazier, P., Kempe, D.: Incentivizing exploration by heterogeneous users. In: Conference On Learning Theory, pp. 798–818 (2018)
Cohen, L., Mansour, Y.: Optimal algorithm for Bayesian incentive-compatible exploration. arXiv preprint arXiv:1810.10304 (2018)
Dantzig, S., Geleijnse, G., Halteren, A.T.: Toward a persuasive mobile application to reduce sedentary behavior. Pers. Ubiquit. Comput. 17(6), 1237–1246 (2013)
Frazier, P., Kempe, D., Kleinberg, J., Kleinberg, R.: Incentivizing exploration. In: Proceedings of the Fifteenth ACM Conference on Economics and Computation, pp. 5–22. ACM (2014)
Han, L., Kempe, D., Qiang, R.: Incentivizing exploration with heterogeneous value of money. In: Markakis, E., Schäfer, G. (eds.) WINE 2015. LNCS, vol. 9470, pp. 370–383. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48995-6_27
Immorlica, N., Mao, J., Slivkins, A., Wu, Z.S.: Incentivizing exploration with unbiased histories. arXiv preprint arXiv:1811.06026 (2018)
Immorlica, N., Mao, J., Slivkins, A., Wu, Z.S.: Bayesian exploration with heterogeneous agents. In: The World Wide Web Conference, pp. 751–761. ACM (2019)
Kannan, S., et al.: Fairness incentives for myopic agents. In: Proceedings of the 2017 ACM Conference on Economics and Computation, pp. 369–386. ACM (2017)
Kannan, S., Morgenstern, J.H., Roth, A., Waggoner, B., Wu, Z.S.: A smoothed analysis of the greedy algorithm for the linear contextual bandit problem. Adv. Neural Inf. Process. Syst. 31, 2227–2236 (2018)
Langford, J., Zhang, T.: The epoch-greedy algorithm for contextual multi-armed bandits. In: Proceedings of the 20th International Conference on Neural Information Processing Systems, pp. 817–824. Citeseer (2007)
Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2020)
Li, L., Chu, W., Langford, J., Schapire, R.E.: A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th International Conference on World Wide Web, pp. 661–670. ACM (2010)
Mansour, Y., Slivkins, A., Syrgkanis, V.: Bayesian incentive-compatible bandit exploration. In: Proceedings of the Sixteenth ACM Conference on Economics and Computation, pp. 565–582. ACM (2015)
Mansour, Y., Slivkins, A., Syrgkanis, V., Wu, Z.S.: Bayesian exploration: incentivizing exploration in Bayesian games. arXiv preprint arXiv:1602.07570 (2016)
Riquelme, C., Tucker, G., Snoek, J.: Deep Bayesian bandits showdown: an empirical comparison of Bayesian deep networks for Thompson sampling. In: International Conference on Learning Representations, ICLR (2018)
Wang, S., Huang, L.: Multi-armed bandits with compensation. In: Advances in Neural Information Processing Systems, pp. 5114–5122 (2018)
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Agrawal, P., Tulabandhula, T. (2020). Incentivising exploration and recommendations for contextual bandits with payments. In: Bassiliades, N., Chalkiadakis, G., de Jonge, D. (eds.) Multi-Agent Systems and Agreement Technologies. EUMAS/AT 2020. Lecture Notes in Computer Science, vol. 12520. Springer, Cham. https://doi.org/10.1007/978-3-030-66412-1_11
Print ISBN: 978-3-030-66411-4
Online ISBN: 978-3-030-66412-1