Abstract
This paper addresses the on-line recommendation problem in the presence of new users and new items; we assume that no side information is available about either the users or the items. The only source of information is a set of ratings given by users to some items. By on-line, we mean that the sets of users, items, and ratings evolve over time, and that at any moment the recommendation system has to select items to recommend based on the currently available information, that is, essentially the sequence of past events. We also mean that each user comes with her own preferences, which may evolve over both short and longer time scales, so these preferences have to be updated continuously. When the set of ratings is the only available source of information, the traditional approach is matrix factorization. In a decision-making-under-uncertainty setting, actions should be selected to balance exploration with exploitation; this is best modeled as a bandit problem. Matrix factors provide a latent representation of users and items, and these representations may then be used as contextual information by the bandit algorithm to select items. This combination of matrix factorization and bandit algorithms to solve the on-line recommendation problem is precisely the originality of this paper. Our work is driven by viewing the recommendation problem as a feedback-controlled loop, which leads to interactions between the representation learning and the recommendation policy.
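The paper's actual algorithm is in the body, which is not reproduced on this page. As a rough illustration of the idea the abstract sketches, namely latent item factors obtained by matrix factorization serving as contexts for a bandit policy, here is a minimal Python sketch pairing a plain ALS factorization with a per-user LinUCB policy. All names, parameters, and the periodic refit schedule are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

def als_factorize(R, mask, k=3, reg=0.1, n_iters=20):
    """Plain ALS on the observed entries of R (mask == 1 where rated)."""
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(n_iters):
        for u in range(n_users):
            obs = mask[u] == 1
            if obs.any():
                G = V[obs].T @ V[obs] + reg * np.eye(k)
                U[u] = np.linalg.solve(G, V[obs].T @ R[u, obs])
        for i in range(n_items):
            obs = mask[:, i] == 1
            if obs.any():
                G = U[obs].T @ U[obs] + reg * np.eye(k)
                V[i] = np.linalg.solve(G, U[obs].T @ R[obs, i])
    return U, V

class LinUCB:
    """One linear bandit per user; item latent vectors act as contexts."""
    def __init__(self, k, alpha=1.0, reg=1.0):
        self.A = reg * np.eye(k)  # regularized Gram matrix of played contexts
        self.b = np.zeros(k)      # reward-weighted sum of played contexts
        self.alpha = alpha        # width of the exploration bonus

    def select(self, V):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b    # ridge estimate of the user's taste vector
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", V, A_inv, V))
        return int(np.argmax(V @ theta + self.alpha * bonus))

    def update(self, v, reward):
        self.A += np.outer(v, v)
        self.b += reward * v

# Toy simulation: rank-3 ground-truth preferences, 20% warm-up ratings.
n_users, n_items, k = 30, 20, 3
R_true = rng.normal(size=(n_users, k)) @ rng.normal(size=(n_items, k)).T
mask = (rng.random((n_users, n_items)) < 0.2).astype(float)
R_obs = R_true * mask

U, V = als_factorize(R_obs, mask, k=k)
bandits = [LinUCB(k) for _ in range(n_users)]
for t in range(500):
    u = int(rng.integers(n_users))
    i = bandits[u].select(V)                      # recommend one item
    reward = R_true[u, i] + 0.1 * rng.normal()    # noisy rating feedback
    bandits[u].update(V[i], reward)
    R_obs[u, i], mask[u, i] = reward, 1.0         # event enters the log
    if (t + 1) % 100 == 0:                        # refresh the representation
        U, V = als_factorize(R_obs, mask, k=k)
        bandits = [LinUCB(k) for _ in range(n_users)]  # old stats refer to old V
```

Note that refitting the factors invalidates each bandit's sufficient statistics, hence the crude reset after every refit; how to couple the two updates more gracefully is exactly the interaction between representation learning and recommendation policy that the abstract points to.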
Notes
1. \(\tilde{O}\) denotes \(O\) up to a logarithmic factor in \(T\); for instance, a regret bound of \(\tilde{O}(\sqrt{T})\) allows hidden logarithmic terms, such as \(O(\sqrt{T}\log T)\).
Acknowledgements
The authors acknowledge the support of INRIA and the stimulating environment of the SequeL research group.