1-Bit matrix completion: PAC-Bayesian analysis of a variational approximation
We focus on the completion of a (possibly) low-rank matrix with binary entries, the so-called 1-bit matrix completion problem. Our approach relies on tools from machine learning theory: empirical risk minimization and its convex relaxations. We propose an algorithm to compute a variational approximation of the pseudo-posterior. Thanks to the convex relaxation, the corresponding minimization problem is bi-convex. It can therefore be handled by alternating convex minimization steps, and the method works well in practice. We study the performance of this variational approximation through PAC-Bayesian learning bounds. Previous works focused on upper bounds for the estimation error of the parameter matrix M under various matrix norms. In contrast, we derive from this analysis a PAC bound on the prediction error of our algorithm. We focus mainly on the convex relaxation obtained with the hinge loss, for which we present a complete analysis, a complete simulation study and a test on the MovieLens data set. We also discuss a variational approximation for the logistic loss.
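To illustrate why the hinge-loss relaxation yields a bi-convex problem, here is a minimal sketch in Python/NumPy. It assumes a simplified point-estimate analogue, not the paper's variational pseudo-posterior: the matrix is factorized as M = U V^T, the empirical hinge risk on the observed entries is penalized by a Frobenius-norm term, and subgradient steps alternate between U (with V fixed) and V (with U fixed). Each of these subproblems is convex. The function names, rank, penalty `lam`, step size and iteration counts are hypothetical choices made for illustration only.

```python
import numpy as np


def hinge_objective(Y, mask, U, V, lam):
    """Average hinge loss on observed entries of M = U V^T plus a Frobenius penalty.

    Y has entries in {-1, +1}; mask is 1.0 where an entry is observed, else 0.0.
    """
    M = U @ V.T
    losses = np.maximum(0.0, 1.0 - Y * M) * mask
    return losses.sum() / mask.sum() + 0.5 * lam * (np.sum(U**2) + np.sum(V**2))


def hinge_subgradient_M(Y, mask, U, V):
    """A subgradient of the average observed hinge loss with respect to M."""
    M = U @ V.T
    active = (Y * M < 1.0) & (mask > 0)  # observed entries with margin below 1
    return -(Y * active) / mask.sum()


def alternating_hinge_mc(Y, mask, rank=2, lam=1e-3, step=1.0,
                         n_outer=50, n_inner=20, seed=0):
    """Hypothetical alternating subgradient scheme for hinge-loss 1-bit completion.

    For fixed V the objective is convex in U (and vice versa), which is the
    bi-convex structure that the hinge relaxation provides.
    """
    rng = np.random.default_rng(seed)
    m1, m2 = Y.shape
    U = 0.1 * rng.standard_normal((m1, rank))
    V = 0.1 * rng.standard_normal((m2, rank))
    for _ in range(n_outer):
        # Convex subproblem in U with V held fixed.
        for _ in range(n_inner):
            G = hinge_subgradient_M(Y, mask, U, V)
            U -= step * (G @ V + lam * U)
        # Convex subproblem in V with U held fixed.
        for _ in range(n_inner):
            G = hinge_subgradient_M(Y, mask, U, V)
            V -= step * (G.T @ U + lam * V)
    return U, V
```

A typical use is to build a sign matrix Y, observe a random subset of its entries through `mask`, call `alternating_hinge_mc(Y, mask)`, and predict the missing entries with `np.sign(U @ V.T)`. Fixed-step subgradient descent is chosen here only for brevity; it does not guarantee a monotone decrease of the objective, and alternating minimization of a bi-convex problem only reaches a stationary point in general.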
Keywords: Matrix completion, PAC-Bayesian bounds, Variational Bayes, Supervised classification, Risk convexification, Oracle inequalities
We would like to thank Professor Nicolas Chopin, Vincent Cottet’s Ph.D. supervisor, for his kind support during the project, and the three anonymous referees for their helpful and constructive comments.