Abstract
The neighborhood-based methods of the previous chapter can be viewed as generalizations of k-nearest neighbor classifiers, which are commonly used in machine learning.
Notes
- 1.
From a practical point of view, preprocessing is essential for efficiency. However, one could implement the neighborhood method without a preprocessing phase, albeit with larger latencies at query time.
- 2.
Parameter-tuning methods, such as hold-out and cross-validation, are discussed in Chapter 7.
- 3.
In the case of user-based associations, the consequents might contain any user.
- 4.
It is also possible to use more sophisticated ways of removing bias for better performance. For example, the bias \(B_{ij}\), which is specific to user i and item j, can be computed using the approach discussed in section 3.7.1. This bias is subtracted from the observed entries, and all missing entries are initialized to 0 during preprocessing. After computing the predictions, the biases \(B_{ij}\) are added back to the predicted values during postprocessing.
- 5.
The method used for performing this estimation in various scenarios is described in detail in section 3.6.5.3.
- 6.
The row space of a matrix is defined by all possible linear combinations of the rows of the matrix. The column space of a matrix is defined by all possible linear combinations of the columns of the matrix.
- 7.
In SVD [568], the basis vectors are also referred to as singular vectors, which, by definition, must be mutually orthonormal.
- 8.
Refer to Chapter 6 for a discussion of the bias-variance trade-off.
- 9.
A more precise update should be \(\overline{u_{i}} \Leftarrow \overline{u_{i}} +\alpha (e_{ij}\overline{v_{j}} -\lambda \overline{u_{i}}/n_{i}^{user})\) and \(\overline{v_{j}} \Leftarrow \overline{v_{j}} +\alpha (e_{ij}\overline{u_{i}} -\lambda \overline{v_{j}}/n_{j}^{item})\). Here, \(n_{i}^{user}\) represents the number of observed ratings for user i and \(n_{j}^{item}\) represents the number of observed ratings for item j. In these updates, the regularization term for each user/item factor is divided equally among the observed entries of the corresponding user/item. In practice, the (simpler) heuristic update rules discussed in the chapter are often used. We have chosen to use these (simpler) rules throughout this chapter to be consistent with the research literature on recommender systems. With proper parameter tuning, \(\lambda\) will automatically adjust to a smaller value in the case of the simpler update rules.
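As an illustration, the more precise update above might be sketched as follows. This is a minimal sketch, not the chapter's implementation: the function name `sgd_step`, the array names `n_user` and `n_item`, the default values of `alpha` and `lam`, and the error definition \(e_{ij} = r_{ij} - \overline{u_{i}} \cdot \overline{v_{j}}\) are assumptions made for the example.

```python
import numpy as np

def sgd_step(U, V, i, j, r_ij, n_user, n_item, alpha=0.01, lam=0.1):
    """Apply one stochastic gradient update for the observed entry (i, j).

    U, V    : float factor matrices (one row per user / item), updated in place
    n_user  : n_user[i] = number of observed ratings of user i (assumed name)
    n_item  : n_item[j] = number of observed ratings of item j (assumed name)
    """
    # Copy the old rows so that both updates use the same iterate.
    u_i = U[i].copy()
    v_j = V[j].copy()
    # Assumed error definition: observed rating minus predicted rating.
    e_ij = r_ij - u_i @ v_j
    # The regularization term is divided by the number of observed ratings.
    U[i] = u_i + alpha * (e_ij * v_j - lam * u_i / n_user[i])
    V[j] = v_j + alpha * (e_ij * u_i - lam * v_j / n_item[j])
    return e_ij
```

Dropping the divisions by `n_user[i]` and `n_item[j]` gives the simpler heuristic rule, in which case a smaller tuned value of `lam` would be expected.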
- 10.
The inner-product of two column-vectors \(\overline{x}\) and \(\overline{y}\) is given by the scalar \(\overline{x}^{T}\overline{y}\), whereas the outer-product is given by the rank-1 matrix \(\overline{x}\,\overline{y}^{T}\). Furthermore, \(\overline{x}\) and \(\overline{y}\) need not be of the same size in order to compute an outer-product.
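The distinction can be checked numerically. The following is a small sketch using NumPy; the vectors `x`, `y`, and `z` are made-up values chosen only for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
z = np.array([7.0, 8.0])  # different length from x

# Inner product x^T y: a scalar, defined only for vectors of equal size.
inner = x @ y

# Outer product x y^T: a rank-1 matrix with len(x) rows and len(y) columns.
outer = np.outer(x, y)

# The outer product is still defined when the sizes differ.
outer_xz = np.outer(x, z)

print(inner, outer.shape, outer_xz.shape)
```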
- 11.
In many cases, this approach can outperform SVD++, especially when the number of observed ratings is small.
- 12.
For matrices that are not mean-centered, the global mean can be subtracted during preprocessing and then added back at prediction time.
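A minimal sketch of this preprocessing and postprocessing step is shown below. It assumes that missing ratings are encoded as NaN; the matrix `R` and the helper `predict` are hypothetical and serve only to illustrate the idea.

```python
import numpy as np

# Toy ratings matrix; NaN marks an unobserved entry (an assumed encoding).
R = np.array([[5.0, np.nan, 3.0],
              [np.nan, 4.0, np.nan]])

observed = ~np.isnan(R)

# Preprocessing: global mean over observed entries only, then subtract it.
mu = R[observed].mean()
R_centered = np.where(observed, R - mu, np.nan)

def predict(centered_prediction, mu):
    """Postprocessing: add the global mean back to a centered prediction."""
    return centered_prediction + mu
```

Any factorization model would then be trained on `R_centered`, and its outputs passed through `predict` before being reported.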
- 13.
We use a slightly different notation than the original paper [309], although the approach described here is equivalent. This presentation simplifies the notation by introducing fewer variables and viewing bias variables as constraints on the factorization process.
- 14.
The literature often describes these updates in vectorized form. These updates may be applied to the rows of U, V, and Y as follows:
$$\begin{array}{rcl} \overline{u_{i}} & \Leftarrow & \overline{u_{i}} + \alpha (e_{ij}\overline{v_{j}} - \lambda \overline{u_{i}}) \\ \overline{v_{j}} & \Leftarrow & \overline{v_{j}} + \alpha \left(e_{ij} \cdot \left[\overline{u_{i}} + \sum_{h \in I_{i}} \frac{\overline{y_{h}}}{\sqrt{\vert I_{i} \vert}}\right] - \lambda \cdot \overline{v_{j}}\right) \\ \overline{y_{h}} & \Leftarrow & \overline{y_{h}} + \alpha \left(\frac{e_{ij} \cdot \overline{v_{j}}}{\sqrt{\vert I_{i} \vert}} - \lambda \cdot \overline{y_{h}}\right) \quad \forall h \in I_{i} \end{array}$$
Reset perturbed entries in fixed columns of \(U\), \(V\), and \(Y\).
- 15.
These effects are best understood in terms of the bias-variance trade-off in machine learning [22]. Setting the unspecified values to 0 increases bias, but it reduces variance. When a large number of entries are unspecified, and the prior probability that a missing entry is 0 is very high, the variance effects can dominate.
- 16.
Refer to Chapter 6 for a discussion of the bias-variance trade-off in collaborative filtering.
- 17.
Bibliography
D. Agarwal and B. Chen. Regression-based latent factor models. ACM KDD Conference, pp. 19–28, 2009.
C. Aggarwal. Data classification: algorithms and applications. CRC Press, 2014.
C. Aggarwal. Data mining: the textbook. Springer, New York, 2015.
C. Aggarwal and J. Han. Frequent pattern mining. Springer, New York, 2014.
C. Aggarwal and S. Parthasarathy. Mining massively incomplete data sets by conceptual reconstruction. ACM KDD Conference, pp. 227–232, 2001.
C. Aggarwal, C. Procopiuc, and P. S. Yu. Finding localized associations in market basket data. IEEE Transactions on Knowledge and Data Engineering, 14(1), pp. 51–62, 2001.
C. Aggarwal, Z. Sun, and P. Yu. Online generation of profile association rules. ACM KDD Conference, pp. 129–133, 1998.
C. Aggarwal, Z. Sun, and P. Yu. Online algorithms for finding profile association rules. CIKM Conference, pp. 86–95, 1998.
R. Battiti. Accelerated backpropagation learning: Two optimization methods. Complex Systems, 3(4), pp. 331–342, 1989.
R. Bell and Y. Koren. Scalable collaborative filtering with jointly derived neighborhood interpolation weights. IEEE International Conference on Data Mining, pp. 43–52, 2007.
R. Bell and Y. Koren. Lessons from the Netflix prize challenge. ACM SIGKDD Explorations Newsletter, 9(2), pp. 75–79, 2007.
D. P. Bertsekas. Nonlinear programming. Athena Scientific Publishers, Belmont, 1999.
D. Billsus and M. Pazzani. Learning collaborative information filters. ICML Conference, pp. 46–54, 1998.
C. M. Bishop. Neural networks for pattern recognition. Oxford University Press, 1995.
M. Brand. Fast online SVD revisions for lightweight recommender systems. SIAM Conference on Data Mining, pp. 37–46, 2003.
J. Cai, E. Candes, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), pp. 1956–1982, 2010.
J. Canny. Collaborative filtering with privacy via factor analysis. ACM SIGIR Conference, pp. 238–245, 2002.
T. Chen, Z. Zheng, Q. Lu, W. Zhang, and Y. Yu. Feature-based matrix factorization. arXiv preprint arXiv:1109.2271, 2011.
A. Cichocki and R. Zdunek. Regularized alternating least squares algorithms for non-negative matrix/tensor factorization. International Symposium on Neural Networks, pp. 793–802, 2007.
D. DeCoste. Collaborative prediction using ensembles of maximum margin matrix factorizations. International Conference on Machine Learning, pp. 249–256, 2006.
R. Devooght, N. Kourtellis, and A. Mantrach. Dynamic matrix factorization with priors on unknown values. ACM KDD Conference, 2015.
R. Gemulla, E. Nijkamp, P. Haas, and Y. Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. ACM KDD Conference, pp. 69–77, 2011.
L. Getoor and M. Sahami. Using probabilistic relational models for collaborative filtering. Workshop on Web Usage Analysis and User Profiling, 1999.
F. Girosi, M. Jones, and T. Poggio. Regularization theory and neural networks architectures. Neural Computation, 7(2), pp. 219–269, 1995.
T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS), 22(1), pp. 89–114, 2004.
Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. IEEE International Conference on Data Mining, pp. 263–272, 2008.
P. Jain and I. Dhillon. Provable inductive matrix completion. arXiv preprint arXiv:1306.0626, 2013. http://arxiv.org/abs/1306.0626
P. Jain, P. Netrapalli, and S. Sanghavi. Low-rank matrix completion using alternating minimization. ACM Symposium on Theory of Computing, pp. 665–674, 2013.
D. Kim and B. Yum. Collaborative filtering based on iterative principal component analysis. Expert Systems with Applications, 28, pp. 823–830, 2005.
H. Kim and H. Park. Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM Journal on Matrix Analysis and Applications, 30(2), pp. 713–730, 2008.
Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. ACM KDD Conference, pp. 426–434, 2008. Extended version of this paper appears as: “Y. Koren. Factor in the neighbors: Scalable and accurate collaborative filtering. ACM Transactions on Knowledge Discovery from Data (TKDD), 4(1), 1, 2010.”
Y. Koren. Collaborative filtering with temporal dynamics. ACM KDD Conference, pp. 447–455, 2009. Another version also appears in the Communications of the ACM, 53(4), pp. 89–97, 2010.
Y. Koren. The Bellkor solution to the Netflix grand prize. Netflix prize documentation, 81, 2009. http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf
Y. Koren and R. Bell. Advances in collaborative filtering. Recommender Systems Handbook, Springer, pp. 145–186, 2011. (Extended version in 2015 edition of handbook).
Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8), pp. 30–37, 2009.
S. Kabbur, X. Ning, and G. Karypis. FISM: factored item similarity models for top-N recommender systems. ACM KDD Conference, pp. 659–667, 2013.
S. Kabbur and G. Karypis. NLMF: NonLinear Matrix Factorization Methods for Top-N Recommender Systems. IEEE Data Mining Workshop (ICDMW), pp. 167–174, 2014.
A. Langville, C. Meyer, R. Albright, J. Cox, and D. Duling. Initializations for the nonnegative matrix factorization. ACM KDD Conference, pp. 23–26, 2006.
D. Lemire and A. Maclachlan. Slope one predictors for online rating-based collaborative filtering. SIAM Conference on Data Mining, 2005.
M. Li, T. Zhang, Y. Chen, and A. Smola. Efficient mini-batch training for stochastic optimization. ACM KDD Conference, pp. 661–670, 2014.
C.-J. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10), pp. 2756–2779, 2007.
W. Lin. Association rule mining for collaborative recommender systems. Master's Thesis, Worcester Polytechnic Institute, 2000.
W. Lin, S. Alvarez, and C. Ruiz. Efficient adaptive-support association rule mining for recommender systems. Data Mining and Knowledge Discovery, 6(1), pp. 83–105, 2002.
B. Liu, W. Hsu, and Y. Ma. Mining association rules with multiple minimum supports. ACM KDD Conference, pp. 337–341, 1999.
X. Liu, C. Aggarwal, Y.-F. Lee, X. Kong, X. Sun, and S. Sathe. Kernelized matrix factorization for collaborative filtering. SIAM Conference on Data Mining, 2016.
A. Mild and M. Natter. Collaborative filtering or regression models for Internet recommendation systems? Journal of Targeting, Measurement and Analysis for Marketing, 10(4), pp. 304–313, 2002.
K. Miyahara and M. J. Pazzani. Collaborative filtering with the simple Bayesian classifier. Pacific Rim International Conference on Artificial Intelligence, 2000.
B. Mobasher, H. Dai, T. Luo, and M. Nakagawa. Effective personalization based on association rule discovery from Web usage data. ACM Workshop on Web Information and Data Management, pp. 9–15, 2001.
X. Ning and G. Karypis. SLIM: Sparse linear methods for top-N recommender systems. IEEE International Conference on Data Mining, pp. 497–506, 2011.
D. Oard and J. Kim. Implicit feedback for recommender systems. Proceedings of the AAAI Workshop on Recommender Systems, pp. 81–83, 1998.
P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2), pp. 111–126, 1994.
R. Pan, Y. Zhou, B. Cao, N. Liu, R. Lukose, M. Scholz, and Q. Yang. One-class collaborative filtering. IEEE International Conference on Data Mining, pp. 502–511, 2008.
R. Pan and M. Scholz. Mind the gaps: weighting the unknown in large-scale one-class collaborative filtering. ACM KDD Conference, pp. 667–676, 2009.
S. Parthasarathy and C. Aggarwal. On the use of conceptual reconstruction for mining massively incomplete data sets. IEEE Transactions on Knowledge and Data Engineering, 15(6), pp. 1512–1521, 2003.
A. Paterek. Improving regularized singular value decomposition for collaborative filtering. Proceedings of KDD Cup and Workshop, 2007.
V. Pauca, J. Piper, and R. Plemmons. Nonnegative matrix factorization for spectral data analysis. Linear Algebra and its Applications, 416(1), pp. 29–47, 2006.
S. Rendle. Factorization machines. IEEE International Conference on Data Mining, pp. 995–1000, 2010.
J. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. ICML Conference, pp. 713–718, 2005.
R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. Advances in Neural Information Processing Systems, pp. 1257–1264, 2007.
R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. International Conference on Machine Learning, pp. 880–887, 2008.
R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. International Conference on Machine Learning, pp. 791–798, 2007.
B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. World Wide Web Conference, pp. 285–295, 2001.
B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of dimensionality reduction in recommender system – a case study. WebKDD Workshop at ACM SIGKDD Conference, 2000. Also appears at Technical Report TR-00-043, University of Minnesota, Minneapolis, 2000. https://wwws.cs.umn.edu/tech_reports_upload/tr2000/00-043.pdf
D. Seung and L. Lee. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, 13, pp. 556–562, 2001.
H. Shen and J. Z. Huang. Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99(6), pp. 1015–1034, 2008.
M.-L. Shyu, C. Haruechaiyasak, S.-C. Chen, and N. Zhao. Collaborative filtering by mining association rules from user access sequences. Workshop on Challenges in Web Information Retrieval and Integration, pp. 128–135, 2005.
G. Strang. An introduction to linear algebra. Wellesley Cambridge Press, 2009.
N. Srebro, J. Rennie, and T. Jaakkola. Maximum-margin matrix factorization. Advances in Neural Information Processing Systems, pp. 1329–1336, 2004.
X. Su, T. Khoshgoftaar, X. Zhu, and R. Greiner. Imputation-boosted collaborative filtering using machine learning classifiers. ACM Symposium on Applied Computing, pp. 949–950, 2008.
G. Takacs, I. Pilaszy, B. Nemeth, and D. Tikk. Matrix factorization and neighbor based algorithms for the Netflix prize problem. ACM Conference on Recommender Systems, pp. 267–274, 2008.
S. Vucetic and Z. Obradovic. Collaborative filtering using a regression-based approach. Knowledge and Information Systems, 7(1), pp. 1–22, 2005.
M. Weimer, A. Karatzoglou, Q. Le, and A. Smola. CoFiRank: Maximum margin matrix factorization for collaborative ranking. Advances in Neural Information Processing Systems, 2007.
S. Wild, J. Curry, and A. Dougherty. Improving non-negative matrix factorizations through structured initialization. Pattern Recognition, 37(11), pp. 2217–2232, 2004.
Z. Xia, Y. Dong, and G. Xing. Support vector machines for collaborative filtering. Proceedings of the 44th Annual Southeast Regional Conference, pp. 169–174, 2006.
H. F. Yu, C. Hsieh, S. Si, and I. S. Dhillon. Scalable coordinate descent approaches to parallel matrix factorization for recommender systems. IEEE International Conference on Data Mining, pp. 765–774, 2012.
K. Yu, S. Zhu, J. Lafferty, and Y. Gong. Fast nonparametric matrix factorization for large-scale collaborative filtering. ACM SIGIR Conference, pp. 211–218, 2009.
S. Zhang, W. Wang, J. Ford, and F. Makedon. Learning from incomplete ratings using nonnegative matrix factorization. SIAM Conference on Data Mining, pp. 549–553, 2006.
T. Zhang and V. Iyengar. Recommender systems using linear classifiers. Journal of Machine Learning Research, 2, pp. 313–334, 2002.
K. Zhou, S. Yang, and H. Zha. Functional matrix factorizations for cold-start recommendation. ACM SIGIR Conference, pp. 315–324, 2011.
Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan. Large-scale parallel collaborative filtering for the Netflix prize. Algorithmic Aspects in Information and Management, pp. 337–348, 2008.
C. Ziegler. Applying feed-forward neural networks to collaborative filtering. Master’s Thesis, Universitat Freiburg, 2006.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Aggarwal, C.C. (2016). Model-Based Collaborative Filtering. In: Recommender Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-29659-3_3
DOI: https://doi.org/10.1007/978-3-319-29659-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29657-9
Online ISBN: 978-3-319-29659-3
eBook Packages: Computer Science (R0)