# Restoring latent factors against negative transfer using partial-adaptation nonnegative matrix factorization


## Abstract

Collaborative filtering usually suffers from limited performance due to data sparsity. Transfer learning offers an opportunity to alleviate this issue by transferring useful knowledge from an auxiliary domain to a target domain. However, the situation becomes complicated when the source and target domains share only part of their knowledge. Transferring the unshared part across domains causes negative transfer and may degrade prediction accuracy in the target domain. To address this issue, we present a novel model that exploits the latent factors in the target domain to counter negative transfer. First, we transfer rating patterns from the source domains to approximate and reconstruct the target rating matrix. Second, we propose a partial-adaptation nonnegative matrix factorization (PA-NMF) method to correct the transfer learning result and restore the latent factors of the target domain. Experiments on real-world datasets demonstrate that our approach effectively addresses negative transfer and significantly outperforms state-of-the-art transfer learning models.

## Keywords

Transfer learning · Cross-domain recommendation · Negative transfer

## 1 Introduction

Recommendation systems help users faced with an overwhelming selection of items by identifying particular items that are likely to match each user’s tastes or preferences. Increasingly, people are turning to recommender systems to help them find the information that is most valuable to them. One of the most successful technologies in recommender systems research and practice is collaborative filtering.

The CF approach gathers user ratings and predicts how a user will rate unseen items based on that user's similarity to other users. Collaborative methods can be divided into two models: the neighbourhood-based model (NBM) (Alqadah et al. 2015; Xiaojun 2017) and the latent factor model (LFM) (Langseth and Nielsen 2015). Some of the most successful realizations of LFMs are based on matrix factorization (MF) (Yu et al. 2017; Abdollahi and Nasraoui 2016; Bokde et al. 2015). However, pure latent factor models suffer from several problems, such as poor prediction, sparsity, and scalability. In real-world recommender systems, users rate only a very limited number of items, so the rating matrix is often extremely sparse. As a result, the rating data available for K-NN searches, probabilistic modelling, or matrix factorization are radically insufficient. The sparsity problem has become a major bottleneck for most CF methods.

The codebook *S* is constructed by simultaneously clustering the users (rows) and items (columns) of \(X_{src}\); it indicates the rating that a user belonging to a specific user cluster \(u_{src}\) gives to an item belonging to a specific item cluster \(v_{src}\). The missing values in the target domain \(X_{tgt}\) can then be learned by duplicating the rows and columns of the codebook as \(U_{tgt} S{V_{tgt}^T}\). This approximation can be achieved by the following matrix norm optimization:
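The codebook-duplication step can be sketched as follows. This is a minimal illustration with hypothetical sizes and random data, not the paper's learning procedure: the membership matrices here are drawn at random, whereas in codebook transfer they would be learned by solving the optimization above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 6 target users, 5 items, a 2x2 codebook.
p, q, k, l = 6, 5, 2, 2
S = rng.uniform(1, 5, size=(k, l))          # cluster-level rating patterns

# Binary membership matrices: each user/item belongs to exactly one cluster.
U_tgt = np.zeros((p, k))
U_tgt[np.arange(p), rng.integers(0, k, p)] = 1
V_tgt = np.zeros((q, l))
V_tgt[np.arange(q), rng.integers(0, l, q)] = 1

# Duplicating codebook rows/columns yields a full rating matrix.
X_approx = U_tgt @ S @ V_tgt.T              # shape (p, q)

# Keep observed target ratings; fill only the missing (zero) entries.
X_tgt = np.where(rng.random((p, q)) < 0.2, rng.uniform(1, 5, (p, q)), 0.0)
W = (X_tgt != 0).astype(float)              # observation mask
X_filled = W * X_tgt + (1 - W) * X_approx
```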

In our previous studies, we used context information to restore the target domain's specific characteristics and validated the effectiveness of the context restoration model. However, that model requires two conditions. First, we must obtain a rating reference standard for the target domain in order to construct a rating bias matrix. Second, the target domain must contain context knowledge that can help group target items or users, so that we can obtain the relationship matrices between users and context and between items and context. In the real world, some datasets, such as MovieLens, satisfy these two conditions well, while others, such as BookCrossing, have difficulty meeting them. Therefore, in this study, we propose a model that restores the target-specific characteristics after the transfer learning process without any additional conditions on the target domain.

## 2 Related work

### 2.1 Transfer learning

Here \(R\) is the graph regularization function and \(W\) is the weight matrix on the graph. \({{W_{jl}}}\) measures the closeness of two points \({x_j}\) and \({x_l}\), and \({{N_p}\left( {{x_j}} \right) }\) is the set of the *p* nearest neighbors of \({x_j}\). \({z_j}\) and \({z_l}\) are the low-dimensional representations of \({x_j}\) and \({x_l}\), respectively.
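A small sketch of such a graph regularizer may make the notation concrete. The heat-kernel weight and the Laplacian identity \(R = \tfrac{1}{2}\sum_{j,l} W_{jl}\lVert z_j - z_l\rVert^2 = \mathrm{tr}(Z^T L Z)\) are common choices in graph-regularized NMF (Cai et al. 2010); the data here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((20, 8))      # 20 sample points, 8-dimensional
Z = rng.random((20, 3))      # their low-dimensional representations
p = 4                        # number of nearest neighbors

# Pairwise distances and p-nearest-neighbor graph weights.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Wg = np.zeros_like(D)
for j in range(len(X)):
    nn = np.argsort(D[j])[1:p + 1]            # skip the point itself
    Wg[j, nn] = np.exp(-D[j, nn] ** 2)        # heat-kernel closeness
Wg = np.maximum(Wg, Wg.T)                     # symmetrize the graph

# R = 1/2 * sum_{j,l} W_jl ||z_j - z_l||^2 = tr(Z^T L Z), with L = Dg - Wg.
L = np.diag(Wg.sum(axis=1)) - Wg
R_pair = 0.5 * sum(Wg[j, l] * np.sum((Z[j] - Z[l]) ** 2)
                   for j in range(20) for l in range(20))
R_trace = np.trace(Z.T @ L @ Z)
```

Both forms compute the same regularizer; the trace form is the one usually plugged into factorization objectives.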

As for research on context-aware and adaptive models, tensor factorization (Yao et al. 2015) and context-based splitting methods (Zheng et al. 2014) have proved useful for single-domain context-aware collaborative filtering. However, methods that restore a filled matrix and enhance transfer learning results remain a topic for further investigation. In addition, Fenza et al. (2011) present a hybrid context-aware system combining fuzzy clustering and rule mining. All of these approaches still need additional context knowledge about the target domain.

### 2.2 Non-negative matrix factorization

NMF factorizes a nonnegative matrix *X* into two nonnegative matrices. Given an *M*-by-*N* matrix *X* whose elements are nonnegative and each column of which is a sample vector, NMF aims to find two non-negative matrices \(U \in R_{+}^{M \times K}\) and \(V \in R_{+}^{N \times K}\) whose product approximates the original matrix *X* well. The cost function of the approximation is defined in Eq. (3).

When *X* contains unlabeled zero items within the matrix, we obtain the Incomplete-NMF cost function of Eq. (4), as follows:

where *W* is a mask matrix such that \(W_{ij}=1\) if \(X_{ij}\ne 0\), and \(W_{ij}=0\) otherwise. Both Eqs. (3) and (4) can be optimized iteratively:
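Since the iterative updates are not reproduced in this excerpt, the following sketch uses the masked multiplicative updates in the spirit of Lee and Seung (2001), a standard way to optimize the Incomplete-NMF cost: the mask simply zeroes out unobserved entries in both numerator and denominator. Data and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, K = 30, 20, 5
X = rng.uniform(1, 5, (M, N)) * (rng.random((M, N)) < 0.3)  # sparse ratings
W = (X != 0).astype(float)                                  # 1 where rated

U = rng.random((M, K)) + 0.1
V = rng.random((N, K)) + 0.1
eps = 1e-9                                                  # avoid division by zero

def masked_error(X, W, U, V):
    """Incomplete-NMF cost: squared error on observed entries only."""
    return np.linalg.norm(W * (X - U @ V.T)) ** 2

err0 = masked_error(X, W, U, V)
for _ in range(100):
    # Multiplicative updates restricted to observed entries by the mask W.
    U *= ((W * X) @ V) / (((W * (U @ V.T)) @ V) + eps)
    V *= ((W * X).T @ U) / (((W * (U @ V.T)).T @ U) + eps)
err1 = masked_error(X, W, U, V)
```

The multiplicative form keeps *U* and *V* nonnegative throughout, since every factor in the update is nonnegative.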

The non-negative constraints on *U* and *V* allow only additive combinations among different bases. This is the most significant difference between NMF and other matrix factorization methods such as SVD. NMF can learn a parts-based representation of the data and significantly improves interpretability compared to SVD. The advantages of this parts-based representation have been observed in many real-world problems such as sound analysis (Helen and Virtanen 2005), face recognition (Long et al. 2014), image annotation (Kalayeh et al. 2014), visual tracking (Wu et al. 2013), document clustering (Qin et al. 2017), cancer clustering (Wang et al. 2013) and DNA gene expression analysis (Gaujoux and Seoighe 2012).

In the product \(UV^T\), each column \({{U_{ * k}}}\) of *U* is weighted by the corresponding entries \({v_{jk}}\) in *V*. Therefore, *U* can be viewed as containing a basis that is optimized for the linear approximation of the data in *X*. Since relatively few basis vectors are used to represent many data vectors, a good approximation can only be achieved if the basis vectors discover latent factors in the data (Huang et al. 2013). In this study, we propose a novel PA-NMF method to find and restore the latent factors in the target domain.

## 3 Proposed model

Our model consists of two stages. First, we transfer rating patterns from the source domains to alleviate the target domain's sparsity. Second, we employ PA-NMF to learn the domain-specific knowledge and restore latent factors against negative transfer in the target matrix. The restoration stage builds on the result of the transfer learning stage.

### 3.1 Transfer rating patterns

We construct a codebook \({B_n}\) for each source domain *n* using the ONMTF (Ding et al. 2006) method. Then, we take \({B_n}\) as the medium to transfer rating patterns from the sources to the target. Equation (7) is the cost function of the target matrix approximation.

The target matrix contains *p* users and *q* items. By introducing multiple source domains, we can solve the under-fitting problem of the single-source CBT model. Furthermore, we confine the values of the relatedness coefficients \(\lambda _n\) in Eq. (7) to overcome the over-fitting problems of multi-source cross-domain models (Moreno et al. 2012). In the end, we obtain the target matrix \({\tilde{X}_{tr}}\), with all ratings filled, from Eq. (8).
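Equations (7) and (8) are not reproduced in this excerpt, so the following sketch assumes a TALMUD-style combination (Moreno et al. 2012): each source codebook's reconstruction is weighted by its relatedness coefficient \(\lambda_n\), and observed target ratings are kept as-is. The codebooks and membership matrices here are random placeholders standing in for the learned ones.

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 8, 6
X_tgt = rng.uniform(1, 5, (p, q)) * (rng.random((p, q)) < 0.3)
W = (X_tgt != 0).astype(float)

# Two hypothetical source codebooks with different cluster granularities.
codebooks = [rng.uniform(1, 5, (3, 3)), rng.uniform(1, 5, (2, 4))]
lam = np.array([0.7, 0.3])                  # relatedness coefficients, sum to 1

recon = np.zeros((p, q))
for lam_n, B_n in zip(lam, codebooks):
    k, l = B_n.shape
    U_n = np.eye(k)[rng.integers(0, k, p)]  # user-cluster memberships (placeholder)
    V_n = np.eye(l)[rng.integers(0, l, q)]  # item-cluster memberships (placeholder)
    recon += lam_n * (U_n @ B_n @ V_n.T)

# Keep observed ratings; fill missing entries from the weighted sources.
X_tr = W * X_tgt + (1 - W) * recon
```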

### 3.2 Latent factors restoration

The rating-pattern transfer learning method is also known as sharing cluster-level latent factors across domains, in which the rating patterns are viewed as the cluster-level latent factors of the matrix (Gao et al. 2013). The cluster-level structures hidden across domains can be extracted by learning the rating patterns of user groups on the item clusters in the source domains. In the state-of-the-art transfer learning models (Li et al. 2009; Moreno et al. 2012; Gao et al. 2013), sparsity in the target domain is the most important cause of low prediction accuracy, so these models simply assume that the source and target domains share most of their latent factors. They then use all of the source-domain latent factors to approximate and reconstruct the target-domain matrix, as in Eqs. (7) and (8). However, as introduced above, the assumption of sharing all latent factors across domains does not always hold in real-world circumstances, where ratings from multiple domains cannot share all of their correspondences at the cluster level: no two real-world datasets are identical, and none share all of their latent factors. Therefore, the transfer learning stage inevitably leads to some degree of negative transfer across domains. For this reason, we first formalize the latent-factor-sharing assumption, and then propose a method to correct the transfer learning result \({\tilde{X}_{tr}}\) and restore the latent factors of the target domain \(X_{tgt}\).

#### 3.2.1 Problem definition

Let *U* and *V* represent the latent factors shared by \({\tilde{X}_{tr}}\) and \(W\circ {\tilde{X}_{tr}}\). Since \(W\circ {\tilde{X}_{tr}}\) is the fixed part of \({\tilde{X}_{tr}}\) in Eq. (10), we state our objective in Eq. (11): find *U* and *V* and adjust the variable part \(\left( {1 - W} \right) \circ {\tilde{X}_{tr}}\) so that Eq. (9) is established.

#### 3.2.2 Partial-adaptation NMF

The state-of-the-art NMF methods find latent factors for all valued items and approximate the whole matrix \(\tilde{X}_{tr}\). Thus, we cannot use this kind of NMF to fix one part of the matrix, \(W\circ {\tilde{X}_{tr}}\), while adjusting the other part, \(\left( {1 - W} \right) \circ {\tilde{X}_{tr}}\). Therefore, we introduce the partial-adaptation concept to NMF, whereby we adjust part of the matrix to make it more consistent with the latent factors of the fixed part.

PA-NMF proceeds in two steps. First, we learn the latent factor matrices *U* and *V*. Second, we fix part of the target matrix, \(W \circ {\tilde{X}_{tr}}\), and then update and adjust the other part, \(\left( {1 - W} \right) \circ {\tilde{X}_{tr}}\). To do so, we replace *X* with \(W \circ {\tilde{X}_{tr}} + \left( {1 - W} \right) \circ U{V^T}\) in the iteration functions in Eq. (5). The first term of the replacement, \(W \circ {\tilde{X}_{tr}}\), is the fixed part of the matrix, and the second term, \(\left( {1 - W} \right) \circ U{V^T}\), is the transferred ratings, which can be iteratively adjusted based on the latent factor matrices *U* and *V* in the target domain. The updates for PA-NMF are given in Eqs. (12) and (13).
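The replacement step described above can be sketched directly: at each iteration the working matrix is rebuilt as \(W \circ \tilde{X}_{tr} + (1 - W) \circ UV^T\) and fed into standard multiplicative updates. Since Eqs. (12)–(14) are not reproduced here, this is our reading of the scheme rather than the paper's exact update rules; data and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q, K = 20, 15, 12
X_tr = rng.uniform(1, 5, (p, q))                 # fully filled transfer result
W = (rng.random((p, q)) < 0.3).astype(float)     # 1 where original target ratings

U = rng.random((p, K)) + 0.1
V = rng.random((q, K)) + 0.1
eps = 1e-9

def cost(U, V):
    # With the replacement, the residual reduces to ||W o (X_tr - UV^T)||^2:
    # the adjustable part (1-W) o UV^T cancels against UV^T.
    Xc = W * X_tr + (1 - W) * (U @ V.T)
    return np.linalg.norm(Xc - U @ V.T) ** 2

c0 = cost(U, V)
for _ in range(200):
    Xc = W * X_tr + (1 - W) * (U @ V.T)          # PA-NMF replacement for X
    U *= (Xc @ V) / ((U @ (V.T @ V)) + eps)      # standard multiplicative updates
    V *= (Xc.T @ U) / ((V @ (U.T @ U)) + eps)
c1 = cost(U, V)

# Restored matrix: observed part kept, transferred part re-expressed
# through the target latent factors (our reading of Eq. (14)).
X_re = W * X_tr + (1 - W) * (U @ V.T)
```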

The target latent factor matrices *U* and *V* can be restored by adjusting \(\left( {1 - W} \right) \circ {\tilde{X}_{tr}}\) during the iterations. Finally, we obtain the restored matrix \(\tilde{X}_{re}\) from Eq. (14).

#### 3.2.3 Convergence analysis

The PA-NMF updating rules for *U* and *V* of the target matrix monotonically reduce the value of the two equivalent cost functions in Eq. (10) and make them converge to a local minimum.

#### 3.2.4 Algorithm

Algorithm 1 demonstrates the first (lines 7–9) and second (lines 10–20) steps of PA-NMF. In practice, it is crucial to deal with overfitting in the latent factors restoration stage. This problem can be overcome by dividing the training data into two parts and using the validation set to decide whether to stop the iteration. In addition, we address this issue by combining multiple parameters to stop the iteration early. The *K* in Eq. (3) can be set to a relatively large value to prevent the restoration stage from introducing new noise in the PA-NMF process. Moreover, setting \(M \ll T\) can speed up the learning progress. The iteration count *T* and the error threshold \(\psi\) help us stop the iteration under the appropriate circumstances. All of these parameters are specific to a given target domain and can be decided by cross validation.

As our restoration model runs at the tipping point between overfitting and underfitting, the rule of thumb in practice is to use a relatively low value of *T* or a high value of \(\psi\) so that the iteration stops early and slightly underfits the results *U* and *V* for Eq. (10) (lines 10–16). Moreover, we run the PA-NMF algorithm *N* times in the outermost loop (lines 3–21) and select the \({U_f}\) and \({V_f}\) that produce the smallest value of the cost function in Eq. (10), compensating for the loss of accuracy caused by ending the iteration early (lines 18–20).
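The restart-and-select structure of Algorithm 1 can be sketched as below. The threshold test, parameter values, and function names are illustrative assumptions (the algorithm's pseudocode is not reproduced in this excerpt); the inner loop reuses the PA-NMF replacement described in Sect. 3.2.2.

```python
import numpy as np

rng = np.random.default_rng(5)
p, q, K = 15, 12, 10
X_tr = rng.uniform(1, 5, (p, q))                 # transfer learning result
W = (rng.random((p, q)) < 0.3).astype(float)     # 1 where original target ratings
eps = 1e-9

def pa_nmf_cost(U, V):
    """Cost of Eq. (10): squared error on the fixed (observed) part."""
    return np.linalg.norm(W * (X_tr - U @ V.T)) ** 2

def pa_nmf_run(T=150, psi=1e-3):
    """One PA-NMF run, stopping early once progress falls below psi."""
    U = rng.random((p, K)) + 0.1
    V = rng.random((q, K)) + 0.1
    prev = pa_nmf_cost(U, V)
    for _ in range(T):
        Xc = W * X_tr + (1 - W) * (U @ V.T)      # partial-adaptation replacement
        U *= (Xc @ V) / ((U @ (V.T @ V)) + eps)
        V *= (Xc.T @ U) / ((V @ (U.T @ U)) + eps)
        cur = pa_nmf_cost(U, V)
        if prev - cur < psi:                     # stop early: slight underfit
            break
        prev = cur
    return U, V

# Outermost loop: N restarts, keep the factors with the smallest cost.
N = 3
runs = [pa_nmf_run() for _ in range(N)]
U_f, V_f = min(runs, key=lambda uv: pa_nmf_cost(*uv))
```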

## 4 Experiments

### 4.1 Datasets setup

We used Netflix and Jester as the source domains in the transfer learning stage and predicted the missing ratings on MovieLens and BookCrossing. We extracted a relatively dense part of the huge Netflix dataset, with 38,934 ratings and \(97.3\%\) density. The Jester dataset we used is \(100\%\) dense. As for the target domains, MovieLens is \(3.8\%\) dense with 89,132 ratings, and BookCrossing is \(2.9\%\) dense with 11,003 ratings.

In BookCrossing, the number of users *p* is obviously larger than the number of items *q*. The setup for all the datasets is shown in Table 1.

Datasets setup

| Domain | Name | Scale | Density (%) | Quantity |
|---|---|---|---|---|
| Source | Netflix | 200 × 200 | 97.3 | 38,934 |
| Source | Jester | 200 × 100 | 100 | 20,000 |
| Target | MovieLens | 671 × 3473 | 3.8 | 89,132 |
| Target | BookCrossing | 1157 × 325 | 2.9 | 11,003 |

### 4.2 Experiment results

Table 2 and Fig. 3 compare the prediction accuracy of seven experiments performed on the MovieLens and BookCrossing data. As expected, our latent factors restoration (LFR) model consistently improves the accuracy of the transfer learning (TL) results. For MovieLens, the MAE values decrease by 0.037 to 0.055 (0.043 on average); for BookCrossing, by 0.032 to 0.068 (0.057 on average). Furthermore, in Fig. 4, we can clearly distinguish the latent factors restoration stage from the initial transfer learning stage, where the curves drop again after flattening. Figure 4 demonstrates that the transfer learning stage converges after 5–12 iterations on both the MovieLens and BookCrossing data. As for the difference between the two target datasets, Fig. 4 shows that the test-error curves fluctuate more for BookCrossing than for MovieLens.

MAE values of seven experiments on MovieLens and BookCrossing

| MovieLens | 1 | 2 | 3 | 4 | 5 | 6 | 7 | STDEV |
|---|---|---|---|---|---|---|---|---|
| TL | 0.679 | 0.698 | 0.681 | 0.674 | 0.675 | 0.695 | 0.661 | 0.0117 |
| LFR | **0.636** | **0.644** | **0.641** | **0.637** | **0.633** | **0.639** | **0.629** | 0.0045 |

| BookCrossing | 1 | 2 | 3 | 4 | 5 | 6 | 7 | STDEV |
|---|---|---|---|---|---|---|---|---|
| TL | 0.626 | 0.629 | 0.609 | 0.623 | 0.653 | 0.640 | 0.602 | 0.0159 |
| LFR | **0.569** | **0.573** | **0.565** | **0.555** | **0.572** | **0.572** | **0.570** | 0.0060 |

## Notes

### Acknowledgements

The work is supported by the Beijing Natural Science Foundation (No. 4192008) and the General Project of Beijing Municipal Education Commission (No. KM201710005023).

## References

- Abdollahi, B., Nasraoui, O.: Explainable matrix factorization for collaborative filtering. In: Proceedings of the 25th International Conference Companion on World Wide Web, pp. 5–6 (2016)
- Alqadah, F., Reddy, C.K., Hu, J., Alqadah, H.F.: Biclustering neighborhood-based collaborative filtering method for top-n recommender systems. Knowl. Inf. Syst. **44**(2), 475–491 (2015)
- Bokde, D., Girase, S., Mukhopadhyay, D.: Matrix factorization model in collaborative filtering algorithms: a survey. Procedia Comput. Sci. **49**, 136–146 (2015)
- Cai, D., He, X., Han, J.: Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. **33**(8), 1548–1560 (2010)
- Chen, D., Plemmons, R.J.: Nonnegativity constraints in numerical analysis. In: Bultheel, A. (ed.) The Birth of Numerical Analysis, pp. 109–139. World Scientific, Singapore (2010)
- Ding, C., Li, T., Peng, W.: Orthogonal nonnegative matrix t-factorizations for clustering. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 126–135 (2006)
- Fenza, G., Fischetti, E., Furno, D.: A hybrid context aware system for tourist guidance based on collaborative filtering. In: IEEE International Conference on Fuzzy Systems, pp. 131–138 (2011)
- Gao, S., Luo, H., Chen, D.: Cross-domain recommendation via cluster-level latent factor model. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 161–176. Springer, Berlin (2013)
- Gaujoux, R., Seoighe, C.: Semi-supervised nonnegative matrix factorization for gene expression deconvolution: a case study. Infect. Genet. Evolut. **12**(5), 913–921 (2012)
- Helen, M., Virtanen, T.: Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine. In: 13th European Signal Processing Conference, pp. 1–4 (2005)
- Huang, K., Sidiropoulos, N.D., Swami, A.: Non-negative matrix factorization revisited: uniqueness and algorithm for symmetric decomposition. IEEE Trans. Signal Process. **62**(1), 211–224 (2013)
- Kalayeh, M.M., Idrees, H., Shah, M.: NMF-KNN: image annotation using weighted multi-view non-negative matrix factorization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 184–191 (2014)
- Langseth, H., Nielsen, T.D.: Scalable learning of probabilistic latent models for collaborative filtering. Decis. Support Syst. **74**, 1–11 (2015)
- Langville, A.N., Meyer, C.D., Albright, R.: Algorithms, initializations, and convergence for the nonnegative matrix factorization. arXiv:1407.7299 (2014)
- Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems, pp. 556–562 (2001)
- Li, B., Yang, Q., Xue, X.: Can movies and books collaborate? Cross-domain collaborative filtering for sparsity reduction. In: Twenty-First International Joint Conference on Artificial Intelligence (2009)
- Lin, T., Zha, H.: Riemannian manifold learning. IEEE Trans. Pattern Anal. Mach. Intell. **30**(5), 796–809 (2008)
- Long, X., Lu, H., Peng, Y.: Graph regularized discriminative non-negative matrix factorization for face recognition. Multimed. Tools Appl. **72**(3), 2679–2699 (2014)
- Long, M., Wang, J., Ding, G.: Adaptation regularization: a general framework for transfer learning. IEEE Trans. Knowl. Data Eng. **26**(5), 1076–1089 (2013)
- Moreno, O., Shapira, B., Rokach, L.: TALMUD: transfer learning for multiple domains. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 425–434 (2012)
- Qin, A., Shang, Z., Tian, J.: Maximum correntropy criterion for convex and semi-nonnegative matrix factorization. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1856–1861 (2017)
- Wang, J.J.Y., Wang, X., Gao, X.: Non-negative matrix factorization by maximizing correntropy for cancer clustering. BMC Bioinform. **14**(1), 107 (2013)
- Wu, Y., Shen, B., Ling, H.: Visual tracking via online nonnegative matrix factorization. IEEE Trans. Circuits Syst. Video Technol. **24**(3), 374–383 (2013)
- Xiaojun, L.: An improved clustering-based collaborative filtering recommendation algorithm. Clust. Comput. **20**(2), 1281–1288 (2017)
- Yao, L., Sheng, Q.Z., Qin, Y.: Context-aware point-of-interest recommendation using tensor factorization with social regularization. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1007–1010 (2015)
- Yu, Y., Wang, C., Wang, H., Gao, Y.: Attributes coupling based matrix factorization for item recommendation. Appl. Intell. **46**(3), 521–533 (2017)
- Zheng, Y., Burke, R., Mobasher, B.: Splitting approaches for context-aware recommendation: an empirical study. In: Proceedings of the 29th Annual ACM Symposium on Applied Computing, pp. 274–279 (2014)
- Zhou, D., Hofmann, T., Schölkopf, B.: Semi-supervised learning on directed graphs. In: Advances in Neural Information Processing Systems, pp. 1633–1640 (2005)