# Collaborative Filtering

**DOI:** https://doi.org/10.1007/978-3-319-32001-4_274-1

## Abstract

Collaborative filtering (CF) is a process for filtering information or patterns through collaboration among multiple agents and resources. The main idea of CF is to extract useful information effectively from an overwhelming amount of collected data. This article gives an overview of CF techniques and explains how to utilize CF in a recommender system (RS). An RS provides recommendations to an active user based on items that other similar users prefer. CF makes automatic predictions of a user’s interests by utilizing the stored data of many users, which makes it a key method for RS.

## Synonyms

## Introduction

Collaborative filtering (CF) depends entirely on users’ contributions, such as ratings or reviews of items. It exploits the matrix of collected user-item ratings as its main source of input. Its output is a recommendation that takes one of two forms: (1) a numerical prediction of how much an active user *U* will like an item and (2) a list of the top-rated items, known as the top-N items. CF assumes that similar users express similar patterns of rating behavior and that similar items obtain similar ratings. There are two primary approaches to CF algorithms: (1) neighborhood-based and (2) model-based (Aggarwal 2016).

The neighborhood-based CF algorithms (aka, memory-based) directly utilize stored user-item ratings to predict ratings for unseen items. There are two primary forms of neighborhood-based algorithms: (1) user-based nearest neighbor CF and (2) item-based nearest neighbor CF (Aggarwal 2016). In the user-based CF, two users are similar if they rate several items in a similar way. Thus, it recommends to a user the items that are the most preferred by similar users. In contrast, the item-based CF recommends to a user the items that are the most similar to the user’s previous purchases. In such an approach, two items are similar if several users have rated these items in a similar way.

The model-based CF algorithms (aka, learning-based models) form an alternative approach by mapping both items and users to the same latent factor space. The algorithms utilize users’ ratings to learn a predictive model (Ning et al. 2015). The latent factor space attempts to explain ratings by characterizing both items and users on factors automatically inferred from previous users’ ratings (Koren and Bell 2015).

## Methodology

### Neighborhood-Based CF Algorithms

#### User-Based CF

Consider the user-item rating dataset below, where the task is to predict the rating of *item3* by the active user *Andy*.

User-item rating dataset

User name | Item1 | Item2 | Item3 | Item4 |
---|---|---|---|---|
Andy | 3 | 3 | ? | 5 |
U1 | 4 | 2 | 2 | 4 |
U2 | 1 | 1 | 4 | 2 |
U3 | 5 | 2 | 3 | 4 |

In order to solve the task presented above, the following notations are given. The set of users is symbolized as *U = {U1, .., Uu}*, the set of items is symbolized as *I = {I1, ..,Ii}*, the matrix of ratings is symbolized as *R* where *r*_{u, i} means rating of a user *U* for an item *I*, and the set of possible ratings is symbolized as *S* where its values take a range of numerical ratings {1,2,3,4,5}. Most systems consider the value 1 as strongly dislike and the value 5 as strongly like. It is worth noting that *r*_{u, i} should only take one rating value.

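The first step is to compute the similarity between *Andy* and the other three users. In this example, the similarity between the users is simply computed using Pearson’s correlation coefficient (1).

\[ \mathrm{sim}(u,v) = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \overline{r}_u)(r_{v,i} - \overline{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \overline{r}_u)^2}\,\sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \overline{r}_v)^2}} \qquad (1) \]

Here \( I_{uv} \) is the set of items rated by both users, and \( \overline{r}_u \) and \( \overline{r}_v \) are the average ratings of users *u* and *v*.

The similarity between *Andy* and *U1* is calculated as follows, using the co-rated items Item1, Item2, and Item4:

\[ \mathrm{sim}(Andy,U1) = \frac{(3 - \overline{r}_{Andy})(4 - \overline{r}_{U1}) + (3 - \overline{r}_{Andy})(2 - \overline{r}_{U1}) + (5 - \overline{r}_{Andy})(4 - \overline{r}_{U1})}{\sqrt{(3 - \overline{r}_{Andy})^2 + (3 - \overline{r}_{Andy})^2 + (5 - \overline{r}_{Andy})^2}\,\sqrt{(4 - \overline{r}_{U1})^2 + (2 - \overline{r}_{U1})^2 + (4 - \overline{r}_{U1})^2}} \qquad (2) \]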

It is worth noting that the values of Pearson’s correlation coefficient lie in the range from *−1* to *+1*, where *+1* indicates a strong positive correlation and *−1* a strong negative correlation. The similarities between *Andy* and *U2* and *U3* are 0.15 and 0.19, respectively. Referring to the previous calculations, it seems that *U1* and *U3* rated several items similarly to *Andy* in the past. Thus, *U1* and *U3* are utilized in this example to predict the rating of *item3* for *Andy*.
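The similarity computation can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names `ratings`, `mean_rating`, and `pearson` are hypothetical, and the sketch assumes that each user's mean is taken over all items that user has rated while the sums run over co-rated items only. Other conventions (for example, means over co-rated items) give different values, so the printed numbers need not coincide with the figures quoted in the text.

```python
from math import sqrt

# Ratings from the user-item table above; Andy's rating for Item3 is unknown.
ratings = {
    "Andy": {"Item1": 3, "Item2": 3, "Item4": 5},
    "U1": {"Item1": 4, "Item2": 2, "Item3": 2, "Item4": 4},
    "U2": {"Item1": 1, "Item2": 1, "Item3": 4, "Item4": 2},
    "U3": {"Item1": 5, "Item2": 2, "Item3": 3, "Item4": 4},
}


def mean_rating(user):
    """Average of all ratings the user has given (an assumed convention)."""
    values = ratings[user].values()
    return sum(values) / len(values)


def pearson(u, v):
    """Pearson correlation between users u and v over their co-rated items."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if not common:
        return 0.0
    mu_u, mu_v = mean_rating(u), mean_rating(v)
    du = [ratings[u][i] - mu_u for i in common]
    dv = [ratings[v][i] - mu_v for i in common]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du)) * sqrt(sum(b * b for b in dv))
    # A zero denominator means one user rated all co-rated items identically.
    return num / den if den else 0.0


for other in ("U1", "U2", "U3"):
    print(other, round(pearson("Andy", other), 2))
```

Storing the ratings as nested dictionaries keeps missing entries implicit, which suits sparse rating data; a dense array with a missing-value marker would work equally well for a table this small.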

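The next step is to predict the rating of *item3* using the ratings of *Andy’s K*-neighbors (*U1* and *U3*). Thus, Eq. (3) is introduced where \( \hat{r} \) means the predicted rating.

\[ \hat{r}_{u,i} = \overline{r}_u + \frac{\sum_{v \in N_K(u)} \mathrm{sim}(u,v)\,(r_{v,i} - \overline{r}_v)}{\sum_{v \in N_K(u)} \left| \mathrm{sim}(u,v) \right|} \qquad (3) \]

Here \( N_K(u) \) is the set of the *K* nearest neighbors of user *u*, \( \mathrm{sim}(u,v) \) is the similarity between users *u* and *v*, and \( \overline{r}_u \) and \( \overline{r}_v \) are their average ratings. Applying Eq. (3) to *Andy* and *item3*, for which *U1* and *U3* gave the ratings 2 and 3, yields:

\[ \hat{r}_{Andy,item3} = \overline{r}_{Andy} + \frac{\mathrm{sim}(Andy,U1)\,(2 - \overline{r}_{U1}) + \mathrm{sim}(Andy,U3)\,(3 - \overline{r}_{U3})}{\left| \mathrm{sim}(Andy,U1) \right| + \left| \mathrm{sim}(Andy,U3) \right|} \qquad (4) \]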

Given the result of the prediction computed by Eq. (4), it is most likely that *item3* will be a good choice to be included in the recommendation list for *Andy*.
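The prediction step can be sketched as a mean-centered weighted average over the chosen neighbors. The function name `predict_rating` and its argument layout are hypothetical. The similarity 0.19 for *U3* is the value quoted in the text, whereas the similarity 0.47 for *U1* is an assumed input used only to make the sketch runnable, so the printed value is illustrative and is not meant to reproduce the article's own result.

```python
def predict_rating(active_mean, neighbors):
    """Mean-centered weighted prediction for one item.

    neighbors: list of (similarity, neighbor_rating_for_item, neighbor_mean).
    """
    num = sum(sim * (rating - mean) for sim, rating, mean in neighbors)
    den = sum(abs(sim) for sim, _, _ in neighbors)
    if den == 0:
        # No usable neighbors: fall back to the active user's own mean.
        return active_mean
    return active_mean + num / den


# Andy's mean over the items he has rated (Item1, Item2, Item4).
andy_mean = (3 + 3 + 5) / 3

neighbors = [
    (0.47, 2, (4 + 2 + 2 + 4) / 4),  # U1: 0.47 is an assumed similarity
    (0.19, 3, (5 + 2 + 3 + 4) / 4),  # U3: 0.19 is the similarity quoted in the text
]

prediction = predict_rating(andy_mean, neighbors)
print(round(prediction, 2))
```

Centering each neighbor's rating on that neighbor's own mean compensates for users who rate systematically high or low, which is why the active user's mean is added back at the end.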

#### Item-Based CF

Item-based CF algorithms were introduced to address serious challenges that arise when applying user-based nearest neighbor CF algorithms. The main challenge is that when the system stores a massive number of user records, the complexity of the prediction task increases sharply. Accordingly, if the number of items is smaller than the number of users, it is preferable to adopt item-based CF algorithms.

This approach computes the similarity between items instead of between an enormous number of potential neighbor users. It uses the ratings of user *U* to make a prediction for item *I*, on the grounds that item *I* will be similar to items previously rated by user *U*. Users may therefore prefer recommendations that rely on their own ratings rather than on other users’ ratings.

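The similarity between two items *i* and *j* is computed using Eq. (5):

\[ \mathrm{sim}(i,j) = \frac{\sum_{u \in U_{ij}} (r_{u,i} - \overline{r}_i)(r_{u,j} - \overline{r}_j)}{\sqrt{\sum_{u \in U_{ij}} (r_{u,i} - \overline{r}_i)^2}\,\sqrt{\sum_{u \in U_{ij}} (r_{u,j} - \overline{r}_j)^2}} \qquad (5) \]

In Eq. (5), \( U_{ij} \) is the set of users who rated both items, and \( \overline{r}_i \) and \( \overline{r}_j \) are the averages of the available ratings made by users for items *i* and *j*.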

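The rating of item *I* for user *U* is then predicted by applying Eq. (6) where *K* means the number of neighbors of items for item *I*.

\[ \hat{r}_{u,i} = \frac{\sum_{j \in N_K(i)} \mathrm{sim}(i,j)\, r_{u,j}}{\sum_{j \in N_K(i)} \left| \mathrm{sim}(i,j) \right|} \qquad (6) \]

Here \( \mathrm{sim}(i,j) \) denotes the similarity between items *i* and *j*, and \( N_K(i) \) is the set of the *K* items rated by the user that are most similar to item *i*.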
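The item-based procedure can be sketched as follows. The dataset is hypothetical and chosen only for illustration, as are the names `item_mean`, `item_similarity`, and `predict`. The sketch assumes item similarities are correlations centered on item means, and it additionally assumes that only positively correlated neighbors are kept, which is a common practical choice rather than something stated in the text.

```python
from math import sqrt

# Hypothetical ratings (user -> item -> rating); user "A" has not rated "I4".
ratings = {
    "A": {"I1": 5, "I2": 3, "I3": 4},
    "B": {"I1": 4, "I2": 2, "I3": 4, "I4": 4},
    "C": {"I1": 2, "I2": 4, "I3": 2, "I4": 2},
    "D": {"I1": 5, "I2": 1, "I3": 4, "I4": 5},
}


def item_mean(item):
    """Average of all available ratings for an item."""
    values = [r[item] for r in ratings.values() if item in r]
    return sum(values) / len(values)


def item_similarity(i, j):
    """Correlation between items i and j over the users who rated both."""
    users = [u for u, r in ratings.items() if i in r and j in r]
    mi, mj = item_mean(i), item_mean(j)
    di = [ratings[u][i] - mi for u in users]
    dj = [ratings[u][j] - mj for u in users]
    num = sum(a * b for a, b in zip(di, dj))
    den = sqrt(sum(a * a for a in di)) * sqrt(sum(b * b for b in dj))
    return num / den if den else 0.0


def predict(user, item, k=2):
    """Weighted average of the user's own ratings on the k most similar items."""
    sims = [(item_similarity(item, j), r) for j, r in ratings[user].items()]
    # Assumption: keep only positively correlated neighbors, then the top k.
    top = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    den = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / den if den else None


prediction = predict("A", "I4")
print(round(prediction, 2))
```

Because item-item similarities depend only on the rating matrix, they can be precomputed offline, which is what makes this variant attractive when users greatly outnumber items.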

### Model-Based CF Algorithms

Model-based CF algorithms take raw data that has been preprocessed in an offline step, in which the data typically needs to be cleansed, filtered, and transformed, and then generate a learned model to make predictions. They address several issues that appear in neighborhood-based CF algorithms: (1) limited coverage, which arises because neighbors are found based on ratings of common items, and (2) sparsity in the rating matrix, which arises because different users rate diverse sets of items.

Model-based CF algorithms compute the similarities between users or items by developing a parametric model that investigates their relationships and patterns. They are classified into two main categories: (1) factorization methods and (2) adaptive neighborhood learning methods (Ning et al. 2015).

#### Factorization Methods

Factorization methods aim to characterize ratings by projecting users and items onto a reduced latent space. This helps discover more expressive relations between pairs of users, pairs of items, or both. There are two main types: (1) factorization of a sparse similarity matrix and (2) factorization of an actual rating matrix (Jannach et al. 2010).

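Using singular value decomposition (SVD) (Golub and Kahan 1965), a given matrix *M* can be decomposed into a product of three matrices as follows:

\[ M = U \Sigma V^{T} \]

where *U* and *V* contain the left and right singular vectors and the values on the diagonal of *Σ* are the singular values.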
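The decomposition can be tried out with NumPy's `numpy.linalg.svd`. The matrix below is hypothetical and fully observed, which sidesteps the question of how missing ratings are handled; the truncation to `k = 2` factors is likewise an illustrative choice.

```python
import numpy as np

# Hypothetical small matrix M (rows: users, columns: items), fully observed.
M = np.array([
    [5.0, 3.0, 4.0, 4.0],
    [3.0, 1.0, 2.0, 3.0],
    [4.0, 3.0, 4.0, 3.0],
    [1.0, 5.0, 5.0, 2.0],
])

# Thin SVD: M = U @ diag(s) @ Vt, with the singular values in s sorted
# in descending order.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Multiplying the three factors back together recovers M.
M_rebuilt = U @ np.diag(s) @ Vt

# Keeping only the k largest singular values gives a rank-k approximation,
# i.e. users and items described by k latent factors.
k = 2
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(s, 3))
print(np.round(M_k, 2))
```

The rows of `U[:, :k]` and the columns of `Vt[:k, :]` can be read as user and item coordinates in the shared latent space described above.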

#### Adaptive Neighborhood Learning Methods

This approach combines the original neighborhood-based and model-based CF methods. The main difference between this approach and the basic neighborhood-based approach is that the similarities are learned directly from the user-item rating matrix instead of being computed with predefined neighborhood measures.
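One way to picture learned similarities is to fit an item-item weight matrix so that each item's ratings are approximated from the other items' ratings. The sketch below uses plain ridge regression as a stand-in; it is not the specific method of any cited work. The data, the function name `learn_item_weights`, the regularization value `reg`, and the treatment of missing ratings as 0 are all simplifying assumptions.

```python
import numpy as np

# Hypothetical user-item matrix; 0 marks a missing rating (a simplification).
R = np.array([
    [5.0, 3.0, 4.0, 0.0],
    [4.0, 2.0, 4.0, 4.0],
    [2.0, 4.0, 2.0, 2.0],
    [5.0, 1.0, 4.0, 5.0],
])


def learn_item_weights(R, reg=0.1):
    """Learn an item-item weight matrix W so that R is approximated by R @ W.

    Each column j is a ridge regression of item j's ratings on the other
    items' ratings; W[j, j] stays 0 so an item cannot predict itself.
    """
    n_items = R.shape[1]
    W = np.zeros((n_items, n_items))
    for j in range(n_items):
        others = [i for i in range(n_items) if i != j]
        X, y = R[:, others], R[:, j]
        # Closed-form ridge solution: (X^T X + reg * I)^-1 X^T y
        A = X.T @ X + reg * np.eye(len(others))
        W[others, j] = np.linalg.solve(A, X.T @ y)
    return W


W = learn_item_weights(R)
scores = R @ W  # predicted scores for every user-item pair
print(np.round(scores[0, 3], 2))  # score for the first user's missing item
```

Here the columns of `W` play the role that a predefined similarity measure plays in the basic neighborhood approach, except that they are fitted to the rating data.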

## Conclusion

This article gives a general overview of CF. CF is one of the earliest approaches proposed for information filtering and recommendation making. Nevertheless, it still ranks among the most popular methods employed today in research on the Web, big data, and data mining.

## Cross-References

## References

- Aggarwal, C. C. (2016). An introduction to recommender systems. In *Recommender systems* (pp. 1–28). Cham: Springer.
- Golub, G., & Kahan, W. (1965). Calculating the singular values and pseudo-inverse of a matrix. *Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, 2*(2), 205–224.
- Jannach, D., Zanker, M., Felfernig, A., & Friedrich, G. (2010). *Recommender systems: An introduction*. Cambridge, UK: Cambridge University Press.
- Koren, Y., & Bell, R. (2015). Advances in collaborative filtering. In *Recommender systems handbook* (pp. 77–118). Boston: Springer.
- Ning, X., Desrosiers, C., & Karypis, G. (2015). A comprehensive survey of neighborhood-based recommendation methods. In *Recommender systems handbook* (pp. 37–76). Boston: Springer.