
1 Introduction

With the advent of the World Wide Web, the amount of available information has increased exponentially, and so has the need for customized techniques to filter relevant and personalized information. A recommender system is an essential tool for providing customized information about products and services. Recommender systems are widely used in domains such as movies, music, books, research articles, electronic products, and apparel. The collaborative filtering (CF) approach is one of the most popular approaches in the design of recommender systems [3]. It uses the ratings that users have given to items in the past [2, 3]. CF techniques are further classified as memory-based CF and model-based CF [6]. Memory-based CF, also known as neighborhood-based CF, relies on a simple intuition: an item might interest an active user if it is appreciated by a set of similar users, or if the user has appreciated similar items in the past [2, 4, 12]. Model-based CF, on the other hand, learns features or patterns from the rating information using machine learning algorithms [3]. Once the model is trained, it is used to predict the ratings of unrated items. Examples of model-based CF include matrix factorization [7, 10] and tendency-based approaches [3].

All the aforementioned algorithms focus on improving the accuracy of recommender systems. However, accuracy-oriented approaches have two major limitations. First, they recommend items that are very similar to the items the user has consumed or rated in the past, which leads to a lack of diversity in recommendations [1, 8]. Second, they are biased towards popular items, so non-popular items (referred to as long tail items) are ignored, leading to a loss in business [5]. Overcoming these shortcomings can improve the overall user experience and increase business profits.

Over the last decade, a few approaches have been proposed in the literature to improve diversity and long tail item recommendation. Adomavicius and Kwon proposed a ranking-based approach that improves diversity using statistics of the recommended items, such as reverse predicted rating, item popularity ranking, item absolute likeability, and item relative likeability [1]. Other variants of the ranking-based approach include a clustering-based approach (KRCF) [11] and a graph-based reranking technique [9]. The clustering-based approach, known as Knowledge Reuse Framework in Collaborative Filtering (KRCF), clusters the predicted items and picks items from each cluster, exploiting inter-cluster dissimilarity [11]. Özge and Tevfik proposed two methods to improve diversity: the first is a graph-based reranking technique applied after the ratings of unrated items are predicted, and the second incorporates a diversity factor while training a matrix factorization model [9]. A Pareto-efficient multi-objective ranking technique has been proposed to maximize accuracy and diversity jointly, in order to recommend items that are both accurate and unpopular [13]. Valcarce et al. proposed an approach referred to as item-based relevance modelling (IRM) to improve long tail item recommendations [14]. IRM uses a probabilistic approach to build a relevance model for long tail items.

Although the above approaches improve diversity, they do not integrate its different aspects, such as item popularity and dissimilarity to the items a user has rated in the past. We address this issue by integrating these aspects and categorizing the predicted items into different item classes. We propose a Hybrid Reranking framework in Collaborative Filtering (HyReCF) that uses various statistics of the recommended items to improve diversity and long tail item recommendations. Our contributions are summarized as follows.

  • The predicted ratings obtained from collaborative filtering and the item classes are fused to generate topN recommendations for each user.

  • We identify personalized dissimilar items to improve diversity.

  • The proposed technique, HyReCF, is extensively tested on two datasets (MovieLens and Netflix), and the results show that HyReCF outperforms state-of-the-art approaches in terms of diversity and long tail recommendations.

The rest of the paper is organized as follows. Section 2 discusses the proposed methodology in detail. Experimental results and analysis are provided in Sect. 3. Finally, we conclude our work in Sect. 4.

2 HyReCF: Proposed Hybrid Reranking Framework in Collaborative Filtering

In this section, we describe the proposed Hybrid Reranking Framework in Collaborative Filtering (HyReCF) in detail. HyReCF operates in two phases. In phase I (referred to as the prediction phase), a traditional collaborative filtering technique generates a list of predictions (\(I_u\)) for each user. The main idea is to deploy HyReCF on top of an existing collaborative recommender system so that it offers more diverse and long tail items to users. Therefore, any popular traditional collaborative filtering approach, such as user-based CF, item-based CF, or matrix factorization, can be used in the first phase to generate the prediction list \(I_u\) for a user.
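Because phase I is interchangeable, the following Python sketch shows one possible instance of it: a basic item-based CF that computes cosine similarity between item rating vectors and predicts each unrated item as a similarity-weighted average over the k most similar items the user has rated. The data layout (`{user: {item: rating}}`), the function names, and the neighborhood size `k` are our own assumptions for illustration, not details given in the paper.

```python
from math import sqrt

# Assumed layout: {user: {item: rating}}; an absent entry means "unrated".
Ratings = dict[str, dict[str, float]]


def item_vectors(ratings: Ratings) -> dict[str, dict[str, float]]:
    """Invert {user: {item: r}} into {item: {user: r}}."""
    vecs: dict[str, dict[str, float]] = {}
    for user, items in ratings.items():
        for item, r in items.items():
            vecs.setdefault(item, {})[user] = r
    return vecs


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse item vectors (missing ratings count as 0)."""
    dot = sum(r * b[u] for u, r in a.items() if u in b)
    na = sqrt(sum(r * r for r in a.values()))
    nb = sqrt(sum(r * r for r in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict_all(ratings: Ratings, user: str, k: int = 20) -> dict[str, float]:
    """Phase I sketch: predicted rating r*_{u,i} for every item the user has not rated."""
    vecs = item_vectors(ratings)
    rated = ratings[user]
    preds: dict[str, float] = {}
    for i in vecs:
        if i in rated:
            continue
        # k nearest neighbors of item i among the items this user has rated (k is hypothetical)
        neigh = sorted(((cosine(vecs[i], vecs[j]), j) for j in rated), reverse=True)[:k]
        denom = sum(abs(s) for s, _ in neigh)
        if denom > 0:
            preds[i] = sum(s * rated[j] for s, j in neigh) / denom
    return preds


ratings = {"u1": {"a": 5, "b": 3}, "u2": {"a": 4, "b": 2, "c": 5}, "u3": {"b": 4, "c": 1}}
preds = predict_all(ratings, "u1")  # only item "c" is unrated by u1
```

Any other predictor that returns a `{item: predicted_rating}` map per user, such as matrix factorization, could be plugged in at this point instead.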

In phase II (referred to as the reranking phase), several significant factors must be considered to obtain diverse and long tail item recommendations: item popularity, dissimilarity to the items the user rated in the past, and the quality of the predicted ratings. The predicted items are therefore categorized into different classes, defined in terms of relevance, popularity, and similarity, to maximize diversity and improve long tail item recommendations. For each user, the prediction list (\(I_u\)) obtained in phase I is categorized into three classes: (1) Unpopular Items, (2) Personalized Dissimilar Items, and (3) Good Predicted Items.

Unpopular Items: To improve long tail item recommendations, the prediction list \(I_u\) is ranked in increasing order of item popularity, where the popularity of an item is defined as the number of users who rated it. This ordering ensures that less popular (long tail) items are prominent in the user's recommendations.
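A minimal Python sketch of the Unpopular Items class: popularity is counted as the number of users who rated each item, and the user's predicted items are sorted in increasing order of that count. Breaking ties by item identifier is our assumption; the paper does not specify a tie-breaking rule.

```python
from collections import Counter


def popularity(ratings: dict[str, dict[str, float]]) -> Counter:
    """Popularity of an item = number of users who rated it."""
    return Counter(item for items in ratings.values() for item in items)


def unpopular_items(predicted: list[str], pop: Counter) -> list[str]:
    """Rank the predicted items I_u in increasing order of popularity.

    Ties are broken by item id (an assumption made for determinism)."""
    return sorted(predicted, key=lambda i: (pop.get(i, 0), i))


ratings = {"u1": {"a": 5, "b": 3}, "u2": {"a": 4, "c": 2}, "u3": {"a": 1}}
ranked = unpopular_items(["a", "c", "b"], popularity(ratings))
```

Here item "a" is rated by three users and items "b" and "c" by one each, so the two long tail items come first.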

Personalized Dissimilar Items: A predicted item is said to be personalized dissimilar if it is different from all the items rated by the user in the past. Let \(I_R = \{ i_1, i_2, \ldots , i_r \}\) be the set of items rated by user u in the past. For an unrated item i and each item j in \(I_R\), the dissimilarity can be computed using Eq. 1.

$$\begin{aligned} dis(i,j)=1-sim(i,j) \end{aligned}$$
(1)

where sim(i, j) is the similarity between item i and item j, which can be computed with measures such as adjusted cosine, cosine, or Pearson correlation. We then compute the aggregate dissimilarity between item i and the rated set \(I_R\) by applying an aggregate function, such as average, maximum, or minimum, to the values dis(i, j) for all \(j \in I_R\). Once the aggregate dissimilarities of all unrated items are computed, the items are ranked in descending order of aggregate dissimilarity to obtain the Personalized Dissimilar Items class.
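The ranking for the Personalized Dissimilar Items class can be sketched in Python as follows, assuming a similarity function `sim(i, j)` is supplied by the caller (for example, one of the measures listed above). Eq. 1 is applied to every rated item, the chosen aggregate (average, maximum, or minimum) is taken, and items are sorted in descending order. The function names and the toy similarity table are hypothetical and serve only to make the example runnable.

```python
from statistics import mean
from typing import Callable, Iterable

# The three aggregate functions mentioned in the text.
AGGREGATES = {"average": mean, "maximum": max, "minimum": min}


def personalized_dissimilar(predicted: Iterable[str], rated: Iterable[str],
                            sim: Callable[[str, str], float],
                            aggregate: str = "maximum") -> list[str]:
    """Rank unrated items by aggregate dissimilarity to the user's rated set I_R.

    Assumes the user has rated at least one item (every user in the
    experimental subsets has rated at least 20)."""
    agg = AGGREGATES[aggregate]
    rated = list(rated)
    # dis(i, j) = 1 - sim(i, j), aggregated over all j in I_R
    scores = {i: agg([1.0 - sim(i, j) for j in rated]) for i in predicted}
    # descending aggregate dissimilarity; ties broken by item id (assumption)
    return sorted(scores, key=lambda i: (-scores[i], i))


# Hypothetical similarity values between unrated items (x, y, z) and rated items (a, b).
TOY = {("x", "a"): 0.9, ("x", "b"): 0.8, ("y", "a"): 0.1,
       ("y", "b"): 0.5, ("z", "a"): 0.4, ("z", "b"): 0.3}


def toy_sim(i: str, j: str) -> float:
    return TOY[(i, j)]


by_max = personalized_dissimilar(["x", "y", "z"], ["a", "b"], toy_sim, "maximum")
by_min = personalized_dissimilar(["x", "y", "z"], ["a", "b"], toy_sim, "minimum")
```

The choice of aggregate changes the ordering: with "maximum", y ranks first because it is very dissimilar to a; with "minimum", z ranks first because it is at least moderately dissimilar to every rated item.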

Good Predicted Items: A predicted item is considered a good predicted item if its predicted rating is greater than or equal to a threshold value (Th):

$$\begin{aligned} I^{Rel}_u = \{ i \in I \mid r^*_{u,i} \ge Th \} \end{aligned}$$
(2)

where \(r^*_{u,i}\) is the predicted rating of user u for item i, and I is the set of items in the system. This class helps preserve accuracy in the recommendation list.
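A direct Python rendering of Eq. 2 over one user's predictions. The paper does not say how items within this class are ordered before selection; sorting them by descending predicted rating is our assumption, and `good_predicted` is a hypothetical name.

```python
def good_predicted(predicted: dict[str, float], th: float) -> list[str]:
    """I^Rel_u: items with predicted rating r*_{u,i} >= Th.

    Ordering by descending predicted rating (ties by item id) is an assumption."""
    good = [i for i, r in predicted.items() if r >= th]
    return sorted(good, key=lambda i: (-predicted[i], i))


preds = {"a": 4.5, "b": 3.0, "c": 4.0, "d": 3.9}
good = good_predicted(preds, th=4.0)
```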

Most recommender systems provide topN items, where topN is the number of items recommended to a user, and HyReCF likewise recommends topN items to each user. For each user, \(\alpha \times topN/3\) items are selected from each of the above classes, where the accuracy factor \(\alpha \) is a positive integer. Increasing \(\alpha \) increases accuracy, because more candidates are pooled and the final ranking by predicted rating can therefore select more highly rated items. Finally, the standard ranking approach is applied to the pooled \(\alpha \times topN\) items to generate a recommendation list \(I^{Rec}_u\) of topN items: the candidates are reranked in descending order of their predicted ratings, and the topN items are selected. An overview of the proposed approach is shown in Fig. 1.
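The fusion step can be sketched in Python as follows. The sketch takes the three already-ranked class lists, pools the first \(\alpha \times topN/3\) items from each, and keeps the topN pooled items, reading the standard ranking approach as ranking by predicted rating with the highest first. Two details are our assumptions because the paper leaves them open: \(\alpha \times topN/3\) is rounded up when it is not an integer (for example, topN = 8 with \(\alpha = 1\)), and an item that appears in several classes is pooled only once.

```python
import math


def hyrecf_rerank(predicted: dict[str, float], classes: list[list[str]],
                  top_n: int = 8, alpha: int = 3) -> list[str]:
    """Sketch of phase II fusion: pool alpha*topN/3 items per class, keep the topN.

    `classes` holds the ranked Unpopular, Personalized Dissimilar and
    Good Predicted lists for one user."""
    per_class = math.ceil(alpha * top_n / 3)  # rounding up is an assumption
    pool: set[str] = set()
    for ranked in classes:
        pool.update(ranked[:per_class])  # an item found in several classes is pooled once
    # standard ranking: descending predicted rating (ties by item id, an assumption)
    return sorted(pool, key=lambda i: (-predicted[i], i))[:top_n]


predicted = {"a": 4.8, "b": 4.5, "c": 3.1, "d": 2.5, "e": 4.0, "f": 3.6}
classes = [["d", "c", "f", "a"],   # unpopular items
           ["c", "e", "d", "b"],   # personalized dissimilar items
           ["a", "b", "e"]]        # good predicted items
diverse = hyrecf_rerank(predicted, classes, top_n=3, alpha=1)
accurate = hyrecf_rerank(predicted, classes, top_n=3, alpha=3)
```

The toy run illustrates the role of \(\alpha \): with \(\alpha = 1\) the pool contains exactly one item per class, so the low-rated but long tail item "d" survives; with \(\alpha = 3\) the larger pool lets the predicted-rating ranking favour the highest predictions.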

Fig. 1. Hybrid Reranking Framework in Collaborative Filtering (HyReCF).

3 Experimental Evaluation

This section describes the experiments and analyzes the results. In phase I of HyReCF, we use matrix factorization (MF) [7] and item-based collaborative filtering [12] to predict users' ratings of unrated items. To evaluate the performance of HyReCF, we implemented the traditional matrix factorization approach [7], item-based collaborative filtering [12], the ranking-based approach [1], and the clustering-based approach (KRCF) [11]. We executed all variants of the ranking-based technique and of KRCF. Among the ranking variants, Item Absolute Likeability [1] provides the best results, so results are reported for the "Item Absolute Likeability" ranking approach. For KRCF, we used Ward's minimum variance clustering method, as it provides the best results. Throughout all experiments, every algorithm generates topN recommendations with N set to 8.

3.1 Dataset Description and Experimental Settings

All the implemented approaches are tested on two real-world movie rating datasets, MovieLens and Netflix. Subsets of these datasets are created so that each user has rated at least 20 items. The sparsity level of a dataset, denoted K, is the percentage of missing ratings in its user-item rating matrix. The statistics of the working datasets are summarized in Table 1. To identify long tail items, item popularity graphs are plotted for the MovieLens and Netflix datasets, as shown in Fig. 2(a) and (b), respectively, with items sorted in descending order of popularity. The items in the tail section of the plots received fewer ratings from users and are therefore considered long tail items.
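The subset construction and the sparsity level K can be written down in a few lines of Python. This is a sketch assuming ratings stored as `{user: {item: rating}}` and a single filtering pass over users; the paper does not state whether filtering was repeated after items were dropped, and the helper names are hypothetical. K is computed as \(100 \times (1 - |R| / (|U| \cdot |I|))\), where |R| is the number of observed ratings.

```python
def filter_users(ratings: dict[str, dict[str, float]],
                 min_ratings: int = 20) -> dict[str, dict[str, float]]:
    """Keep only users who rated at least `min_ratings` items (single pass, assumed)."""
    return {u: items for u, items in ratings.items() if len(items) >= min_ratings}


def sparsity(ratings: dict[str, dict[str, float]]) -> float:
    """K: percentage of missing entries in the |U| x |I| rating matrix."""
    n_users = len(ratings)
    n_items = len({i for items in ratings.values() for i in items})
    observed = sum(len(items) for items in ratings.values())
    return 100.0 * (1 - observed / (n_users * n_items))


toy = {"u1": {"a": 5, "b": 3}, "u2": {"a": 4}}
k = sparsity(toy)  # 3 of 4 cells observed
```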

Fig. 2. Item popularity graph for (a) MovieLens Dataset (b) Netflix Dataset.

Table 1. Dataset description

3.2 Evaluation Metrics

The following evaluation metrics are used to measure the performance of the proposed methodology.

Precision-in-TopN: It is defined as the ratio of recommended items that are actually relevant to all recommended items. Let Rec(u) be the set of topN recommended items for user u, and let Rel(u) be the set of relevant items (ratings \(\ge \) Th) of user u in the test set. The recommendation accuracy of a recommender system is then \(\textit{precision-in-TopN} = \sum \limits _{u \in U} |Rec(u) \cap Rel(u)| \,/\, \sum \limits _{u \in U} |Rec(u)|\).

Aggregate Diversity (AD): It is defined as the number of distinct items that occur in the topN lists of all users [1]. AD is computed as \(|\bigcup _{u \in U} Rec(u)|\).
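Both metrics translate directly into Python. In this sketch, recommendation lists are assumed to be stored as `{user: [items]}` without duplicates, and the relevant sets Rel(u) are derived from test ratings at or above Th; the function names are hypothetical.

```python
def relevant_items(test: dict[str, dict[str, float]], th: float) -> dict[str, set[str]]:
    """Rel(u): items user u rated >= Th in the test set."""
    return {u: {i for i, r in items.items() if r >= th} for u, items in test.items()}


def precision_in_topn(rec: dict[str, list[str]], rel: dict[str, set[str]]) -> float:
    """sum_u |Rec(u) & Rel(u)| / sum_u |Rec(u)|."""
    hits = sum(len(set(items) & rel.get(u, set())) for u, items in rec.items())
    total = sum(len(items) for items in rec.values())
    return hits / total if total else 0.0


def aggregate_diversity(rec: dict[str, list[str]]) -> int:
    """AD: number of distinct items across all users' topN lists."""
    return len(set().union(*rec.values()))


rec = {"u1": ["a", "b"], "u2": ["b", "c"]}
rel = relevant_items({"u1": {"a": 5, "b": 2}, "u2": {"b": 4, "c": 4, "d": 5}}, th=4)
prec = precision_in_topn(rec, rel)   # 3 relevant hits out of 4 recommendations
ad = aggregate_diversity(rec)        # distinct items: a, b, c
```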

AD measures the overall diversity achieved in the recommendations, but it does not indicate how many long tail items are recommended. To address this, we propose a new metric, Long Tail, denoted LT.

Long Tail (LT): It is defined as the number of times distinct long tail items occur in the topN lists of all users.
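The following Python sketch computes LT. Two parts of it are assumptions. First, the paper identifies long tail items visually from the popularity plots, so the sketch substitutes a hypothetical cut-off `head_fraction`: after sorting items by descending popularity, everything outside the first `head_fraction` of items is treated as long tail. Second, the definition can be read as counting either distinct long tail items or their total occurrences across users' lists, so the `distinct` flag makes that choice explicit.

```python
from collections import Counter


def long_tail_items(ratings: dict[str, dict[str, float]],
                    head_fraction: float = 0.2) -> set[str]:
    """Items outside the most popular `head_fraction` of items (hypothetical cut-off)."""
    pop = Counter(i for items in ratings.values() for i in items)
    ranked = sorted(pop, key=lambda i: (-pop[i], i))  # descending popularity, as in Fig. 2
    head_size = round(head_fraction * len(ranked))
    return set(ranked[head_size:])


def long_tail_count(rec: dict[str, list[str]], tail: set[str], distinct: bool = True) -> int:
    """LT: distinct long tail items recommended (distinct=True) or their total occurrences."""
    if distinct:
        return len({i for items in rec.values() for i in items if i in tail})
    return sum(1 for items in rec.values() for i in items if i in tail)


ratings = {"u1": {"a": 5, "b": 4, "c": 3}, "u2": {"a": 4, "b": 2},
           "u3": {"a": 1, "d": 5}, "u4": {"e": 3}}
tail = long_tail_items(ratings, head_fraction=0.4)  # head = {"a", "b"}
rec = {"x": ["a", "c"], "y": ["c", "d"]}
```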

3.3 Experimental Results and Comparison

To compare the improvement in diversity fairly, we keep the precision of the proposed approach close to that of the standard approach. To this end, we set \(\alpha =3\) and use maximum as the aggregate function when creating the Personalized Dissimilar Items class. Comparative results are shown in Tables 2 and 3 for the MovieLens and Netflix datasets, respectively. As Tables 2 and 3 show, the standard approaches achieve the highest precision, since accuracy is their main focus. However, HyReCF outperforms all compared approaches in terms of aggregate diversity and long tail item recommendations.

Table 2. Experimental results on MovieLens Dataset.

From Table 2, with matrix factorization as the prediction algorithm, HyReCF loses 1% in precision but gains 227 items in aggregate diversity and recommends 166 more long tail items than the traditional matrix factorization approach. Compared with the ranking approach, a small precision loss of 0.25% yields gains of 140 items in aggregate diversity and 122 items in long tail. Likewise, with item-based CF as the prediction algorithm, HyReCF achieves a significant gain of 298 items in aggregate diversity and 216 items in long tail at a 1.1% loss in precision compared with the traditional item-based CF approach. Compared with the ranking approach, it gains 250 items in aggregate diversity and 196 items in long tail at a 0.9% loss in precision.

Table 3. Experimental results on Netflix Dataset.

Comparative results on the Netflix dataset are reported in Table 3. With matrix factorization as the prediction algorithm, HyReCF loses 1% in precision but gains 723 items in AD and 435 items in LT compared with standard matrix factorization. Compared with the ranking approach, it gains 205 items in AD and 232 items in LT at a 0.41% loss in precision. Similar observations hold for the item-based CF approach.

IRM is an approach designed for long tail item recommendation [14]. To assess the effectiveness of long tail item recommendation, we compare HyReCF with IRM. Tables 4 and 5 report the precision loss and the gain in long tail item recommendation of IRM and HyReCF with respect to the standard matrix factorization approach.

Table 4. Experimental results on MovieLens Dataset.

From Table 4, with a precision loss of 0.100, IRM gains 51 long tail items, whereas HyReCF gains 273, a significant improvement. IRM also performs poorly in terms of aggregate diversity: its loss in precision is accompanied by a loss, rather than a gain, in diversity, whereas HyReCF shows a significant improvement in diversity with a small drop in accuracy. Table 5 reports the results on the Netflix dataset. At a 5% loss in precision, IRM gains 387 long tail items and HyReCF gains 590. HyReCF therefore outperforms IRM in both diversity and long tail item recommendation.

We further analyze HyReCF under varying precision by changing the accuracy factor \(\alpha \). Figure 3(a) and (b) depict the performance of HyReCF on the MovieLens and Netflix datasets, respectively. In Fig. 3(a), with matrix factorization as the prediction algorithm, \(\alpha =1\) gives a precision of 0.6324 and an aggregate diversity of 2817 items; \(\alpha =2\) gives a precision of 0.7776 and an aggregate diversity of 2601; and \(\alpha =3\) gives a precision of 0.8225 and an aggregate diversity of 2194. Figure 3(a) and (b) show that as precision decreases, diversity improves significantly.

Table 5. Experimental results on Netflix Dataset.

Fig. 3. Performance of HyReCF on (a) MovieLens Dataset (b) Netflix Dataset.

4 Conclusion

In this paper, we proposed the Hybrid Reranking Framework in Collaborative Filtering (HyReCF), a technique to improve diversity and long tail item recommendations. Various statistics of the rating information and of the recommended items are integrated to improve diversity with a small drop in accuracy. The proposed approach provides a significant improvement in diversity compared with the state of the art. Long tail items are also more prominent in the recommendations of HyReCF than in those of state-of-the-art approaches, which can make the system more interesting for users and increase profits for businesses.