1 Introduction

Recommendation systems have become a ubiquitous part of everyday life. These systems are encountered daily in e-commerce sites such as Amazon and eBay, social media websites such as Instagram and Facebook, and entertainment websites including Netflix and Pandora. Yet they were an active research area well before they became popular in online commerce and entertainment, dating to the mid-1990s [1]. Recommendation systems are a subset of “information filtering.” They predict the rating a user would give to an item. Using data-mining techniques and “prediction algorithms”, these systems make highly relevant predictions for an active user [1]. Within an online context, such a system helps users find products such as books, movies, or music by using ratings from the website’s users to recommend relevant products based on the preferences of similar users [1].

Recommendation systems are classified into Collaborative Filtering, Content-based Filtering, Cluster-based Filtering, and Hybrid approaches that combine item-based and user-based similarity [2]. Of these diverse approaches, Collaborative Filtering is the best known. There are two types of collaborative filtering: “Model-based” and “Memory-based” [2]. Collaborative filtering uses information filtering techniques to recommend items or products based on previous purchase history [2]. However, collaborative filtering has limitations, such as cold start, sparsity, and scalability, that cause poor performance. Over the years many methodologies have been developed to improve the performance of the system while avoiding the challenge of information overload. This literature review discusses the Model-Based Collaborative Filtering technique and current methodologies.

Moreover, this paper reviews academic journal articles and conference papers published between 2008 and 2018 that discuss the methodologies of Model-Based Collaborative Filtering recommendation systems. These papers are classified by application field, methodology, and publication year to illuminate research trajectories in the field of model-based collaborative filtering and its significance for Big Data applications.

2 Background

Recommendation systems are software tools that make predictions for users based on their likes and dislikes. Recommender systems can be divided into two groups: Personalized and Non-Personalized [1, 3]. Personalized recommendation systems recommend products or items based on individual criteria. In contrast, Non-Personalized recommendation systems do not depend on individual criteria [3]. Instead, this type of system provides the same recommendation to all users regardless of personal likes, dislikes, or demographic location.

Moreover, recommendation systems are classified into collaborative filtering, content-based, and hybrid approaches [1]. Collaborative filtering is a technique that makes a recommendation by finding users that have similar interests [2]. Content-based systems make recommendations by using users’ background information from interaction with the system, such as browsing history [2]. Hybrid systems make predictions by integrating collaborative and content-based techniques [2]. However, in the Big Data context, it is more difficult to provide an accurate recommendation to an active user due to the overwhelming size of the data.

To overcome the challenges posed by the exponential growth of Big Data, researchers have been working on different techniques that can be applied to collaborative-filtering-based recommendation systems. According to [2], other challenges to successful recommendation are lack of data, data evolution, and variety of users’ preferences. Furthermore, other limitations of collaborative filtering are cold start, scalability, and complexity [1, 2]. Researchers are actively investigating different techniques to mitigate these challenges and limitations.

3 Overview of Collaborative Filtering and Its Techniques

In recent years, extensive work has been done on recommendation methodologies and approaches. This section reviews general concepts and algorithms. Recommendation Systems are generally classified into content-based systems, collaborative-filtering-based systems, and hybrid designs that combine these two techniques in a single system. However, this review will emphasize those design principles relevant to Collaborative-Filtering.

3.1 Collaborative Filtering

Collaborative Filtering is one of the most popular recommendation system techniques. This system calculates similarities between the users of the system and predicts items based on their similar patterns. The method uses the ratings that users have given to items, stored in a large database known as a “user-item matrix” [4]. It then calculates the similarities between users by matching interest and preference for an item; the resulting group of similar users is known as a “neighborhood” [4]. Users will sometimes receive a recommendation for an item which they did not rate; however, since users with similar interests are grouped into clusters, the item was already rated by users within the same neighborhood as the original user.

3.2 Collaborative Filtering Process

The main concern of collaborative filtering is to recommend items for a user based on existing ratings within the neighborhood of similar users. This process involves the following steps:

  1. The system represents the entire rating space as a two-dimensional matrix R, where i and j index users and items, respectively. Each rating given by a user is the value Rij. If an item has not been rated by a user, the corresponding entry has value 0.

  2. This matrix R is then used to predict the rating that user i would give to item j, in order to recommend a list of N items that the user might like.
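The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the surveyed papers: the matrix values, the function names, and the placeholder predictor `item_mean` (which simply averages an item's observed ratings) are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical 4-user x 5-item rating matrix R; 0 marks "not rated".
R: List[List[int]] = [
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 2],
    [1, 1, 0, 5, 4],
    [0, 1, 5, 4, 0],
]

def unrated_items(R: List[List[int]], i: int) -> List[int]:
    """Indices j for which user i has no rating (R[i][j] == 0)."""
    return [j for j, r in enumerate(R[i]) if r == 0]

def item_mean(R: List[List[int]], i: int, j: int) -> float:
    """Placeholder predictor (an assumption): mean of item j's observed ratings."""
    observed = [row[j] for row in R if row[j] != 0]
    return sum(observed) / len(observed) if observed else 0.0

def top_n(R: List[List[int]], i: int,
          predict: Callable[[List[List[int]], int, int], float],
          n: int) -> List[int]:
    """Score user i's unrated items with `predict` and keep the n best."""
    scored = [(predict(R, i, j), j) for j in unrated_items(R, i)]
    # Highest predicted rating first; ties broken by item index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scored[:n]]
```

Any of the prediction techniques discussed later in this section could be passed in place of `item_mean`; the ranking step stays the same.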

3.3 Collaborative Filtering Algorithms and Methods

Two types of Collaborative Filtering methods exist: memory-based and model-based [5, 17].

Memory-Based Techniques.

Memory-based Collaborative Filtering techniques use the system’s memory to produce predictions. These techniques find previous users whose interests are identical or similar to those of the current user and use them as a basis to predict items for the current user. Once the current user rates items from the user-item dataset, algorithms use these ratings to combine users’ interests to make new predictions. Memory-based systems can be further divided into user-based and item-based methods [6].

User-Based Method.

This method compares users’ rating patterns for items. Based on the comparison, it predicts a user’s rating for an item that has been rated by other users within the neighborhood. This technique calculates the prediction based on user similarity by comparing their evaluations of the same items [6].
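A minimal sketch of a user-based prediction follows. It assumes cosine similarity over co-rated items and a similarity-weighted average over the k most similar neighbours; both are common choices rather than details given in [6], and the sample matrix and function names are hypothetical.

```python
import math
from typing import List

# Hypothetical ratings: 3 users x 3 items, 0 marks "not rated".
ratings: List[List[int]] = [
    [5, 3, 0],  # target user; item 2 is unrated
    [5, 3, 4],
    [1, 5, 2],
]

def cosine(u: List[int], v: List[int]) -> float:
    """Cosine similarity restricted to items that both users rated."""
    pairs = [(a, b) for a, b in zip(u, v) if a != 0 and b != 0]
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_u = math.sqrt(sum(a * a for a, _ in pairs))
    norm_v = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_u * norm_v)

def predict_user_based(R: List[List[int]], i: int, j: int, k: int = 2) -> float:
    """Similarity-weighted average of item j's ratings from user i's k nearest neighbours."""
    neighbours = [
        (cosine(R[i], R[u]), R[u][j])
        for u in range(len(R))
        if u != i and R[u][j] != 0  # only neighbours who rated item j
    ]
    neighbours.sort(key=lambda t: -t[0])
    top = [(s, r) for s, r in neighbours[:k] if s > 0]
    total = sum(s for s, _ in top)
    if total == 0:
        return 0.0  # no usable neighbour: no prediction
    return sum(s * r for s, r in top) / total

prediction = predict_user_based(ratings, 0, 2)
```

Because the second user agrees with the target user on both shared items, that user's rating of 4 dominates the weighted average.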

Item-Based Method.

This method computes the prediction based on the similarity between items. In both the user-based and item-based methods, similarity can be calculated using the Pearson correlation coefficient or cosine similarity [6].
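For reference, the two similarity measures named above are commonly written as follows. The notation is an assumption introduced here rather than taken from [6]: a and b are the two objects being compared (two users or two items), K is the set of dimensions on which both have ratings, r_{a,k} is the rating of a on dimension k, and \bar{r}_a is the mean rating of a.

```latex
\[
\mathrm{sim}_{\cos}(a,b) =
\frac{\sum_{k \in K} r_{a,k}\, r_{b,k}}
     {\sqrt{\sum_{k \in K} r_{a,k}^{2}}\;\sqrt{\sum_{k \in K} r_{b,k}^{2}}}
\]

\[
\mathrm{sim}_{\mathrm{Pearson}}(a,b) =
\frac{\sum_{k \in K} (r_{a,k}-\bar{r}_a)(r_{b,k}-\bar{r}_b)}
     {\sqrt{\sum_{k \in K} (r_{a,k}-\bar{r}_a)^{2}}\;\sqrt{\sum_{k \in K} (r_{b,k}-\bar{r}_b)^{2}}}
\]
```

Pearson correlation is cosine similarity applied to mean-centered ratings, which compensates for users who rate systematically high or low.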

Model-Based Techniques.

Model-Based recommendation systems incorporate machine learning and use user ratings and preferences to learn a model about the user. Hidden characteristics and item preferences train the model to offer new predictions for a user [7]. This system also uses users’ implicit information, such as music played, books read, or websites visited, to make recommendations to users. Methods for Model-Based Collaborative Filtering include Cluster Models, Bayesian Networks, Association Rule, and Neural Networks [2]. Ensemble techniques combine two or more of these approaches. Hybrid approaches combine collaborative filtering with another technique. Figure 1 shows various model-based techniques for collaborative filtering.

Fig. 1. Model-based collaborative filtering techniques

Clustering.

Clustering algorithms are a form of unsupervised machine learning that structure data based on a predefined model. Used extensively in image processing, pattern recognition, and statistical parameter models, clustering partitions data into clusters. Once stable clusters have been created, parameters are used to average out the recommendation [4]. In this approach users with similar interests are grouped together to make recommendations within the neighborhood [8]. A good clustering method possesses high intra-cluster similarity and low inter-cluster similarity [4]. The most commonly used clustering algorithm is K-means. This algorithm is relatively simple to implement and consistently provides better accuracy for recommendations than other algorithms.
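To make the grouping step concrete, here is a minimal K-means sketch (Lloyd's algorithm) over user rating vectors. The data and names are hypothetical, and the centroids are seeded with the first k users purely so the example is reproducible; practical implementations usually seed randomly or with k-means++.

```python
from typing import List, Tuple

# Hypothetical rating vectors for four users over four items.
users: List[List[float]] = [
    [5, 5, 1, 1],
    [1, 1, 5, 5],
    [4, 5, 1, 2],
    [1, 2, 5, 4],
]

def sq_dist(p: List[float], q: List[float]) -> float:
    """Squared Euclidean distance between two rating vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points: List[List[float]], k: int,
           iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Return (cluster label per point, centroids) after `iters` Lloyd iterations."""
    # Assumption: deterministic seeding with the first k points.
    centroids = [list(map(float, p)) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        for idx, p in enumerate(points):
            labels[idx] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

labels, centroids = kmeans(users, k=2)
```

Users who end up with the same label form a neighborhood, and a centroid's value for an item can serve as the averaged recommendation score for that cluster.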

Bayesian Networks.

Bayesian networks are based on conditional probability and Bayes’ theorem [4]. Each node in the network represents an item, and ratings are determined from these nodes.

Association Rule Mining.

Association Rule Mining is used to describe the relationship between items that have been purchased together. This mining algorithm makes predictions about an item based on items that have been purchased in the past or purchased concurrently in a transaction [9].
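Association rules are usually scored with two standard measures, support and confidence. The sketch below computes both for a hypothetical set of transactions; the item names and function names are illustrative assumptions, not drawn from [9].

```python
from typing import Iterable, List, Set

# Hypothetical purchase transactions.
transactions: List[Set[str]] = [
    {"book", "pen"},
    {"book", "pen", "lamp"},
    {"book", "lamp"},
    {"pen"},
]

def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Set[str]]) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base

book_support = support({"book"}, transactions)
book_to_pen = confidence({"book"}, {"pen"}, transactions)
```

Here "book" appears in three of four transactions and "book" with "pen" in two, so the rule book -> pen has confidence 2/3. A recommender would typically keep only rules whose support and confidence exceed chosen thresholds.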

Neural Network.

Neural Networks resemble the human brain, where many neurons link to each other. In a neural network, neurons are connected in layers that include input and output nodes. According to [10] there are many kinds of neural network, but one basic type is the Multilayer Perceptron (MLP). Two kinds of neural networks are commonly used in collaborative filtering recommendation systems: user-based and item-based [10]. In user-based networks, input nodes refer to the user’s previous preferences and the output node refers to the user’s preference for an item [10]. In item-based networks, by contrast, input nodes refer to item preferences and the output node corresponds to a user’s preference for the target item [10].
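The forward pass of a small user-based MLP can be sketched as below. Everything specific here is an assumption for illustration: one hidden layer, sigmoid activations, hand-picked weights instead of trained ones, and a final rescaling of the output to a 1 to 5 rating range. The architecture in [10] may differ.

```python
import math
from typing import List

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def mlp_predict(prefs: List[float],
                W1: List[List[float]], b1: List[float],
                w2: List[float], b2: float,
                lo: float = 1.0, hi: float = 5.0) -> float:
    """One hidden layer: inputs are past preferences, output is a rating in [lo, hi]."""
    # Hidden layer: one sigmoid unit per row of W1.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, prefs)) + b)
        for row, b in zip(W1, b1)
    ]
    # Output node: a single sigmoid unit, rescaled to the rating range.
    out = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return lo + (hi - lo) * out

# Hypothetical input (liked item 0 and item 2) and untrained example weights.
prefs = [1.0, 0.0, 1.0]
W1 = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
b1 = [0.0, 0.0]
w2 = [1.0, -1.0]
b2 = 0.0
rating = mlp_predict(prefs, W1, b1, w2, b2)
```

In practice the weights would be learned from the user-item matrix by backpropagation; the sketch only shows how input preferences flow to a single output preference.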

Regression.

Regression is a statistical process for determining relationships within datasets. Specifically, it is a method to ascertain the association between one or more independent variables and a dependent variable [5]. The main purpose of this approach is to determine users’ ratings within their neighborhood [11]. This approach helps predict hidden characteristics of relationships among users’ rating “habits” [11]. This method calculates users’ ratings and identifies common patterns between users and their neighborhood.
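As a simple illustration of the idea, the sketch below fits an ordinary least-squares line that predicts one user's rating from a neighbour's rating of the same item. The single-predictor setup, the data, and the function names are assumptions for the example; the regression models in [11] may be more elaborate.

```python
from typing import List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the xs are not all identical
    return slope, mean_y - slope * mean_x

# Hypothetical co-rated items: neighbour's ratings (x) and target user's ratings (y).
neighbour = [1.0, 2.0, 3.0, 4.0]
target = [1.5, 2.0, 2.5, 3.0]

slope, intercept = fit_line(neighbour, target)
# Predict the target user's rating for an item the neighbour rated 5.
predicted = slope * 5.0 + intercept
```

The fitted slope and intercept capture a rating habit: in this toy data the target user moves half a point for every full point the neighbour moves.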

Ensemble.

An ensemble model combines two or more algorithms or methods in order to improve on the recommendations produced by a single method. This model comprises four components: “Boosting”, “Bagging”, “Fusion” (combining several models that use collaborative filtering methods), and “Randomness” [12].
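Of the four components, fusion is the easiest to show in code. The sketch below blends the predictions of several models with a weighted average; this particular blending rule, the weights, and the function name are assumptions for illustration and are not taken from [12].

```python
from typing import List, Optional

def fuse(predictions: List[float], weights: Optional[List[float]] = None) -> float:
    """Weighted average of several models' predicted ratings for one user-item pair."""
    if weights is None:
        weights = [1.0] * len(predictions)  # default: equal weight per model
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total

# Hypothetical predictions from three different collaborative filtering models.
blended = fuse([3.0, 4.0, 5.0])
# Trust the first model three times as much as the second.
weighted = fuse([2.0, 4.0], weights=[3.0, 1.0])
```

Weights are commonly tuned on held-out ratings so that more accurate models contribute more to the blended score.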

4 Limitations of Collaborative Filtering

Although collaborative filtering is among the most popular recommendation system techniques, it faces several challenges, which are discussed below.

4.1 Cold Start

Cold start can be described in terms of three scenarios: “new community”, “new item”, and “new user” [13].

New Community.

This problem refers to the challenge of gathering sufficient initial ratings to make recommendations to a new group of users [13].

New Item.

This problem occurs when a new item enters the system. In the beginning, new items do not have ratings; therefore, they are less likely to be recommended. These new items go unnoticed by large parts of the community [13].

New User.

New user is a significant challenge in collaborative filtering recommendation systems because there is no history of preferences to use as a basis for recommendations [13]. The user’s preferences are completely unknown to the system. Therefore, the system is unable to make a reliable recommendation to the user.

4.2 Scalability

Collaborative filtering draws on billions of data points to make reliable recommendations to users, which requires extensive computational resources. As the volume of collaborative filtering information grows exponentially, processing becomes expensive, and inaccurate ratings can result from this Big Data challenge [2].

4.3 Sparsity

Recommendation systems rely on a massive catalog of item ratings. However, only a small subset of this data pertains to any individual item, which leaves some items with few ratings. This sparsity of ratings makes it difficult for a system to calculate recommendations [2].
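Sparsity is often quantified as the fraction of empty cells in the user-item matrix. The short sketch below computes that figure, assuming as elsewhere in this paper that 0 marks an unrated item; the sample matrix and function names are hypothetical.

```python
from typing import List

def sparsity(R: List[List[int]]) -> float:
    """Fraction of user-item cells with no rating (0 = unrated)."""
    total = sum(len(row) for row in R)
    missing = sum(r == 0 for row in R for r in row)
    return missing / total

def ratings_per_item(R: List[List[int]]) -> List[int]:
    """Number of observed ratings for each item (column)."""
    return [sum(r != 0 for r in col) for col in zip(*R)]

# Hypothetical 3-user x 4-item matrix.
sample = [
    [5, 0, 0, 1],
    [0, 0, 0, 2],
    [4, 0, 3, 0],
]
sample_sparsity = sparsity(sample)
sample_counts = ratings_per_item(sample)
```

In this toy matrix 7 of 12 cells are empty and the second item has no ratings at all, which is exactly the situation in which neighborhood-based predictions become unreliable.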

Because of these limitations of collaborative filtering, one objective of this literature review is to emphasize articles and papers in conference proceedings that are dedicated to alleviating these problems using machine learning methods.

5 Research Methodology for Literature Review

It is necessary to establish the research methodology for this review and the criteria for inclusion of papers. The articles and conference proceedings are surveyed based on distributions by year, methodology, application field, and dataset. The review covers articles and conference proceedings published in the decade from 2008 to 2018. Moreover, this study only includes those articles that conducted an experiment using a dataset with one or more methodologies. The following databases were used in this study: IEEE Xplore, Science Direct, Springer Link, and ACM Digital Library.

Figure 2 shows the distribution of research articles by database.

Fig. 2. Distribution of articles by database

Thirty-four percent of the articles, or 24 of 71, are from IEEE. Twenty-seven percent of the articles are from Science Direct, which is 19 out of 71 articles. Twenty-five percent of the articles came from Springer Link, or 18 of 71 articles. Finally, fourteen percent of the articles are from the ACM digital library, which is 10 of 71 articles.

The key selection criteria for papers in this study were that they address collaborative filtering recommendation systems, perform experiments on real datasets, and attempt to alleviate the inherent challenges of collaborative filtering techniques discussed above. Moreover, this study was confined to literature published in English. The keywords chosen for the search process were: “recommendation system”, “recommender system”, “methodologies and recommendation system”, “Association rule and recommendation system”, “regression and recommendation system”, “neural network and recommendation system”, and “ensemble and recommendation system.” Although consulted, survey papers were not included in this study unless they included an experiment on a dataset.

6 Classification of Papers for Literature Review

Seventy-one articles on methodologies for collaborative-filtering recommendation systems were reviewed in this study. These papers were classified into categories by year, application field, and methodology. This section discusses the detailed classification of the articles and conference papers surveyed for this literature review.

6.1 Distribution of Papers by Year

The research papers were selected by year of publication between 2008 and 2018, as shown in Fig. 3. The majority of research papers published between 2016 and 2018 applied different methods in application areas such as books, music, social networks, travel, and education to find solutions for the limitations of collaborative filtering discussed in Sect. 4 above.

Fig. 3. Distribution of papers by publication year

6.2 Distribution of Papers by Methodology/Technique

The distribution of papers by technique for collaborative filtering is shown below in Fig. 4. Among the 71 papers that were reviewed, the clustering technique predominates [14, 16, 24, 34, 41, 42, 51, 52, 54, 70, 74, 75, 78, 79, 82]. This technique has proven beneficial in addressing the challenges in collaborative filtering. Association rule was researched in the earlier years of the period considered for this survey [21, 29, 36, 39, 40, 46, 64]. More recently, however, researchers have become more interested in implementing the association technique alongside another method within a hybrid or ensemble approach [31, 37, 38, 44, 47, 55, 58, 65, 68, 80]. Bayesian prediction ranks next in number of papers. In the majority of cases the Bayesian technique has been used to implement movie, social media, and location-based recommendation systems [18, 19, 22, 28, 44, 45, 49, 50, 53, 57, 60, 62, 63, 80]. Neural networks have been used in recommender systems about as often as the Bayesian technique [20, 23, 26, 32, 37, 47, 58, 66, 71, 72, 76]. Finally, regression has received the least study of these methodologies [11, 61, 73, 77, 81]. However, increased research on ensemble techniques may have decreased the research devoted exclusively to regression. As with the Association Rule methodology, researchers are interested in implementing regression with other techniques to improve recommendation system performance.

Fig. 4. Distribution of papers by technique/methodology

6.3 Distribution of Papers by Application

The distribution of papers by application is represented in Fig. 5.

Fig. 5. Distribution by application field

Thirty-five of the papers discussed the implementation and application of recommender system methods to movie recommendation [14, 19, 41, 42, 43, 54, 57, 60, 61, 71, 78, 81]. This is likely due to the easily accessible movie data from the MovieLens dataset [27]. Following the movie category, Business received the second highest number of studies. This includes e-commerce, restaurant, travel, question answering, and others [8, 21, 28, 36, 37, 52, 66, 69, 74]. Some papers discussed the implementation of collaborative filtering methodologies in education. Most of these use association rule and ensemble methods [29, 39, 67]. The fewest papers were published in music and television applications [25, 35, 51]. Other studies used multiple datasets as benchmarks. This indicates that diverse datasets exist for benchmarking recommendation system performance. Furthermore, given the lack of studies devoted to music and television and the abundance of studies in the movie domain, music and television may be worthwhile areas for future research in collaborative filtering recommender systems.

6.4 Chronological Evaluation of Papers by Year and Technique

During the late 2000s Association Rule was predominant [21, 29, 30, 40, 46, 55, 64, 68]. Following a hiatus, researchers resumed interest in this method in 2017. Clustering can be considered the most stable research category because it has been consistently investigated since 2009 [14, 16, 24, 28, 34, 42, 51, 52, 54, 70, 74, 75, 78, 79, 82]. Few papers in the early period of this study examined this method. However, more recent studies in clustering have focused on implementing fuzzy clustering instead of traditional clustering [41]. Another stable research category is Bayesian prediction, with papers on this technique appearing with almost yearly regularity [18, 19, 22, 28, 44, 45, 49, 50, 53, 57, 60, 62, 63]. Few papers from 2009–2010 and 2014–2017 discuss the Regression technique [11, 33, 59, 61, 73, 77, 81]. A low frequency of articles for a given technique could indicate that researchers are transitioning to a new research emphasis, such as ensemble methods. Only one paper from 2009 discussed the ensemble technique, but since 2016 this method has grown in popularity among researchers [12, 15, 31, 37, 58, 65, 80]. The ascendant popularity of Ensemble techniques indicates a current trajectory of recommender system research. Moreover, many researchers are interested in implementing this method for different application fields such as context-based, location-based, demographics, education, and social media [6, 28, 29, 30, 39, 44, 46, 48, 52, 63, 66, 68, 75].

7 Conclusion

Recommendation systems predict users’ ratings of products or items based on users’ likes and dislikes. These systems utilize users’ background history and current information by using implicit features and explicit feedback to provide high-quality, relevant, and diverse recommendations. This paper discussed model-based collaborative filtering, examined its different methodologies and applications, and presented the results of a literature review of model-based collaborative filtering. For this review, 71 articles and conference papers published between 2008 and 2018 were surveyed to furnish insight into the trajectories in the field of Big Data and collaborative filtering techniques and applications.

Papers were classified in terms of database, publication year, methodology or technique, application field, and technique by publication year. Few papers in the early years covered by this survey used Ensemble methods. However, more papers using Ensemble models have been published within the past few years. Moreover, it was also observed that fewer papers on collaborative filtering were published between 2008 and 2013; after 2014 the publication count rises, and the trend reaches its peak in 2017.

For application field, there were more papers implementing collaborative filtering methodologies on movie datasets than on any other type of dataset. This is due to the wide availability of movie datasets such as MovieLens and Netflix [27, 56]. As a goal for future research, it is important to investigate ensemble methods further, along with their potential limitations. It is also important to expand the availability and variety of research datasets for collaborative filtering recommendation systems. With the ascendancy of Big Data, it is evident that Recommender Systems will continue to be a crucial tool for navigating the overwhelming variety of choices across numerous domains.