A new approach for rating prediction system using collaborative filtering

  • Pushpendra Kumar
  • Vinod Kumar
  • Ramjeevan Singh Thakur
Original Article

Abstract

Recommendation systems are widely used to recommend items to web users. They help users select a product from among millions of products. E-commerce websites such as Amazon recommend items to their customers. A recommendation system depends mainly on the previous history of its users. In this paper, a new User Rating Prediction (URP) algorithm is proposed to predict ratings for items. The proposed URP algorithm depends mainly on the similarity of users and assumes that users with similar tastes may be interested in similar items. The proposed system first builds a list of related users for every user and then uses this information to predict ratings for different items. The results of the proposed algorithm were compared with those of existing methods. The proposed algorithm gives smaller values of Mean Absolute Error (MAE) and root-mean-square error (RMSE) than the other methods.

Keywords

Recommendation system, E-commerce, Collaborative filtering (CF), Prediction

1 Introduction

Nowadays, e-commerce and social networking websites play an important role in our daily life [10, 11, 18]. Most people depend heavily on these websites for daily activities [13]. Because of the fast growth of the World Wide Web, finding appropriate content on the web has become a difficult problem. A Recommender System (RS) is a new approach that assists the user in handling the vast quantity of information. The RS is a tool [9, 15] that offers useful products to the target user from among many possible options. The goal of the RS is to supply an accurate recommendation of the products that a user prefers among the given list of items [15]. E-commerce websites can use this system to provide recommendations to their users based on their choices and to personalize the website. E-commerce websites like Netflix.com and Amazon.com personalize their sites for each customer by showing products related to the customer's interests and tastes; likewise, websites related to books, music, movies, news, and restaurants can recommend items to their users based on their past history [7]. This saves the users' valuable time by recommending items that are related to their preferences and choices. It also increases the profitability of the business by increasing sales at online stores. There are mainly three categories of recommendation system techniques: Collaborative filtering (CF), Content-based filtering (CBF), and Hybrid filtering [7]. The taxonomy of recommender systems [8] is shown in Fig. 1. In personalized recommendation systems, one of the most commonly used methods is CF. CF uses the opinions of other similar users of the system to recommend items or products to the active user. CBF uses the user's past history and recommends items similar to those the user has used in the past. 
In CBF, the system learns the importance of item characteristics and builds a model of what the user likes. A hybrid recommendation system combines collaborative filtering and content-based methods in different ways. In this paper, we propose an algorithm that predicts the rating of an item. The proposed method first finds the similarity between users from their previous history of choices and then predicts ratings for items using the list of similar users. The basic idea behind the similar-user list is that if two users rate similar types of items and give similar ratings to these items, then these users are similar to each other. To calculate the predicted rating for a given user, the ratings given by similar users and their similarity values are taken into consideration. The proposed algorithm does not hybridize collaborative filtering and content-based filtering; therefore, it is easy to implement. The algorithm aims to improve the user experience and to increase business sales.
Fig. 1

Taxonomy of recommender system

The rest of the paper is organized as follows. In Sect. 2, related work is discussed. In Sect. 3, the proposed work is presented. In Sect. 4, the results and their analysis are discussed. In Sect. 5, the work is concluded and its future scope is outlined.

2 Related work

This section examines recent work in the field of recommendation systems, including content-based filtering, collaborative filtering, and hybrid recommendation techniques, with their advantages and disadvantages. C. M. Rodrigues et al. [16] introduced a cluster-based hybrid collaborative approach that combines an item-based CF algorithm with a user demographic-based algorithm in a cluster-weighted mechanism to calculate the result and make recommendations to the user. This approach saves resources and time and performs dynamic predictions. A weighted system is implemented to make consolidated decisions. The approach is also useful in solving the user cold start, item cold start, and sparsity problems. However, it cannot generate clusters based on cross-domain data; for example, if a user likes music, the system should suggest movies based on the kinds of music that the user likes. In addition, the system has no ability to filter questionable user profiles and ratings. Li and Murata [12] proposed a hybrid recommendation approach that provides a flexible solution by integrating multidimensional clustering into a collaborative filtering recommendation system to produce better-quality recommendations. This helps to obtain user clusters with different preferences from multiple views, which increases the effectiveness and diversity of recommendations. Their algorithm is divided into three phases. In Phase I, training data are collected in the form of user and item profiles, and clustering is performed using the proposed algorithm. In Phase II, the obtained clusters that have similar characteristics are discarded. In Phase III, the prediction for an item is made by taking a weighted average of deviations from the neighbor mean. The method requires background data in the form of user and item profiles for clustering. P. Devika et al. 
[3] introduced a new pattern mining algorithm for recommendation systems, called the Frequent Pattern Intersect (FPIntersect) algorithm, that overcomes the shortcomings of the Apriori algorithm. Nowadays, e-commerce and social networking websites play an important role in our everyday life, and it is very difficult to manage without them. These websites produce a huge amount of data, and traditional data mining approaches such as Apriori suffer from high latency when scanning a large database to generate association rules. The FPIntersect algorithm overcomes this shortcoming of Apriori by decreasing the number of scans needed to produce association rules. However, the system extracts user ratings and comments from user reviews to obtain its information, and finding information in user comments is a costly process. Wang and Han [19] proposed a collaborative filtering algorithm that calculates the rating of an item that has not been rated by the user, based on an analysis of the item characteristics. This algorithm improves the accuracy of recommendation and prediction when the user rating data are sparse. The system is based on content-based filtering, so if items do not contain enough information to differentiate them from each other, the system will not predict accurately. Gupta and Gadge [6] proposed a framework that combines item-based collaborative filtering with demographic-based user clusters in an adaptive-weighted scheme to perform prediction. The cold start, scalability, and sparsity problems faced by conventional collaborative filtering algorithms are addressed by their algorithm. When few ratings are available for a user, the quality of prediction is improved by giving more weight to the demographic-based user cluster. The addition of new users to an appropriate cluster remains a challenging task. 
Moghaddam and Selamat [14] proposed a novel hybrid recommender system that gains the advantages of both model-based and memory-based collaborative filtering by combining user-based collaborative filtering with the DBSCAN clustering method applied to users’ demographic information. This method improves accuracy along with scalability. Further examination is required to determine the effect of intra-cluster rating smoothing on the sparsity problem in collaborative filtering algorithms. The system uses density-based user clustering. The time complexity of DBSCAN is \(O(n^{2})\), which is greater than the \(O(n)\) of K-means. Deshpande and Karypis [2] observed that the computational complexity of user-based collaborative filtering methods grows linearly with the number of customers. To resolve this scalability problem, they introduced a model-based recommendation method. A model-based recommendation algorithm analyzes the user–item matrix to find relations between the different items and makes recommendations based on these relations. The algorithm is divided into two phases. In Phase I, the similarity between items is computed. Phase II uses the similarities computed in Phase I to compute the similarity between a basket of items and a candidate recommended item. The resulting algorithms are faster, independent of the size of the user–item matrix, and allow real-time recommendations. The system gives good results for smaller values of \(k\) \((10 \le k \le 30)\); for higher values of \(k\), it gives very little or no improvement.

3 Proposed work

In this work, a new rating prediction approach is proposed that predicts the rating of a given item for a given user. The work uses the ratings table of the MovieLens data set [5]. The data set contains m users (U) and n items (I). The system reads the first three attributes of the ratings table, i.e., userid, movieid, and rating. Ratings are given on a scale of 1–5. The flowchart for the calculation of the predicted rating is shown in Fig. 2. The proposed system works in the following steps.
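As a rough illustration of the input step, the sketch below reads the three attributes into a nested user-to-item map. It assumes whitespace- or tab-separated rows whose first three fields are userid, movieid, and rating; the function name `load_ratings` and the sample rows are hypothetical and are not taken from the paper or the data set.

```python
from typing import Dict, Iterable

Ratings = Dict[int, Dict[int, float]]  # user id -> {item id -> rating}


def load_ratings(lines: Iterable[str]) -> Ratings:
    """Build a nested user -> item -> rating map from delimited text rows.

    Only the first three fields (userid, movieid, rating) are read; any
    further fields, such as a timestamp, are ignored.
    """
    ratings: Ratings = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        user, item, rating = int(fields[0]), int(fields[1]), float(fields[2])
        ratings.setdefault(user, {})[item] = rating
    return ratings


# Hypothetical sample rows (assumed layout: user, item, rating, timestamp).
sample = ["1\t10\t4\t881250949", "1\t20\t3\t881250950", "2\t10\t5\t881250951"]
ratings = load_ratings(sample)
```

A nested dictionary is used here only because it makes the later lookup of items rated by two users straightforward; a sparse matrix would serve equally well.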

3.1 Finding similarity between users

The similarity between two users is calculated on the basis of the ratings that the users give to items. If two users give almost the same rating to an item, then these two users are related to each other. The following formula measures how closely two users are related:
$$\begin{aligned} sim(a,b)=\sum _{i=1}^{n}\left( r_{h}-|r_{ai}-r_{bi}|\right) , \end{aligned}$$
(1)
where \(a, b\): users; \(sim(a,b)\): similarity between user a and user b; \(r_{h}\): highest rating; \(r_{ai}\): rating of user a for item i; \(r_{bi}\): rating of user b for item i; the index i ranges over the set of items rated by both user a and user b.
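A minimal sketch of Eq. (1) follows, assuming each user's ratings are stored as a dictionary from item id to rating and that each co-rated item contributes \(r_{h}-|r_{ai}-r_{bi}|\) to the sum. The function name `similarity` and the example ratings are hypothetical.

```python
from typing import Dict

R_H = 5.0  # highest rating on the 1-5 scale


def similarity(ratings_a: Dict[int, float], ratings_b: Dict[int, float],
               r_h: float = R_H) -> float:
    """Eq. (1): sum of r_h - |r_ai - r_bi| over items rated by both users."""
    common = ratings_a.keys() & ratings_b.keys()
    return sum(r_h - abs(ratings_a[i] - ratings_b[i]) for i in common)


# Hypothetical ratings: items 1 and 2 are co-rated, items 3 and 4 are not.
user_a = {1: 5.0, 2: 3.0, 3: 4.0}
user_b = {1: 4.0, 2: 3.0, 4: 2.0}
score = similarity(user_a, user_b)  # (5 - 1) + (5 - 0) = 9.0
```

Under this reading the score grows both with the number of co-rated items and with how closely the ratings agree, so two users with no item in common receive a similarity of 0.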
Fig. 2

Procedure of the rating prediction system

3.2 Find list of related users for every user

After the similarity between users has been calculated, a list of related users is created for every user from the values in the similarity matrix. A user b is added to the list of related users for user a if the similarity value for b in the row of the similarity matrix corresponding to a is greater than a threshold value \(\theta \). The threshold \(\theta \) is calculated as follows:
$$\begin{aligned} \theta =n\times r_{h}\times p, \end{aligned}$$
(2)
where \(n\): maximum number of items; \(p = 0.2\), i.e., \(\theta \) is 20% of the highest similarity \(n\times r_{h}\).
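The thresholding step can be sketched as below, assuming the similarity matrix is held as a nested dictionary and that a user is never listed as related to itself. The names `threshold` and `related_users`, the three-user matrix, and the item count of 10 are hypothetical values chosen for illustration.

```python
from typing import Dict, List


def threshold(n_items: int, r_h: float = 5.0, p: float = 0.2) -> float:
    """Eq. (2): theta = n * r_h * p."""
    return n_items * r_h * p


def related_users(sim: Dict[int, Dict[int, float]],
                  theta: float) -> Dict[int, List[int]]:
    """For each user a, list the users b != a with sim[a][b] > theta."""
    return {
        a: sorted(b for b, s in row.items() if b != a and s > theta)
        for a, row in sim.items()
    }


# Hypothetical 3-user similarity matrix with an assumed item count of 10.
sim = {
    1: {1: 0.0, 2: 12.0, 3: 4.0},
    2: {1: 12.0, 2: 0.0, 3: 9.5},
    3: {1: 4.0, 2: 9.5, 3: 0.0},
}
theta = threshold(n_items=10)  # 10 * 5 * 0.2
related = related_users(sim, theta)
```

In this example only the pair (1, 2) exceeds the threshold, so user 3 ends up with an empty list, which mirrors the users with no related users in the reported results.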

3.3 Predicting ratings for users

In this step, the system predicts the rating of a given user for a given item using the list of related users found in Sect. 3.2. The rating is predicted as follows:
$$\begin{aligned} p(x,k)=\frac{\sum _{r_{u}\in R}r(r_{u},k)}{|R|}+\frac{\sum _{r_{u}\in R}sim(x,r_{u})}{|R|\times n \times r_{h}}, \end{aligned}$$
(3)
where \(p(x,k)\): predicted rating of user x for item k; \(R\): list of related users for user x; \(r(r_{u}, k)\): rating of user \(r_{u}\) for item k; \(n\): maximum number of items; \(r_{h}\): highest rating.
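One possible reading of Eq. (3) is sketched below. The text does not say how related users who have not rated item k are handled, so this sketch makes two assumptions: R is restricted to the related users who have rated k, and no prediction is returned when that set is empty. The function name `predict` and all example values are hypothetical.

```python
from typing import Dict, List, Optional


def predict(x: int, k: int,
            ratings: Dict[int, Dict[int, float]],
            sim: Dict[int, Dict[int, float]],
            related: Dict[int, List[int]],
            n_items: int, r_h: float = 5.0) -> Optional[float]:
    """Eq. (3), restricted to related users of x who have rated item k.

    Returns None when no related user has rated k (an assumption of this
    sketch; the text does not specify this case).
    """
    raters = [u for u in related.get(x, []) if k in ratings.get(u, {})]
    if not raters:
        return None
    # First term: mean rating of item k among the related users.
    mean_rating = sum(ratings[u][k] for u in raters) / len(raters)
    # Second term: mean similarity, normalised by the maximum n * r_h.
    sim_term = sum(sim[x][u] for u in raters) / (len(raters) * n_items * r_h)
    return mean_rating + sim_term


# Hypothetical data: users 2 and 3 are related to user 1 and rated item 7.
ratings = {2: {7: 4.0}, 3: {7: 5.0}}
sim = {1: {2: 12.0, 3: 8.0}}
related = {1: [2, 3]}
pred = predict(1, 7, ratings, sim, related, n_items=10)
# mean rating 4.5, similarity term (12 + 8) / (2 * 10 * 5) = 0.2
```

Because the second term is always non-negative, a prediction computed this way can exceed \(r_{h}\); whether the original implementation clips the result is not stated.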

4 Results and analysis

The proposed recommendation system suggests the most relevant items to its users from a large list of items. As the relevance between user expectations and the items recommended by the proposed system increases, the system recommends better items to its users. The experimental results show that the values of MAE and RMSE decrease. This section describes the data set (Sect. 4.1), the performance evaluation metrics (Sect. 4.2), and the results of our implemented approach compared with the existing ones (Sect. 4.3).

4.1 Data set

We use the MovieLens data set [5] to assess the performance of our proposed algorithm. This data set was collected by GroupLens Research at the University of Minnesota. The data set contains 100k records: 1628 movies rated by 943 users. Every user has rated at least 20 movies. The ratings are on a scale of 1 (poor) to 5 (awesome) stars. A rating shows the user's interest in the item.

4.2 Performance measurement

The accuracy of the proposed algorithm can be measured by statistical accuracy metrics or decision-support metrics. In this paper, we use statistical accuracy metrics to evaluate the accuracy of the rating prediction algorithm. The frequently used statistical metrics are Mean Absolute Error (MAE) and Root-Mean-Square Error (RMSE) [4]. Let \(r_{1}, r_{2}, r_{3}, \ldots , r_{n}\) be the actual ratings and \(p_{1}, p_{2}, p_{3}, \ldots , p_{n}\) the corresponding predicted ratings.
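Table 1

Similarity among different users

| User | \(u_{1}\) | \(u_{2}\) | \(u_{3}\) | \(u_{4}\) | \(u_{5}\) | \(u_{6}\) | \(u_{7}\) | \(u_{8}\) | \(u_{9}\) | \(u_{10}\) | \(u_{11}\) | \(u_{12}\) | \(u_{13}\) | \(u_{14}\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(u_{1}\) | 0 | 0 | 0 | 17.5 | 20.5 | 20.5 | 37.5 | 37.5 | 42.5 | 42.5 | 42.5 | 42.5 | 42.5 | 42.5 |
| \(u_{2}\) | 0 | 0 | 34 | 93 | 130 | 130 | 204 | 241 | 266 | 273 | 281 | 281 | 319 | 319 |
| \(u_{3}\) | 0 | 34 | 0 | 61 | 110 | 120.5 | 163 | 238 | 262 | 284 | 302 | 302 | 353.5 | 362 |
| \(u_{4}\) | 17.5 | 76.5 | 103.5 | 0 | 180.5 | 208.5 | 369.5 | 477.5 | 487.5 | 532.5 | 541 | 541.5 | 603.5 | 617.5 |
| \(u_{5}\) | 3 | 40 | 89 | 166 | 0 | 184.5 | 225.5 | 299.5 | 323.5 | 332.5 | 341.5 | 341.5 | 404.5 | 419.5 |
| \(u_{6}\) | 0 | 0 | 10.5 | 38.5 | 57 | 0 | 57 | 91.5 | 95.5 | 102.5 | 102.5 | 102.5 | 130.5 | 138 |
| \(u_{7}\) | 17 | 91 | 133.5 | 294.5 | 335.5 | 335.5 | 0 | 388 | 400 | 447 | 448 | 448 | 485.5 | 491.5 |
| \(u_{8}\) | 0 | 37 | 122 | 220 | 294 | 328.5 | 381 | 0 | 423 | 471 | 488.5 | 488.5 | 583.5 | 592.5 |
| \(u_{9}\) | 5 | 30 | 54 | 64 | 88 | 92 | 104 | 146 | 0 | 168 | 172 | 172 | 198 | 208 |
| \(u_{10}\) | 0 | 7 | 29 | 74 | 83 | 90 | 137 | 185 | 207 | 0 | 212 | 212 | 224.5 | 229.5 |
| \(u_{11}\) | 0 | 8 | 26 | 35 | 44 | 44 | 45 | 62.5 | 66.5 | 71.5 | 0 | 0 | 88.5 | 88.5 |
| \(u_{12}\) | 0 | 4 | 17.5 | 34.5 | 47.5 | 54 | 68.5 | 80.5 | 82.5 | 87.5 | 87.5 | 87.5 | 90.5 | 95.5 |
| \(u_{13}\) | 0 | 34 | 72 | 117 | 167 | 188 | 211.5 | 294.5 | 318.5 | 326 | 343 | 343 | 0 | 353 |
| \(u_{14}\) | 0 | 0 | 8.5 | 22.5 | 37.5 | 45 | 51 | 60 | 70 | 75 | 75 | 75 | 87 | 0 |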

4.2.1 Mean absolute error (MAE)

MAE is the mean of the absolute differences between the actual ratings and the predicted ratings [4]. A lower MAE value indicates better prediction [1]. It is defined as follows:
$$\begin{aligned} MAE=\frac{\sum _{i=1}^n|r_{i}-p_{i}|}{n}. \end{aligned}$$
(4)
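Eq. (4) translates directly into a few lines of code. The sketch below assumes the actual and predicted ratings are given as two sequences of equal length; the function name `mean_absolute_error` and the example ratings are hypothetical and chosen so that the arithmetic is easy to follow.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Eq. (4): average of |r_i - p_i| over n rating pairs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)


# Hypothetical ratings, not taken from the reported experiments.
actual = [4.0, 3.0, 5.0]
predicted = [3.0, 3.0, 3.0]
mae = mean_absolute_error(actual, predicted)  # (1 + 0 + 2) / 3 = 1.0
```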

4.2.2 Root-mean-square error (RMSE)

The root-mean-square error (RMSE) is another measure for model evaluation. RMSE is the square root of the mean squared error [1]. The MAE value is always less than or equal to the RMSE value. RMSE is defined as follows:
$$\begin{aligned} RMSE=\sqrt{\frac{\sum _{i=1}^n(r_{i}-p_{i})^{2}}{n}}. \end{aligned}$$
(5)
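A matching sketch of Eq. (5) is given below, again assuming two equal-length sequences of actual and predicted ratings. The function name `root_mean_square_error` and the example ratings are hypothetical; the MAE of the same values is computed inline so that the relation between the two metrics can be checked.

```python
import math
from typing import Sequence


def root_mean_square_error(actual: Sequence[float],
                           predicted: Sequence[float]) -> float:
    """Eq. (5): square root of the mean of (r_i - p_i)^2."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((r - p) ** 2 for r, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical ratings, not taken from the reported experiments.
actual = [4.0, 3.0, 5.0]
predicted = [3.0, 3.0, 3.0]
rmse = root_mean_square_error(actual, predicted)  # sqrt((1 + 0 + 4) / 3)
mae = sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)
```

Squaring gives the single error of 2 more weight than the error of 1, which is why the RMSE of this example is larger than its MAE of 1.0; the two coincide only when all absolute errors are equal.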
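Table 2

Similarity of Uid with other different Uids

| U Id. | Has similarity with user id |
| --- | --- |
| 1 | - |
| 2 | 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 |
| 3 | 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 |
| 4 | 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 |
| 5 | 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 |
| 6 | 10, 11, 12, 13, 14 |
| 7 | 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14 |
| 8 | 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14 |
| 9 | 7, 8, 10, 11, 12, 13, 14 |
| 10 | 7, 8, 9, 11, 12, 13, 14 |
| 11 | - |
| 12 | - |
| 13 | 4, 5, 6, 7, 8, 9, 10, 11, 12, 14 |
| 14 | - |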

4.3 Discussion of results

In the proposed work, a new approach for predicting the rating of a given item for a given user is presented. The system is easy to implement because it does not analyze item characteristics and avoids clustering, which reduces the complexity of the algorithm. The proposed algorithm is implemented in Java. The algorithm is first run on 962 records of 14 users. Table 1 shows the calculated similarity between users. Here, the first row and first column show the user ids, and \(M_{a,b}\) denotes the similarity value between user a and user b.

On the basis of the similarity between users, Table 2 lists the related users for every user. Predicted ratings for 20 random user–item pairs are shown in Table 3. Table 3 has five attributes. The predicted rating is calculated from the similarity values between users and the list of similar users. The Error column in Table 3 shows the difference between the actual rating and the predicted rating; the values of MAE and RMSE are calculated from these errors.

Table 4 compares the MAE and RMSE values of the proposed algorithm with those of the existing algorithms.

Figure 3 gives a graphical representation of the measured Mean Absolute Error (MAE) of the proposed method URP and three existing methods: Rodrigues [16], Sarwar [17], and Gong [4]. The MAE of the URP method is the lowest of the four methods. The method with the lowest MAE provides the best predictions; as the MAE of a method increases, the accuracy of its predictions decreases. The MAE of URP is 0.315, and the MAE of Gong [4] is 0.80, the highest of the four methods; Gong's method therefore gives the least accurate predictions. The root-mean-square error is always greater than or equal to the Mean Absolute Error. Fig. 4 compares the RMSE of the existing algorithm with that of the proposed algorithm URP. URP again shows a smaller RMSE than the other algorithm, which indicates that URP is more accurate. From these results and their analysis, it is concluded that the proposed method URP performs better than the existing algorithms.
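Table 3

Predicted ratings using the proposed algorithm

| Uid | Movie id | Actual rating | Predicted rating | Error |
| --- | --- | --- | --- | --- |
| 1 | 1343 | 2 | 2 | 0 |
| 4 | 1674 | 5 | 4.5 | 0.5 |
| 8 | 1674 | 4 | 4.5 | 0.5 |
| 4 | 141 | 5 | 4.33 | 0.67 |
| 4 | 1032 | 5 | 3.5 | 1.5 |
| 6 | 8636 | 4 | 4 | 0 |
| 10 | 2840 | 3 | 3 | 0 |
| 8 | 1219 | 4 | 4.5 | 0.5 |
| 12 | 2529 | 1 | 1 | 0 |
| 5 | 3897 | 4.5 | 4.5 | 0 |
| 4 | 1356 | 4 | 4 | 0 |
| 8 | 1358 | 5 | 2.88 | 2.12 |
| 8 | 33166 | 4.5 | 4.75 | 0.25 |
| 7 | 592 | 3 | 3 | 0 |
| 8 | 33493 | 4.5 | 4.5 | 0 |
| 4 | 2902 | 2 | 2 | 0 |
| 4 | 1031 | 5 | 5 | 0 |
| 4 | 2054 | 3 | 3 | 0 |
| 10 | 1127 | 4 | 4 | 0 |
| 8 | 7438 | 4 | 4 | 0 |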

Table 4

Comparison of results

| Metric | URP | Rodrigues [16] | Sarwar [17] | Gong [4] |
| --- | --- | --- | --- | --- |
| MAE | 0.315 | 0.56 | 0.76 | 0.8 |
| RMSE | 0.676 | 0.7 | - | - |

Fig. 3

Comparison of result for MAE

Fig. 4

Comparison of result for RMSE

5 Conclusion and future scope

In this work, a new approach for predicting the ratings of items is proposed. The proposed system first finds the similarity among users using the previous data available in the data set. Using that similarity, the system predicts the ratings of different items for different users. The results of the implementation suggest that the proposed algorithm predicts ratings more accurately, as its Mean Absolute Error is lower than that of the existing algorithms. However, this work has some limitations, which future work can address: (1) evaluating the algorithm on a larger data set or on other data sets; (2) addressing other problems of recommendation systems, such as scalability and sparsity.

References

  1. Chai, T., Draxler, R.R.: Root mean square error (rmse) or mean absolute error (mae)? Geosci. Model Dev. Discuss. 7, 1525–1534 (2014)
  2. Deshpande, M., Karypis, G.: Item-based top-n recommendation algorithms. ACM Trans. Inf. Syst. 22(1), 143–177 (2004)
  3. Devika, P., Jisha, R., Sajeev, G.: A novel approach for book recommendation systems. In: Computational Intelligence and Computing Research (ICCIC), 2016 IEEE International Conference on, pp. 1–6. IEEE (2016)
  4. Gong, S.: A collaborative filtering recommendation algorithm based on user clustering and item clustering. JSW 5(7), 745–752 (2010)
  5. GroupLens: Movielens 100k dataset. http://grouplens.org/datasets/movielens/100k/, May 2018. Accessed on 2018-04-03
  6. Gupta, J., Gadge, J.: A framework for a recommendation system based on collaborative filtering and demographics. In: Circuits, Systems, Communication and Information Technology Applications (CSCITA), 2014 International Conference on, pp. 300–304. IEEE (2014)
  7. Gupta, J., Gadge, J.: Performance analysis of recommendation system based on collaborative filtering and demographics. In: Communication, Information and Computing Technology (ICCICT), 2015 International Conference on, pp. 1–6. IEEE (2015)
  8. Isinkaye, F., Folajimi, Y., Ojokoh, B.: Recommendation systems: principles, methods and evaluation. Egypt. Inf. J. 16(3), 261–273 (2015)
  9. Kumar, P., Thakur, R.S.: Recommendation system techniques and related issues: a survey. Int. J. Inf. Technol. 10, 1–7 (2018)
  10. Kumar, P., Thakur, R.S.: A framework for weblog data analysis using hive in hadoop framework. In: Proceedings of International Conference on Recent Advancement on Computer and Communication, pp. 433–439. Springer (2018)
  11. Kumar, V., Kumar, P., Thakur, R.: A brief investigation on data security tools and techniques for big data. Int. J. Eng. Sci. Invention 6(9), 20–27 (2017)
  12. Li, X., Murata, T.: Using multidimensional clustering based collaborative filtering approach improving recommendation diversity. In: Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology-Volume 03, pp. 169–174. IEEE Computer Society (2012)
  13. Malviya, B.K., Agrawal, J.: A study on web usage mining theory and applications. In: Communication Systems and Network Technologies (CSNT), 2015 Fifth International Conference on, pp. 935–939. IEEE (2015)
  14. Moghaddam, S.G., Selamat, A.: A scalable collaborative recommender algorithm based on user density-based clustering. In: Data Mining and Intelligent Information Technology Applications (ICMiA), 2011 3rd International Conference on, pp. 246–249. IEEE (2011)
  15. Rathod, A., Indiramma, M.: A survey of personalized recommendation system with user interest in social network. Int. J. Comput. Sci. Inf. Technol. 6(1), 413–415 (2015)
  16. Rodrigues, C.M., Rathi, S., Patil, G.: An efficient system using item & user-based cf techniques to improve recommendation. In: Next Generation Computing Technologies (NGCT), 2016 2nd International Conference on, pp. 569–574. IEEE (2016)
  17. Sarwar, B.M., Karypis, G., Konstan, J., Riedl, J.: Recommender systems for large-scale e-commerce: scalable neighborhood formation using clustering. Proc. Fifth Int. Conf. Comput. Inf. Technol. 1, 291–324 (2002)
  18. Valera, M., Rathod, K.: A novel approach of mining frequent sequential pattern from customized web log preprocessing. Int. J. Eng. Res. Appl. 3(1), 269–380 (2013)
  19. Wang, M.-J., Han, J.-T.: Collaborative filtering recommendation based on item rating and characteristic information prediction. In: Consumer Electronics, Communications and Networks (CECNet), 2012 2nd International Conference on, pp. 214–217. IEEE (2012)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Computer Applications, Maulana Azad National Institute of Technology, Bhopal, India
