
An efficient parallel similarity matrix construction on MapReduce for collaborative filtering

The Journal of Supercomputing

Abstract

Collaborative filtering has become a popular technique for recommendation systems. However, as the volume of data grows rapidly, constructing the similarity matrix becomes a performance bottleneck in such systems. The MapReduce framework proposed by Google has recently been widely used for data-intensive applications. In this work, we therefore propose ConSimMR, an efficient parallel algorithm for constructing a similarity matrix using MapReduce. We first partition the set of items into disjoint groups such that items rated by similar users tend to fall into the same group. We next compute the similarity of every pair of items belonging to the same group. Finally, we calculate the similarity of every pair of items belonging to different groups. In this last step, by using the rating list of each user rather than that of each item, we can compute the similarities in parallel, which improves performance. We conducted experiments comparing ConSimMR with previous algorithms on real-life data sets and confirmed both the efficiency and the scalability of ConSimMR.
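
To illustrate why per-user rating lists lend themselves to parallel computation, the following is a minimal single-process sketch, not the ConSimMR algorithm itself. It assumes cosine similarity as the measure (the abstract does not name one), omits the grouping step entirely, and uses hypothetical names (`map_user`, `item_norms`, `cosine_similarities`). Each user's rating list is processed independently, as a mapper would, and the partial products are summed per item pair, as a reducer would.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def map_user(ratings):
    """Map step (hypothetical): for one user's rating list, emit a partial
    dot-product contribution for every pair of items that user rated."""
    for a, b in combinations(sorted(ratings), 2):
        yield (a, b), ratings[a] * ratings[b]


def item_norms(user_ratings):
    """Euclidean norm of each item's rating vector, accumulated user by user."""
    squares = defaultdict(float)
    for ratings in user_ratings.values():
        for item, r in ratings.items():
            squares[item] += r * r
    return {item: sqrt(v) for item, v in squares.items()}


def cosine_similarities(user_ratings):
    """Reduce step (hypothetical): sum the emitted products per item pair and
    normalise. Assumes every rating is nonzero, so no norm is zero."""
    dots = defaultdict(float)
    for ratings in user_ratings.values():  # each user could go to a separate mapper
        for pair, product in map_user(ratings):
            dots[pair] += product
    norms = item_norms(user_ratings)
    return {(a, b): d / (norms[a] * norms[b]) for (a, b), d in dots.items()}


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy = {
        "u1": {"i1": 1.0, "i2": 1.0},
        "u2": {"i1": 1.0, "i2": 1.0, "i3": 2.0},
    }
    for pair, sim in sorted(cosine_similarities(toy).items()):
        print(pair, round(sim, 4))
```

Because a user's list only produces pairs of items that user actually rated, no work is spent on item pairs with no common rater, and the per-user tasks share no state until the summation.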



Author information

Corresponding author

Correspondence to Jun-Ki Min.

About this article

Cite this article

Kim, S., Kim, H. & Min, JK. An efficient parallel similarity matrix construction on MapReduce for collaborative filtering. J Supercomput 75, 123–141 (2019). https://doi.org/10.1007/s11227-018-2271-3

  • DOI: https://doi.org/10.1007/s11227-018-2271-3
