Performance Comparison of Apache Spark and Hadoop Based Large Scale Content Based Recommender System

S., Saravanan; K.E., Karthick; Balaji, Ashwin; Sajith, Anand

doi:10.1007/978-3-319-68385-0_6

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 683)

Included in the following conference series:

The International Symposium on Intelligent Systems Technologies and Applications

Abstract

Recommending products of interest to a user is pivotal to improving the customer's shopping experience. Recommender systems have diversified and are now used in a wide range of industrial applications, from e-commerce to online video sites. Because the input data supplied to a recommender system is large, a recommender system is often considered a data-intensive application. In this paper, we present improved MapReduce-based data preprocessing and content-based recommendation algorithms. We also develop a Spark-based content-based recommendation algorithm and compare it with its Hadoop-based counterpart. Our experimental results on the Amazon co-purchasing network metadata show that the Spark-based algorithm is faster than the Hadoop-based one. A graphical user interface is also developed for interacting with the recommender system.
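The abstract does not spell out the algorithms themselves, so the following is only an illustrative sketch of how a content-based recommender can be phrased as map, shuffle, and reduce steps. It runs in plain Python rather than on Hadoop or Spark. The toy catalogue, the choice of Jaccard similarity, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `recommend`) are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy catalogue: item id -> set of content features (e.g. categories).
ITEMS = {
    "book_a": {"fiction", "mystery", "paperback"},
    "book_b": {"fiction", "mystery", "hardcover"},
    "book_c": {"cooking", "paperback"},
}

def map_phase(items):
    """Emit (feature, item_id) pairs, as a mapper would."""
    for item_id, features in items.items():
        for feature in features:
            yield feature, item_id

def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(set)
    for key, value in pairs:
        grouped[key].add(value)
    return grouped

def reduce_phase(grouped, items):
    """Score each pair of items sharing at least one feature (Jaccard, assumed)."""
    scores = {}
    for item_ids in grouped.values():
        for a, b in combinations(sorted(item_ids), 2):
            if (a, b) not in scores:
                fa, fb = items[a], items[b]
                scores[(a, b)] = len(fa & fb) / len(fa | fb)
    return scores

def recommend(item_id, scores, top_n=2):
    """Return the items most similar to item_id, best first."""
    ranked = []
    for (a, b), score in scores.items():
        if item_id in (a, b):
            other = b if a == item_id else a
            ranked.append((other, score))
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))[:top_n]

SCORES = reduce_phase(shuffle(map_phase(ITEMS)), ITEMS)
```

Calling `recommend("book_a", SCORES)` ranks `book_b` ahead of `book_c`, since `book_a` shares two of four distinct features with the former and one of four with the latter. In a Hadoop job each phase would typically read from and write to disk, whereas a Spark version could keep the intermediate pairs in memory, which is one commonly cited reason for speed differences of the kind the abstract reports.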

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Amrita University, Bengaluru, India
Saravanan S., Karthick K.E., Ashwin Balaji & Anand Sajith

Corresponding author

Correspondence to Saravanan S.

Editor information

Editors and Affiliations

School of CS/IT, Indian Institute of Information Technology, Trivandrum, Kerala, India
Sabu M. Thampi
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
Sushmita Mitra
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India
Jayanta Mukhopadhyay
Xiamen University, Xiamen, China
Kuan-Ching Li
Department of Electrical and Electronic, Nazarbayev University, Astana, Kazakhstan
Alex Pappachen James
Dipartimento di Ingegneria, Università degli Studi di Firenze, Firenze, Italy
Stefano Berretti

About this paper

Cite this paper

S., S., K.E., K., Balaji, A., Sajith, A. (2018). Performance Comparison of Apache Spark and Hadoop Based Large Scale Content Based Recommender System. In: Thampi, S., Mitra, S., Mukhopadhyay, J., Li, KC., James, A., Berretti, S. (eds) Intelligent Systems Technologies and Applications. ISTA 2017. Advances in Intelligent Systems and Computing, vol 683. Springer, Cham. https://doi.org/10.1007/978-3-319-68385-0_6

DOI: https://doi.org/10.1007/978-3-319-68385-0_6
Published: 21 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68384-3
Online ISBN: 978-3-319-68385-0
eBook Packages: Engineering, Engineering (R0)
