Abstract
Recommending products of interest to the user is pivotal for improving a customer’s shopping experience. Recommender systems have diversified and are now used in a wide range of industrial applications, from e-commerce to online video sites. Because the input data supplied to a recommender system is large, such a system is often considered a data-intensive application. In this paper, we present improved MapReduce-based data preprocessing and content-based recommendation algorithms. We also develop a Spark-based content-based recommendation algorithm and compare it with the Hadoop-based content-based recommendation algorithm. Our experimental results on the Amazon co-purchasing network metadata show that the Spark-based algorithm is faster than the Hadoop-based one. In addition, a graphical user interface is developed for interacting with the recommender system.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
S., S., K.E., K., Balaji, A., Sajith, A. (2018). Performance Comparison of Apache Spark and Hadoop Based Large Scale Content Based Recommender System. In: Thampi, S., Mitra, S., Mukhopadhyay, J., Li, KC., James, A., Berretti, S. (eds) Intelligent Systems Technologies and Applications. ISTA 2017. Advances in Intelligent Systems and Computing, vol 683. Springer, Cham. https://doi.org/10.1007/978-3-319-68385-0_6
DOI: https://doi.org/10.1007/978-3-319-68385-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68384-3
Online ISBN: 978-3-319-68385-0
eBook Packages: Engineering, Engineering (R0)