Performance Comparison of Apache Spark and Hadoop Based Large Scale Content Based Recommender System

  • Conference paper
Intelligent Systems Technologies and Applications (ISTA 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 683)

Abstract

Recommending products of interest to the user is pivotal for improving a customer’s shopping experience. Recommender systems have diversified and established themselves in wide-ranging industrial applications, from e-commerce to online video sites. Because the input data supplied to a recommender system is large, the recommender system is often considered a data-intensive application. In this paper, we present improvised MapReduce-based data preprocessing and content-based recommendation algorithms. We also develop a Spark-based content-based recommendation algorithm and compare it with the Hadoop-based one. Our experimental results on Amazon co-purchasing network metadata show that the Spark-based content-based recommendation algorithm is faster than the Hadoop-based content-based recommendation algorithm. A graphical user interface is also developed for interacting with the recommender system.
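The abstract does not spell out the algorithms, so the following is only a minimal sketch of how a content-based recommendation step could be phrased as map, shuffle, and reduce phases. It is not the authors' method: the toy catalogue `PRODUCTS`, the choice of Jaccard similarity, and every function name here are assumptions made for illustration, and plain Python stands in for Hadoop or Spark.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy catalogue: product id -> set of content features.
# The real dataset and feature choice are not described in the abstract.
PRODUCTS = {
    "p1": {"books", "databases", "hadoop"},
    "p2": {"books", "databases", "spark"},
    "p3": {"music", "jazz"},
}


def map_phase(products):
    """Emit (feature, product_id) pairs, as a mapper building an inverted index would."""
    for pid, features in products.items():
        for feature in features:
            yield feature, pid


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Count shared features for each product pair that co-occurs under a feature."""
    shared = defaultdict(int)
    for pids in groups.values():
        for a, b in combinations(sorted(pids), 2):
            shared[(a, b)] += 1
    return shared


def jaccard_scores(products, shared):
    """Turn shared-feature counts into Jaccard similarity (assumed metric)."""
    return {
        (a, b): count / len(products[a] | products[b])
        for (a, b), count in shared.items()
    }


def recommend(pid, scores, top_n=2):
    """Return up to top_n products most similar to pid, best first."""
    candidates = []
    for (a, b), score in scores.items():
        if a == pid:
            candidates.append((b, score))
        elif b == pid:
            candidates.append((a, score))
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [other for other, _ in candidates[:top_n]]


SHARED = reduce_phase(shuffle(map_phase(PRODUCTS)))
SCORES = jaccard_scores(PRODUCTS, SHARED)
```

Calling `recommend("p1", SCORES)` in this sketch returns the products that share at least one feature with `p1`, ranked by similarity. In a MapReduce setting each phase would typically read from and write to distributed storage, whereas an in-memory engine could keep the intermediate groups cached between steps; whether that accounts for the speed difference the paper reports is not something the abstract states.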

Author information

Corresponding author

Correspondence to Saravanan S.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

S., S., K.E., K., Balaji, A., Sajith, A. (2018). Performance Comparison of Apache Spark and Hadoop Based Large Scale Content Based Recommender System. In: Thampi, S., Mitra, S., Mukhopadhyay, J., Li, KC., James, A., Berretti, S. (eds) Intelligent Systems Technologies and Applications. ISTA 2017. Advances in Intelligent Systems and Computing, vol 683. Springer, Cham. https://doi.org/10.1007/978-3-319-68385-0_6

  • DOI: https://doi.org/10.1007/978-3-319-68385-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68384-3

  • Online ISBN: 978-3-319-68385-0

  • eBook Packages: Engineering, Engineering (R0)
