Abstract
MapReduce is a programming paradigm for processing massive data sets in a scalable, parallel manner on a cluster of distributed compute nodes. This chapter provides background on the MapReduce programming paradigm and framework, highlighting its significance and its current use in large-scale data processing. Along the way, students are introduced to important concepts such as Big Data, scalability, parallelization, and divide and conquer. The chapter provides ample examples, at both beginner and advanced levels, so that students become proficient in recognizing problems suited to a MapReduce solution and in defining efficient Map and Reduce functions for the corresponding data sets.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Anjum, B. (2018). MapReduce – The Scalable Distributed Data Processing Solution. In: Prasad, S., Gupta, A., Rosenberg, A., Sussman, A., Weems, C. (eds) Topics in Parallel and Distributed Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-93109-8_7
DOI: https://doi.org/10.1007/978-3-319-93109-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93108-1
Online ISBN: 978-3-319-93109-8
eBook Packages: Computer Science (R0)