The MapReduce Paradigm

Nielsen, Frank

doi:10.1007/978-3-319-21903-5_6


Chapter
First Online: 04 February 2016

Part of the book series: Undergraduate Topics in Computer Science (UTICS)

Abstract

The MapReduce system (or Hadoop, its open-source Java equivalent) offers a simple framework for parallelizing and executing algorithms on massive data sets, commonly called Big Data, whose sizes range from a few gigabytes to several terabytes or even petabytes. This data-intensive parallel programming paradigm was originally developed at Google in 2003. MapReduce is both an abstract model of parallel programming for processing massive data sets on a cluster of computers and a platform for executing and monitoring jobs. MapReduce is straightforward to use and easy to extend. Even more importantly, it is robust to both hardware and software failures.
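
The following sketch makes the abstract model concrete with the classic word-count job. It runs sequentially in a single Python process, which stands in for the cluster. The function names map_words, shuffle, reduce_counts, and word_count are hypothetical illustrations and are not part of the Hadoop or Google MapReduce APIs. A real framework does three things this sketch does not:
- it distributes the map and reduce tasks across machines;
- it performs the shuffle over the network;
- it re-executes tasks whose machines fail.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map phase (hypothetical helper): emit a (word, 1) pair per word."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all intermediate values sharing the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine all values of one key into a single result."""
    return word, sum(counts)


def word_count(documents: List[str]) -> Dict[str, int]:
    # On a cluster, each document would be processed by an independent map
    # task; in this single-process sketch the tasks simply run in sequence.
    intermediate = [pair for doc in documents for pair in map_words(doc)]
    grouped = shuffle(intermediate)
    # Likewise, each key would be handled by an independent reduce task.
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = ["the map step", "the reduce step", "the end"]
    print(word_count(docs))
    # prints {'the': 3, 'map': 1, 'step': 2, 'reduce': 1, 'end': 1}
```

Map and reduce operate on independent keys and share no state. This independence is what lets a framework parallelize them freely and simply rerun a failed task without affecting the others.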

Notes

1. When one uses several hundred interconnected cheap machines, failures happen quite often in practice and must be handled.
2. https://storm.apache.org/.
3. https://spark.apache.org/streaming/.
4. https://hive.apache.org/.
5. http://caml.inria.fr/ocaml/.
6. A Common Lisp implementation can be downloaded from http://www.lispworks.com/.
7. http://wiki.apache.org/hadoop/Grep.
8. These figures are usually company trade secrets and are not publicly disclosed.
9. http://mapreduce.sandia.gov/.
10. http://mapreduce.sandia.gov/doc/Manual.html.
11. http://www.thecloudavenue.com/2013/01/virtual-machine-for-learning-hadoop.html.
12. http://mapreduce.sandia.gov/.

Author information

Authors and Affiliations

École Polytechnique, Palaiseau, France
Frank Nielsen
Sony Computer Science Laboratories, Inc., Tokyo, Japan
Frank Nielsen

Corresponding author

Correspondence to Frank Nielsen.

About this chapter

Cite this chapter

Nielsen, F. (2016). The MapReduce Paradigm. In: Introduction to HPC with MPI for Data Science. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-21903-5_6

DOI: https://doi.org/10.1007/978-3-319-21903-5_6
Published: 04 February 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21902-8
Online ISBN: 978-3-319-21903-5
eBook Packages: Computer Science, Computer Science (R0)