Abstract
MapReduce (or Hadoop, its open-source counterpart written in Java) offers a simple framework for parallelizing and executing algorithms on massive data sets, commonly called Big Data (with sizes ranging from a few gigabytes to several terabytes or even petabytes). This dedicated paradigm of data-intensive parallel programming was originally developed by Google in 2003. MapReduce is both an abstract model of parallel programming for processing massive data sets on a cluster of computers and a platform for executing and monitoring jobs. MapReduce is straightforward to use, can be easily extended, and, even more importantly, is resilient to both hardware and software failures.
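As an illustration only, the following single-machine Python sketch mimics the three stages usually associated with a MapReduce job (map, group by key, reduce) on a word-count task. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_mapper`, `count_reducer`, `word_count`) are hypothetical and do not correspond to the Google or Hadoop APIs. A real framework would distribute these stages over a cluster and handle machine failures, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the user-defined mapper to each record and collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Group all values that share the same key (done by the framework in a real system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the user-defined reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """User-defined map: emit (word, 1) for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(key, values):
    """User-defined reduce: sum the partial counts of one word."""
    return sum(values)


def word_count(lines):
    """Chain the three stages sequentially on one machine (illustrative only)."""
    pairs = map_phase(lines, word_mapper)
    groups = shuffle_phase(pairs)
    return reduce_phase(groups, count_reducer)


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data cluster data"]))
```

In this sketch the user supplies only the mapper and the reducer. The grouping step in between is generic, which is the part a MapReduce platform would typically take over.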
Notes
- 1. When one uses several hundred cheap interconnected machines, failures happen quite often in practice and need to be addressed.
- 6. A Common Lisp implementation can be downloaded from http://www.lispworks.com/.
- 8. Those figures are usually trade secrets of companies and are not disclosed publicly.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Nielsen, F. (2016). The MapReduce Paradigm. In: Introduction to HPC with MPI for Data Science. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-21903-5_6
DOI: https://doi.org/10.1007/978-3-319-21903-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21902-8
Online ISBN: 978-3-319-21903-5
eBook Packages: Computer Science, Computer Science (R0)