
The MapReduce Paradigm

Part of the book series: Undergraduate Topics in Computer Science (UTICS)

Abstract

MapReduce (and Hadoop, its open-source Java counterpart) offers a simple framework for parallelizing and executing algorithms on massive data sets, commonly called Big Data, whose sizes range from a few gigabytes to terabytes or even petabytes. This paradigm of data-intensive parallel programming was originally developed at Google in 2003. MapReduce is both an abstract model of parallel programming for processing massive data sets on a cluster of computers and a platform for executing and monitoring jobs. MapReduce is straightforward to use and easy to extend and, even more importantly, it is resilient to both hardware and software failures.
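The abstract model has three phases. The map phase turns each input split into intermediate (key, value) pairs. The shuffle phase groups those pairs by key. The reduce phase then combines the values collected for each key. The following is a minimal single-machine sketch in Python of these phases on the classic word-count task. It is not Hadoop's or Google's API: `map_word_count`, `reduce_word_count`, and `run_mapreduce` are hypothetical names, threads stand in for cluster machines, and retrying a failed map task up to `max_attempts` times is only a toy stand-in for the way a real framework reschedules failed tasks on other workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def map_word_count(split: str) -> List[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word of one input split."""
    return [(word.lower(), 1) for word in split.split()]


def reduce_word_count(key: str, values: List[int]) -> int:
    """Reduce phase: sum all partial counts emitted for one key."""
    return sum(values)


def run_mapreduce(
    splits: List[str],
    mapper: Callable[[str], List[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
    workers: int = 4,
    max_attempts: int = 3,
) -> Dict[str, int]:
    """Hypothetical toy driver: parallel map, shuffle by key, then reduce."""

    def run_with_retry(split: str) -> List[Tuple[str, int]]:
        # Toy fault tolerance: re-execute a failed map task a few times
        # before giving up and propagating the last error.
        last_error = None
        for _ in range(max_attempts):
            try:
                return mapper(split)
            except Exception as err:  # treated as a "failed worker"
                last_error = err
        raise last_error

    # Map: threads stand in for cluster machines processing splits in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(run_with_retry, splits))

    # Shuffle: group every intermediate value under its key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce: one reducer call per distinct key (keys sorted for readability).
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


splits = ["the quick brown fox", "the lazy dog", "The fox"]
print(run_mapreduce(splits, map_word_count, reduce_word_count))
# {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The user supplies only the two small functions `mapper` and `reducer`. Parallel scheduling, grouping by key, and recovery from failed tasks stay inside the driver, which is what makes the model simple to program against.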

Notes

  1. When several hundred interconnected inexpensive machines are used, failures occur quite often in practice and must be handled.

  2. https://storm.apache.org/.

  3. https://spark.apache.org/streaming/.

  4. https://hive.apache.org/.

  5. http://caml.inria.fr/ocaml/.

  6. A Common Lisp implementation can be downloaded from http://www.lispworks.com/.

  7. http://wiki.apache.org/hadoop/Grep.

  8. These figures are usually company trade secrets and are not publicly disclosed.

  9. http://mapreduce.sandia.gov/.

  10. http://mapreduce.sandia.gov/doc/Manual.html.

  11. http://www.thecloudavenue.com/2013/01/virtual-machine-for-learning-hadoop.html.

  12. http://mapreduce.sandia.gov/.

Author information

Correspondence to Frank Nielsen.
Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Nielsen, F. (2016). The MapReduce Paradigm. In: Introduction to HPC with MPI for Data Science. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-21903-5_6

  • DOI: https://doi.org/10.1007/978-3-319-21903-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21902-8

  • Online ISBN: 978-3-319-21903-5

  • eBook Packages: Computer Science, Computer Science (R0)