HDFS and MapReduce

  • Deepak Vohra


Apache Hadoop is a distributed framework for storing and processing large quantities of data. Going over each of the terms in the previous statement, "distributed" implies that Hadoop is distributed across several (tens, hundreds, or even thousands) of nodes in a cluster. For "storing and processing" means that Hadoop uses two different frameworks: Hadoop Distributed Filesystem (HDFS) for storage and MapReduce for processing. This is illustrated in Figure 2-1


Regular Expression Input File Configuration File Word Count Output Directory 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Deepak Vohra 2016

Authors and Affiliations

  • Deepak Vohra
    • 1
  1. 1.Apt 105White RockCanada

Personalised recommendations