Abstract
Apache Hadoop is a distributed framework for storing and processing large quantities of data. "Distributed" means that Hadoop runs across several (tens, hundreds, or even thousands of) nodes in a cluster. "Storing and processing" means that Hadoop comprises two different frameworks: the Hadoop Distributed File System (HDFS) for storage and MapReduce for processing. This is illustrated in Figure 2-1.
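The division of labor described above can be illustrated with a minimal, framework-free sketch of the MapReduce programming model. This is plain Python rather than the Hadoop Java API, and the function names are illustrative, not part of Hadoop: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group.

```python
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word, analogous to a Hadoop Mapper.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group values by key, as Hadoop does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts for one word, analogous to a Hadoop Reducer.
    return (key, sum(values))

# A tiny "dataset" standing in for files stored in HDFS.
documents = ["Hadoop stores data", "Hadoop processes data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["hadoop"])  # → 2
```

In a real cluster, HDFS splits the input files into blocks across DataNodes, map tasks run on the nodes holding those blocks, and the shuffle moves intermediate pairs over the network; the sketch compresses all of that into a single process to show the data flow.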
Copyright information
© 2016 Deepak Vohra
Cite this chapter
Vohra, D. (2016). HDFS and MapReduce. In: Practical Hadoop Ecosystem. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-2199-0_2
Print ISBN: 978-1-4842-2198-3
Online ISBN: 978-1-4842-2199-0