Abstract
Hadoop Map Reduce is a system for the parallel processing of very large data sets, using distributed, fault-tolerant storage across very large clusters. The input data set is split into chunks (whose size is configurable), and each chunk is fed to a Map function. The Map functions filter and transform these chunks on the Hadoop cluster's data nodes, emitting intermediate key-value pairs. Those intermediate results are then shuffled and sorted by key and delivered to the Reduce processes, which summarize the grouped data to produce the final output.
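The map → shuffle → reduce flow described above can be sketched in plain Python. This is a minimal in-memory illustration, not Hadoop code: the input splits, function names, and the word-count task are all hypothetical stand-ins for what a real Hadoop job would distribute across HDFS data nodes.

```python
from collections import defaultdict

# Hypothetical input splits: in a real Hadoop job, each chunk would live
# on an HDFS data node; here they are just in-memory strings.
splits = ["the quick brown fox", "the lazy dog", "the fox"]

# Map phase: each mapper emits (key, value) pairs from its input split.
# For word count, the pair is (word, 1).
def map_fn(chunk):
    return [(word, 1) for word in chunk.split()]

# Shuffle phase: group all intermediate pairs by key across map outputs.
# Hadoop performs this between the map and reduce stages.
def shuffle(mapped_outputs):
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

# Reduce phase: summarize the values collected for each key.
def reduce_fn(key, values):
    return key, sum(values)

mapped = [map_fn(s) for s in splits]
grouped = shuffle(mapped)
counts = dict(reduce_fn(k, v) for k, v in grouped.items())
print(counts)  # prints {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

The key point the sketch shows is the separation of concerns: mappers work independently on their own chunks, and only the shuffle step brings together all values that share a key, so reducers can summarize each key without seeing the original inputs.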
Copyright information
© 2015 Michael Frampton
Cite this chapter
Frampton, M. (2015). Processing Data with Map Reduce. In: Big Data Made Easy. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-0094-0_4
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-0095-7
Online ISBN: 978-1-4842-0094-0
eBook Packages: Professional and Applied Computing; Apress Access Books; Professional and Applied Computing (R0)