MapReduce refers to both a programming model and the corresponding distributed framework. The model is composed of two phases, map and reduce, which manipulate data formatted as key-value pairs. The map phase splits the data and sorts it on keys, whereas the reduce phase applies a user-defined function to process data with the same key. In this way, MapReduce is a typical divide-and-conquer framework designed to handle embarrassingly parallel problems, namely problems that can be split into sub-tasks with little or no synchronization cost.
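The two phases can be illustrated with a small single-process sketch in Python. It is only a toy model of the programming interface, not the distributed engine: the names `run_mapreduce`, `map_words`, and `reduce_counts` are hypothetical, the word-count task is chosen purely for illustration, and the grouping of intermediate pairs by key is written as a separate step between map and reduce for readability.

```python
from collections import defaultdict


def map_words(_key, line):
    """User-defined map function: emit a (word, 1) pair for each word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """User-defined reduce function: sum all values sharing one key."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, group-by-key, and reduce sequentially in one process.

    A real engine would run map and reduce tasks on many nodes;
    this hypothetical helper only mirrors the data flow.
    """
    # Map: each input record is processed independently of the others.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))

    # Group intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce: one call per distinct key, visited in sorted key order.
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]


records = [(0, "the quick brown fox"), (1, "the lazy dog"), (2, "The fox")]
result = run_mapreduce(records, map_words, reduce_counts)
print(result)
```

Because each call to `map_fn` touches one record and each call to `reduce_fn` touches one key, the calls within a phase share no state, which is what makes the job embarrassingly parallel.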
MapReduce is a programming framework that allows users to process large-scale data by leveraging the parallelism among a cluster of nodes. The term also refers to the distributed engine that splits and disseminates users’ jobs and monitors their processing in the cluster. MapReduce is a typical divide-and-conquer framework, since it transforms the user code into an embarrassingly parallel job, where little or no effort...