Abstract
In recent years, Map-Reduce systems have grown into a leading solution for processing large volumes of data. To minimize execution time, developers often express their programs in a procedural language instead of a high-level query language. This gives them full control over program execution, which can lead to several problems, especially where the join operation is concerned. A wide range of join techniques has been proposed in the literature, but many of them cannot be easily classified using the traditional Map-Side/Reduce-Side distinction. The main goal of this paper is to propose a taxonomy of the existing join algorithms and to provide their evaluation.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Penar, M., Wilczek, A. (2016). The Evaluation of Map-Reduce Join Algorithms. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_14
DOI: https://doi.org/10.1007/978-3-319-34099-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science, Computer Science (R0)