Abstract
The MapReduce framework has become a general-purpose programming model and has proved effective for tasks such as sorting and full-text search. As requirements grow more complex, however, MapReduce does not directly support relational-algebra operations, typically join, over heterogeneous data sources. We discuss the factors that influence performance when the join is implemented in the map function and when it is implemented in the reduce function. We also implement both approaches and analyze them. Experimental results show that the first approach (join in the map function) wins when the datasets involved in the join differ significantly in size and one of them is small enough. To retain the advantages of the first approach, we further discuss the case in which the smaller dataset grows, and improve the approach accordingly.
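The two join placements compared in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names, the sample data, and the assumption that "small enough" means the smaller dataset fits in each mapper's memory are all hypothetical, and the shuffle is simulated with an in-memory dictionary rather than a real MapReduce runtime.

```python
from collections import defaultdict


def map_side_join(small, large):
    """Join in the map phase: hold the small dataset in memory as a hash table."""
    # Assumption: the small dataset fits in memory on every mapper.
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    # Each mapper streams its split of the large dataset and probes the table;
    # no shuffle or reduce phase is needed.
    return [(key, s_val, l_val)
            for key, l_val in large
            for s_val in lookup.get(key, [])]


def reduce_side_join(small, large):
    """Join in the reduce phase: tag records by source, group by key, then pair."""
    # Map: emit (key, (tag, value)) for every record of both inputs.
    tagged = [(k, ("S", v)) for k, v in small] + [(k, ("L", v)) for k, v in large]
    # Shuffle (simulated): group all tagged values by join key.
    groups = defaultdict(list)
    for key, tagged_value in tagged:
        groups[key].append(tagged_value)
    # Reduce: for each key, pair the values that came from the two sources.
    joined = []
    for key, values in groups.items():
        s_vals = [v for tag, v in values if tag == "S"]
        l_vals = [v for tag, v in values if tag == "L"]
        joined.extend((key, s, l) for s in s_vals for l in l_vals)
    return joined


# Hypothetical sample data: a small "users" table and a larger "clicks" log.
users = [(1, "alice"), (2, "bob")]
clicks = [(1, "/home"), (2, "/cart"), (1, "/help"), (3, "/faq")]
```

Both functions return the same joined rows for these inputs. The difference is in data movement: the reduce-side version sends every record of both inputs through the shuffle, while the map-side version moves only the small dataset, which is one plausible reading of why it wins when the sizes differ greatly.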
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, F., Wu, Q., Tan, Y. (2013). Comparison and Performance Analysis of Join Approach in MapReduce. In: Yuan, Y., Wu, X., Lu, Y. (eds) Trustworthy Computing and Services. ISCTCS 2012. Communications in Computer and Information Science, vol 320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35795-4_79
DOI: https://doi.org/10.1007/978-3-642-35795-4_79
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35794-7
Online ISBN: 978-3-642-35795-4
eBook Packages: Computer Science, Computer Science (R0)