Comparison and Performance Analysis of Join Approach in MapReduce

Wu, Fuhui; Wu, Qingbo; Tan, Yusong

doi:10.1007/978-3-642-35795-4_79

Part of the book series: Communications in Computer and Information Science (CCIS, volume 320)

Included in the following conference series:

International Conference on Trustworthy Computing and Services

Abstract

The MapReduce framework has become a general-purpose programming model and has proved its strength in tasks such as sorting and full-text search. However, as requirements grow more complex, MapReduce cannot directly support relational algebra operations, typically join, on heterogeneous data sources. We discuss the factors that influence performance when join is implemented in the map function and when it is implemented in the reduce function. We also implement both approaches and analyze them. Experimental results show that the first approach (join in the map function) wins when the datasets involved in the join differ significantly in size and one of them is small enough. To retain the advantages of the first approach, we further discuss the case in which the smaller dataset grows, and we improve the approach accordingly.
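
The two approaches contrasted in the abstract can be illustrated with a minimal sketch in plain Python, with no Hadoop involved. It is not the authors' implementation: the function names, the in-memory lists standing in for datasets, and the sort that simulates the shuffle are all assumptions made for illustration. The map-side version assumes the smaller dataset fits in memory on every mapper, which is the condition under which the abstract reports that this approach wins.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_side_join(small, large):
    """Join inside the map function (hypothetical sketch).

    The small dataset is loaded into an in-memory hash table, and every
    record of the large dataset is probed against it. No shuffle or
    reduce phase is required.
    """
    table = defaultdict(list)
    for key, value in small:
        table[key].append(value)
    for key, value in large:  # conceptually one map() call per record
        for small_value in table.get(key, []):
            yield key, small_value, value


def reduce_side_join(left, right):
    """Join inside the reduce function (hypothetical sketch).

    Mappers tag each record with its source, the shuffle groups records
    by join key, and each reduce call pairs the two sides.
    """
    # Map phase: emit (key, (tag, value)) for both inputs.
    tagged = [(k, ("L", v)) for k, v in left]
    tagged += [(k, ("R", v)) for k, v in right]
    # Shuffle phase: simulated here by sorting on the join key.
    tagged.sort(key=itemgetter(0))
    # Reduce phase: one iteration per distinct key.
    for key, group in groupby(tagged, key=itemgetter(0)):
        left_vals, right_vals = [], []
        for _, (tag, value) in group:
            (left_vals if tag == "L" else right_vals).append(value)
        for lv in left_vals:
            for rv in right_vals:
                yield key, lv, rv
```

Both functions produce the same joined records. The difference is where the work happens: the reduce-side version moves every record of both inputs through the shuffle, while the map-side version moves none but must hold the smaller input in memory.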

Author information

Authors and Affiliations

School of Computer, National University of Defense Technology, Changsha, China
Fuhui Wu, Qingbo Wu & Yusong Tan

Editor information

Editors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, China
Yuyu Yuan & Xu Wu
The School of Telecommunications Engineering, Beijing University of Posts and Telecommunications Beijing, P. O. Box 128, 100876, Beijing, China
Yueming Lu

About this paper

Cite this paper

Wu, F., Wu, Q., Tan, Y. (2013). Comparison and Performance Analysis of Join Approach in MapReduce. In: Yuan, Y., Wu, X., Lu, Y. (eds) Trustworthy Computing and Services. ISCTCS 2012. Communications in Computer and Information Science, vol 320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35795-4_79

DOI: https://doi.org/10.1007/978-3-642-35795-4_79
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35794-7
Online ISBN: 978-3-642-35795-4
eBook Packages: Computer Science, Computer Science (R0)
