Comparison and Performance Analysis of Join Approach in MapReduce

  • Conference paper
Trustworthy Computing and Services (ISCTCS 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 320)

Abstract

The MapReduce framework has become a general-purpose programming model and has proved its strength in tasks such as sorting and full-text search. However, as requirements grow more complex, MapReduce does not directly support relational algebra operations, typically join, on heterogeneous data sources. We discuss the factors that influence performance when join is implemented in the map function and when it is implemented in the reduce function. We also implement both approaches and analyze them. Experimental results show that the first approach (join in the map function) wins when the datasets involved in the join differ significantly in size and one of them is small enough. To retain the advantages of the first approach, we further discuss the case where the smaller dataset grows, and we improve the approach accordingly.
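The two approaches compared in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation and does not use Hadoop: the function names `map_side_join` and `reduce_side_join`, and the sample `users` and `logs` datasets, are hypothetical and chosen only for illustration. The sketch assumes an equi-join on the first field of each record, and assumes that the map-side variant can hold the smaller dataset entirely in memory.

```python
from collections import defaultdict
from itertools import product


def map_side_join(small, large):
    """Join in the map function (illustrative, in-memory simulation).

    Assumes `small` fits in memory: it is loaded into a hash table, and
    each record of `large` probes that table. No shuffle or reduce phase.
    """
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    # Stream the large dataset and emit one row per matching small record.
    return [(key, s, l) for key, l in large for s in lookup.get(key, [])]


def reduce_side_join(left, right):
    """Join in the reduce function (illustrative, in-memory simulation).

    Records from both inputs are grouped by join key, keeping track of
    their source (the simulated shuffle); each key's group is then
    combined with a per-key cross product (the simulated reduce).
    """
    shuffled = defaultdict(lambda: ([], []))
    for key, value in left:
        shuffled[key][0].append(value)   # records tagged as "left"
    for key, value in right:
        shuffled[key][1].append(value)   # records tagged as "right"
    joined = []
    for key, (lefts, rights) in shuffled.items():
        joined.extend((key, l, r) for l, r in product(lefts, rights))
    return joined


# Hypothetical sample data: a small table and a larger log.
users = [(1, "alice"), (2, "bob")]
logs = [(1, "login"), (2, "view"), (1, "logout"), (3, "view")]

map_rows = sorted(map_side_join(users, logs))
reduce_rows = sorted(reduce_side_join(users, logs))
```

Both functions return the same joined rows for this input. The difference the abstract points to is where the work happens: the map-side variant avoids moving the large dataset through a shuffle but depends on the smaller dataset fitting in memory, which is why its advantage is tied to the size gap between the inputs.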

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, F., Wu, Q., Tan, Y. (2013). Comparison and Performance Analysis of Join Approach in MapReduce. In: Yuan, Y., Wu, X., Lu, Y. (eds) Trustworthy Computing and Services. ISCTCS 2012. Communications in Computer and Information Science, vol 320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35795-4_79

  • DOI: https://doi.org/10.1007/978-3-642-35795-4_79

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35794-7

  • Online ISBN: 978-3-642-35795-4

  • eBook Packages: Computer Science, Computer Science (R0)
