An Adaptive Skew Insensitive Join Algorithm for Large Scale Data Analytics

  • Conference paper
Web Technologies and Applications (APWeb 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8709)

Abstract

With the data explosion of recent years, timely and cost-effective analytics over large-scale data has become a hotspot of data management research. Join is an important operation in database queries. However, data skew occurs naturally in many applications and severely degrades the performance of most join algorithms. To address this problem, this paper introduces an Adaptive Skew Insensitive (ASI) join algorithm to handle severe data skew. Based on our cost analysis, the ASI join algorithm adaptively chooses the best join algorithm for different inputs. In experiments against several state-of-the-art join methods, our method achieves a significant improvement in join efficiency when dealing with data skew.
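The abstract does not spell out the cost model, so the following is only a generic illustration of why skew hurts a hash-partitioned join and how a skew-aware strategy can be selected. It is not the ASI algorithm. All function names, the `1 / n_workers` threshold, and the round-robin treatment of hot keys are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def skewed_keys(rows, key, threshold):
    """Return join keys whose share of the rows exceeds `threshold` (hypothetical cutoff)."""
    counts = Counter(r[key] for r in rows)
    total = len(rows)
    return {k for k, c in counts.items() if total and c / total > threshold}


def partition_loads(rows, key, n_workers, hot=frozenset()):
    """Count rows per worker.

    Ordinary keys are hash-partitioned. Keys in `hot` are spread round-robin,
    which only yields a correct join if the matching rows of the other input
    are replicated to every worker (not modelled here).
    """
    loads = [0] * n_workers
    next_worker = 0
    for r in rows:
        if r[key] in hot:
            loads[next_worker % n_workers] += 1
            next_worker += 1
        else:
            loads[hash(r[key]) % n_workers] += 1
    return loads


def choose_strategy(rows, key, n_workers):
    """Pick a strategy name from observed key frequencies (assumed rule, not the paper's)."""
    hot = skewed_keys(rows, key, threshold=1 / n_workers)
    return ("skew_aware", hot) if hot else ("repartition", set())


def hash_join(left, right, key):
    """Single-node hash join used as the per-worker join step."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Eight of ten left rows share key 1, so plain hash partitioning overloads one worker.
left = [{"k": 1, "a": i} for i in range(8)] + [{"k": 2, "a": 8}, {"k": 3, "a": 9}]
right = [{"k": 1, "b": "x"}, {"k": 2, "b": "y"}]

strategy, hot = choose_strategy(left, "k", n_workers=4)
plain = partition_loads(left, "k", 4)
balanced = partition_loads(left, "k", 4, hot=hot)
joined = hash_join(left, right, "k")
```

In this toy input the heaviest worker drops from eight rows under plain hash partitioning to three once the hot key is spread out, which is the kind of imbalance a cost-based choice between join strategies is meant to avoid.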

This work was supported by the Natural Science Foundation of China (No. 60973002 and No. 61170003), the National High Technology Research and Development Program of China (Grant No. 2012AA011002), and the MOE-CMCC Research Fund.



Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Liao, W., Wang, T., Li, H., Yang, D., Qiu, Z., Lei, K. (2014). An Adaptive Skew Insensitive Join Algorithm for Large Scale Data Analytics. In: Chen, L., Jia, Y., Sellis, T., Liu, G. (eds) Web Technologies and Applications. APWeb 2014. Lecture Notes in Computer Science, vol 8709. Springer, Cham. https://doi.org/10.1007/978-3-319-11116-2_44

  • DOI: https://doi.org/10.1007/978-3-319-11116-2_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11115-5

  • Online ISBN: 978-3-319-11116-2

  • eBook Packages: Computer Science, Computer Science (R0)
