Online Aggregation: A Review

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11242)

Abstract

Recent demands for querying big data have revealed various shortcomings of traditional database systems. This, in turn, has led to the emergence of a new query mode, approximate querying. Online aggregation is a sample-based technique for approximate querying, and it has become indispensable in today's era of information explosion. Online aggregation continuously returns an approximate result together with an error estimate (usually a confidence interval) until all data have been processed. This survey mainly aims to elucidate the two most critical steps of online aggregation: the sampling mechanism and the error estimation method. With the development of MapReduce, researchers have tried to implement online aggregation in the MapReduce framework. We also briefly introduce some implementations of online aggregation in MapReduce and evaluate their features, strengths, and drawbacks. Finally, we discuss some open challenges in online aggregation that need the attention of the research community and application designers.
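The loop the abstract describes (sample, refine the estimate, report an error bound, stop when all data are processed) can be illustrated with a minimal sketch for an AVG query. This is not an implementation from the paper or from any surveyed system. It assumes an in-memory list scanned in shuffled order as a stand-in for the sampling mechanism, and a large-sample normal confidence interval with a finite population correction as the error estimate. The function name `online_avg` and its parameters `report_every`, `z`, and `seed` are hypothetical.

```python
import math
import random


def online_avg(values, report_every=1000, z=1.96, seed=0):
    """Yield (rows_seen, estimate, half_width) while scanning values in random order.

    Assumption: a shuffled scan stands in for the sampling mechanism, and the
    interval is a normal-approximation interval (z=1.96 is roughly 95%).
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # rows are then consumed as a sample without replacement

    total = len(order)
    n, mean, m2 = 0, 0.0, 0.0  # Welford running mean and sum of squared deviations
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0 or n == total:
            var = m2 / (n - 1) if n > 1 else 0.0
            # Finite population correction: the interval collapses to zero
            # once every row has been processed.
            fpc = (total - n) / total
            half_width = z * math.sqrt(var / n * fpc)
            yield n, mean, half_width


if __name__ == "__main__":
    data_rng = random.Random(42)
    data = [data_rng.gauss(100.0, 15.0) for _ in range(20_000)]
    for rows, est, hw in online_avg(data, report_every=5000):
        print(f"rows={rows:6d}  AVG ~ {est:.3f} +/- {hw:.3f}")
```

A caller can stop consuming the generator as soon as the reported half-width is small enough, which is the early-termination behaviour that makes online aggregation attractive for interactive queries.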

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 61772289 and the Fundamental Research Funds for the Central Universities.

Author information

Correspondence to Yanlong Wen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, Y., Wen, Y., Yuan, X. (2018). Online Aggregation: A Review. In: Meng, X., Li, R., Wang, K., Niu, B., Wang, X., Zhao, G. (eds) Web Information Systems and Applications. WISA 2018. Lecture Notes in Computer Science, vol 11242. Springer, Cham. https://doi.org/10.1007/978-3-030-02934-0_10

  • DOI: https://doi.org/10.1007/978-3-030-02934-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02933-3

  • Online ISBN: 978-3-030-02934-0

  • eBook Packages: Computer Science, Computer Science (R0)
