Online Aggregation: A Review

Li, Yun; Wen, Yanlong; Yuan, Xiaojie

doi:10.1007/978-3-030-02934-0_10

Conference paper
First Online: 20 November 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11242)

Abstract

Recent demands for querying big data have revealed various shortcomings of traditional database systems. This, in turn, has led to the emergence of a new query mode: approximate querying. Online aggregation is a sample-based technique for approximate querying, and it has become indispensable in today's era of information explosion. Online aggregation continuously returns an approximate result with an error estimate (usually a confidence interval) until all data are processed. This survey mainly aims to elucidate the two most critical steps of online aggregation: the sampling mechanism and the error estimation method. With the development of MapReduce, researchers have tried to implement online aggregation in the MapReduce framework. We also briefly introduce some implementations of online aggregation in MapReduce and evaluate their features, strengths, and drawbacks. Finally, we describe some open challenges in online aggregation that need the attention of the research community and application designers.
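
The loop described in the abstract (report a running estimate with a confidence interval until all data are processed) can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `online_avg`, the parameters `report_every` and `z`, the use of a random scan order as the sampling mechanism, and the normal-approximation interval are all assumptions made for illustration.

```python
import math
import random


def online_avg(values, report_every=1000, z=1.96, seed=0):
    """Yield (rows_seen, estimate, half_width) for AVG over `values`.

    Hypothetical sketch: rows are scanned in random order, and the
    interval is a large-sample (CLT) approximation at roughly 95%
    confidence for z=1.96. No finite-population correction is applied,
    so the interval is conservative as the scan nears completion.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # random scan order stands in for a sampling mechanism
    total = len(order)
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0 or n == total:
            if n == total:
                half_width = 0.0  # all data processed: the result is exact
            elif n > 1:
                variance = m2 / (n - 1)
                half_width = z * math.sqrt(variance / n)
            else:
                half_width = float("inf")  # one row gives no variance estimate
            yield n, mean, half_width


if __name__ == "__main__":
    data = [float(i % 100) for i in range(10000)]
    for rows, estimate, half_width in online_avg(data):
        print(f"{rows:>6} rows: AVG = {estimate:.3f} +/- {half_width:.3f}")
```

A caller could stop consuming the generator as soon as the half-width falls below a chosen tolerance, which is the early-termination behaviour that makes online aggregation attractive for interactive queries.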

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 61772289 and the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

College of Cyberspace Security, Nankai University, Tianjin, China
Xiaojie Yuan
College of Computer Science, Nankai University, Tianjin, China
Yun Li, Yanlong Wen & Xiaojie Yuan

Corresponding author

Correspondence to Yanlong Wen.

Editor information

Editors and Affiliations

Renmin University of China, Beijing, China
Xiaofeng Meng
Huazhong University of Science and Technology, Wuhan, China
Ruixuan Li
Renmin University of China, Beijing, China
Kanliang Wang
Taiyuan University of Technology, Yuci, China
Baoning Niu
Tianjin University, Tianjin, China
Xin Wang
South China Normal University, Guangzhou, China
Gansen Zhao

About this paper

Cite this paper

Li, Y., Wen, Y., Yuan, X. (2018). Online Aggregation: A Review. In: Meng, X., Li, R., Wang, K., Niu, B., Wang, X., Zhao, G. (eds) Web Information Systems and Applications. WISA 2018. Lecture Notes in Computer Science, vol 11242. Springer, Cham. https://doi.org/10.1007/978-3-030-02934-0_10

DOI: https://doi.org/10.1007/978-3-030-02934-0_10
Published: 20 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02933-3
Online ISBN: 978-3-030-02934-0
eBook Packages: Computer Science, Computer Science (R0)