Optimizing Aggregate Query Processing in Cloud Data Warehouses

Kurunji, Swathi; Ge, Tingjian; Fu, Xinwen; Liu, Benyuan; Kumar, Amrith; Chen, Cindy X.

doi:10.1007/978-3-319-10067-8_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8648)

Included in the following conference series:

International Conference on Data Management in Cloud, Grid and P2P Systems

Abstract

In this paper, we study and optimize aggregate query processing in a highly distributed Cloud Data Warehouse, where each database stores a subset of relational data in a star schema. Existing aggregate query processing algorithms (for example, the Two-phase algorithm) focus on optimizing various query operations but give less importance to communication cost overhead. In cloud architectures, however, communication cost overhead is an important factor in query processing. We therefore take communication overhead into account to improve distributed query processing in such cloud data warehouses. We then design query-processing algorithms by analyzing the aggregate operation and eliminating most of the sort and group-by operations with the help of integrity constraints and our proposed storage structures, PK-map and Tuple-index-map. Extensive experiments on PlanetLab cloud machines validate the effectiveness of our proposed framework in improving response time, reducing node-to-node interdependency, minimizing communication overhead, and reducing the database table accesses required for aggregate queries.
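
For readers unfamiliar with the Two-phase algorithm named in the abstract, the following is a minimal sketch of the general two-phase aggregation pattern as it is commonly described: each node first aggregates its own rows, and a coordinator then merges the partial results. It is not taken from the chapter and does not show the proposed PK-map or Tuple-index-map structures. All function names, the (key, value) row layout, and the use of the number of shipped partial groups as a rough proxy for communication cost are assumptions made for illustration.

```python
from collections import defaultdict


def local_aggregate(rows):
    """Phase 1 (hypothetical helper): a node groups its own (key, value)
    rows and produces one partial (sum, count) pair per group key."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return {key: (s, c) for key, (s, c) in partial.items()}


def global_merge(partials):
    """Phase 2 (hypothetical helper): a coordinator merges the partial
    (sum, count) pairs received from all nodes into final aggregates."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    return {
        key: {"sum": s, "count": c, "avg": s / c}
        for key, (s, c) in merged.items()
    }


def two_phase_aggregate(node_rows):
    """Run both phases over a list of per-node row lists.

    Returns the final aggregates and the number of partial groups that
    would be sent to the coordinator (an assumed, simplified stand-in
    for communication cost)."""
    partials = [local_aggregate(rows) for rows in node_rows]
    shipped_groups = sum(len(p) for p in partials)
    return global_merge(partials), shipped_groups


if __name__ == "__main__":
    # Two hypothetical nodes, each holding a subset of the rows.
    node_rows = [
        [("A", 10.0), ("A", 20.0), ("B", 5.0)],
        [("A", 30.0), ("B", 15.0), ("C", 1.0)],
    ]
    result, shipped_groups = two_phase_aggregate(node_rows)
    for key in sorted(result):
        print(key, result[key])
    print("partial groups shipped:", shipped_groups)
```

In this sketch every node ships one partial result per group it holds, so the data sent to the coordinator grows with the number of distinct groups per node. That is the kind of overhead the abstract identifies as important in cloud settings.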

Author information

Authors and Affiliations

University of Massachusetts Lowell, MA, USA
Swathi Kurunji, Tingjian Ge, Xinwen Fu, Benyuan Liu, Amrith Kumar & Cindy X. Chen

Editor information

Editors and Affiliations

Paul Sabatier University, IRIT, 118, route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain & Franck Morvan
HCMC University of Technology, 268 Ly Thuong Kiet Street, District 10, Ho Chi Minh City, Vietnam
Tran Khanh Dang

About this paper

Cite this paper

Kurunji, S., Ge, T., Fu, X., Liu, B., Kumar, A., Chen, C.X. (2014). Optimizing Aggregate Query Processing in Cloud Data Warehouses. In: Hameurlain, A., Dang, T.K., Morvan, F. (eds) Data Management in Cloud, Grid and P2P Systems. Globe 2014. Lecture Notes in Computer Science, vol 8648. Springer, Cham. https://doi.org/10.1007/978-3-319-10067-8_1

DOI: https://doi.org/10.1007/978-3-319-10067-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10066-1
Online ISBN: 978-3-319-10067-8
eBook Packages: Computer Science, Computer Science (R0)
