Optimizing Aggregate Query Processing in Cloud Data Warehouses

Conference paper
Data Management in Cloud, Grid and P2P Systems (Globe 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8648)

Abstract

In this paper, we study and optimize aggregate query processing in a highly distributed Cloud Data Warehouse, where each database stores a subset of relational data in a star schema. Existing aggregate query processing algorithms (e.g., the Two-phase algorithm) focus on optimizing various query operations but give less importance to communication cost overhead. In cloud architectures, however, communication overhead is an important factor in query processing. We therefore take communication overhead into account to improve distributed query processing in such cloud data warehouses. We then design query-processing algorithms that analyze the aggregate operation and eliminate most of the sort and group-by operations with the help of integrity constraints and our proposed storage structures, PK-map and Tuple-index-map. Extensive experiments on PlanetLab cloud machines validate the effectiveness of the proposed framework in improving response time, reducing node-to-node interdependency, minimizing communication overhead, and reducing the database table accesses required for aggregate queries.
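
For context on the baseline the abstract mentions, here is a minimal Python sketch of generic two-phase (local-then-global) aggregation, assuming that is what "Two-phase algorithm" refers to. It is not the authors' PK-map or Tuple-index-map method, whose internals the abstract does not describe. The partition data, group keys, and function names (`local_aggregate`, `global_merge`) are hypothetical. Each node ships one (sum, count) pair per group instead of its raw tuples, which is where the communication saving of this scheme comes from.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape: (group-by key, measure value).
Row = Tuple[str, float]
Partial = Dict[str, Tuple[float, int]]


def local_aggregate(rows: Iterable[Row]) -> Partial:
    """Phase 1 (runs on each node): pre-aggregate the local partition
    into one (sum, count) pair per group key."""
    partial: Partial = {}
    for key, value in rows:
        s, c = partial.get(key, (0.0, 0))
        partial[key] = (s + value, c + 1)
    return partial


def global_merge(partials: List[Partial]) -> Dict[str, float]:
    """Phase 2 (runs on the coordinator): combine the partial (sum, count)
    pairs and compute AVG(value) per group."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            totals[key][0] += s
            totals[key][1] += c
    return {key: s / c for key, (s, c) in totals.items()}


# Hypothetical data: three nodes, each holding a slice of a fact table.
node_partitions: List[List[Row]] = [
    [("EU", 10.0), ("EU", 20.0), ("US", 5.0)],
    [("US", 15.0), ("US", 25.0), ("ASIA", 8.0)],
    [("EU", 30.0), ("ASIA", 12.0)],
]

partials = [local_aggregate(p) for p in node_partitions]
result = global_merge(partials)

# Communication comparison: raw tuples vs. partial-aggregate entries shipped.
raw_tuples_shipped = sum(len(p) for p in node_partitions)
partial_entries_shipped = sum(len(p) for p in partials)

print(result)
print(raw_tuples_shipped, partial_entries_shipped)
```

Here the coordinator receives 6 partial entries instead of 8 raw tuples; the gap grows with the number of rows per group on each node. AVG is decomposed into SUM and COUNT because averages of partial averages would be incorrect when groups have different sizes on different nodes.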

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kurunji, S., Ge, T., Fu, X., Liu, B., Kumar, A., Chen, C.X. (2014). Optimizing Aggregate Query Processing in Cloud Data Warehouses. In: Hameurlain, A., Dang, T.K., Morvan, F. (eds) Data Management in Cloud, Grid and P2P Systems. Globe 2014. Lecture Notes in Computer Science, vol 8648. Springer, Cham. https://doi.org/10.1007/978-3-319-10067-8_1

  • DOI: https://doi.org/10.1007/978-3-319-10067-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10066-1

  • Online ISBN: 978-3-319-10067-8

  • eBook Packages: Computer Science, Computer Science (R0)