Efficient Online Aggregates in Dense-Region-Based Data Cube Representations

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5691)

Abstract

In-memory OLAP systems require a space-efficient representation of sparse data cubes in order to accommodate large data sets. On the other hand, most efficient online aggregation techniques, such as prefix sums, are built on dense array-based representations. These are often not applicable to real-world data due to the size of the arrays which usually cannot be compressed well, as most sparsity is removed during pre-processing. A possible solution is to identify dense regions in a sparse cube and only represent those using arrays, while storing sparse data separately, e.g. in a spatial index structure. Previous dense-region-based approaches have concentrated mainly on the effectiveness of the dense-region detection (i.e. on the space-efficiency of the result). However, especially in higher-dimensional cubes, data is usually more cluttered, resulting in a potentially large number of small dense regions, which negatively affects query performance on such a structure. In this paper, our focus is not only on space-efficiency but also on time-efficiency, both for the initial dense-region extraction and for queries carried out in the resulting hybrid data structure. We describe two methods to trade available memory for increased aggregate query performance. In addition, optimizations in our approach significantly reduce the time to build the initial data structure compared to former systems. Also, we present a straightforward adaptation of our approach to support multi-core or multi-processor architectures, which can further enhance query performance. Experiments with different real-world data sets show how various parameter settings can be used to adjust the efficiency and effectiveness of our algorithms.
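
As a rough illustration of the hybrid idea described in the abstract, the following minimal sketch answers a two-dimensional range-sum query by combining prefix-sum arrays for dense regions with a separate store for sparse cells. It is not the authors' implementation: the names `DenseRegion`, `build_prefix_sums`, and `hybrid_range_sum` are hypothetical, the cube is restricted to two dimensions, dense regions are assumed not to overlap, and a plain dictionary scan stands in for the spatial index structure mentioned in the abstract.

```python
def build_prefix_sums(grid):
    """Return P where P[i][j] is the sum of grid[0..i-1][0..j-1].

    P carries one extra row and column of zeros, so queries need no
    special cases at the borders.
    """
    rows, cols = len(grid), len(grid[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        running = 0  # sum of grid[i][0..j]
        for j in range(cols):
            running += grid[i][j]
            P[i + 1][j + 1] = P[i][j + 1] + running
    return P


class DenseRegion:
    """A dense rectangular region stored as a prefix-sum array (hypothetical)."""

    def __init__(self, origin, grid):
        self.r0, self.c0 = origin            # global position of local cell (0, 0)
        self.rows, self.cols = len(grid), len(grid[0])
        self.P = build_prefix_sums(grid)

    def range_sum(self, qr0, qc0, qr1, qc1):
        """Sum of this region's cells inside the inclusive global query box."""
        # Clip the query box to the region's extent.
        lo_r, hi_r = max(qr0, self.r0), min(qr1, self.r0 + self.rows - 1)
        lo_c, hi_c = max(qc0, self.c0), min(qc1, self.c0 + self.cols - 1)
        if lo_r > hi_r or lo_c > hi_c:
            return 0  # no overlap
        # Translate to local coordinates, then apply inclusion-exclusion:
        # four lookups, independent of the size of the box.
        a, b = lo_r - self.r0, lo_c - self.c0
        c, d = hi_r - self.r0, hi_c - self.c0
        P = self.P
        return P[c + 1][d + 1] - P[a][d + 1] - P[c + 1][b] + P[a][b]


def hybrid_range_sum(regions, sparse_points, qr0, qc0, qr1, qc1):
    """Range sum over dense regions plus sparse cells.

    sparse_points maps (row, col) -> value. The linear scan below is an
    assumption made for brevity; a spatial index would replace it.
    """
    total = sum(reg.range_sum(qr0, qc0, qr1, qc1) for reg in regions)
    total += sum(
        value
        for (r, c), value in sparse_points.items()
        if qr0 <= r <= qr1 and qc0 <= c <= qc1
    )
    return total


# Illustrative data only: one 2x3 dense region and three sparse cells.
regions = [DenseRegion((10, 10), [[1, 2, 3], [4, 5, 6]])]
sparse_points = {(0, 0): 7, (11, 20): 5, (50, 50): 9}

print(hybrid_range_sum(regions, sparse_points, 0, 0, 11, 11))     # 19
print(hybrid_range_sum(regions, sparse_points, 10, 11, 11, 20))   # 21
print(hybrid_range_sum(regions, sparse_points, 0, 0, 100, 100))   # 42
```

In this sketch each dense region contributes in constant time per query, so the cost grows with the number of regions the query touches. That is the effect the abstract points to when it notes that many small dense regions hurt query performance.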

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Haddadin, K., Lauer, T. (2009). Efficient Online Aggregates in Dense-Region-Based Data Cube Representations. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2009. Lecture Notes in Computer Science, vol 5691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03730-6_15

  • DOI: https://doi.org/10.1007/978-3-642-03730-6_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03729-0

  • Online ISBN: 978-3-642-03730-6

  • eBook Packages: Computer Science (R0)