Abstract
In-memory OLAP systems require a space-efficient representation of sparse data cubes in order to accommodate large data sets. On the other hand, most efficient online aggregation techniques, such as prefix sums, are built on dense array-based representations. These are often not applicable to real-world data because of the size of the arrays, which usually cannot be compressed well since most sparsity is removed during pre-processing. A possible solution is to identify dense regions in a sparse cube and represent only those using arrays, while storing the remaining sparse data separately, e.g. in a spatial index structure. Previous dense-region-based approaches have concentrated mainly on the effectiveness of the dense-region detection (i.e. on the space-efficiency of the result). However, especially in higher-dimensional cubes, data is usually more cluttered, resulting in a potentially large number of small dense regions, which negatively affects query performance on such a structure. In this paper, our focus is not only on space-efficiency but also on time-efficiency, both for the initial dense-region extraction and for queries carried out on the resulting hybrid data structure. We describe two methods to trade available memory for increased aggregate query performance. In addition, optimizations in our approach significantly reduce the time to build the initial data structure compared to earlier systems. We also present a straightforward adaptation of our approach to support multi-core or multi-processor architectures, which can further enhance query performance. Experiments with different real-world data sets show how various parameter settings can be used to adjust the efficiency and effectiveness of our algorithms.
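To make the hybrid idea concrete, the following is a minimal two-dimensional sketch and not the authors' implementation. It assumes that dense regions are already known and do not overlap, stores each one as a prefix-sum array, and keeps the remaining cells in a plain dictionary that is scanned linearly as a stand-in for a spatial index structure. The names `HybridCube`, `add_dense_region`, `add_sparse_cell` and `range_sum` are hypothetical and chosen only for illustration.

```python
def build_prefix_sums(grid):
    """Return P where P[i][j] is the sum of grid[0..i-1][0..j-1].

    P has one extra row and column of zeros so queries need no edge cases.
    """
    rows, cols = len(grid), len(grid[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = grid[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def dense_range_sum(P, r1, c1, r2, c2):
    """Sum over the inclusive box (r1, c1)-(r2, c2) in region-local coordinates."""
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]


class HybridCube:
    """Hypothetical hybrid structure: dense regions as arrays, the rest sparse."""

    def __init__(self):
        self.regions = []  # entries: (row0, col0, rows, cols, prefix_sums)
        self.sparse = {}   # (row, col) -> value; assumed stand-in for a spatial index

    def add_dense_region(self, row0, col0, grid):
        # Assumption: regions do not overlap each other or any sparse cell.
        P = build_prefix_sums(grid)
        self.regions.append((row0, col0, len(grid), len(grid[0]), P))

    def add_sparse_cell(self, row, col, value):
        self.sparse[(row, col)] = value

    def range_sum(self, r1, c1, r2, c2):
        """Sum of all cells in the inclusive box (r1, c1)-(r2, c2)."""
        total = 0
        for row0, col0, rows, cols, P in self.regions:
            # Clip the query box to the region; skip regions it does not touch.
            lo_r, hi_r = max(r1, row0), min(r2, row0 + rows - 1)
            lo_c, hi_c = max(c1, col0), min(c2, col0 + cols - 1)
            if lo_r <= hi_r and lo_c <= hi_c:
                total += dense_range_sum(
                    P, lo_r - row0, lo_c - col0, hi_r - row0, hi_c - col0
                )
        # Linear scan over sparse cells; a real system would query an index here.
        total += sum(
            v for (r, c), v in self.sparse.items()
            if r1 <= r <= r2 and c1 <= c <= c2
        )
        return total


cube = HybridCube()
cube.add_dense_region(10, 20, [[1, 2], [3, 4]])
cube.add_sparse_cell(0, 0, 5)
cube.add_sparse_cell(100, 100, 7)
print(cube.range_sum(0, 0, 200, 200))
```

Each query in this sketch visits every dense region once, so its cost grows with the number of regions. This is the effect the abstract describes for cluttered, higher-dimensional data, where many small dense regions slow down queries.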
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Haddadin, K., Lauer, T. (2009). Efficient Online Aggregates in Dense-Region-Based Data Cube Representations. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2009. Lecture Notes in Computer Science, vol 5691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03730-6_15
DOI: https://doi.org/10.1007/978-3-642-03730-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03729-0
Online ISBN: 978-3-642-03730-6
eBook Packages: Computer Science, Computer Science (R0)