Exact and Approximate Sizes of Convex Datacubes

Nedjar, Sébastien

doi:10.1007/978-3-642-03730-6_17

Sébastien Nedjar¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5691))

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

1042 Accesses
2 Citations

Abstract

In various approaches, data cubes are pre-computed in order to efficiently answer Olap queries. The notion of data cube has been explored in various ways: iceberg cubes, range cubes, differential cubes or emerging cubes. Previously, we have introduced the concept of convex cube which generalizes all the quoted variants of cubes. More precisely, the convex cube captures all the tuples satisfying a monotone and/or antimonotone constraint combination. This paper is dedicated to a study of the convex cube size. Actually, knowing the size of such a cube even before computing it has various advantages. First of all, free space can be saved for its storage and the data warehouse administration can be improved. However the main interest of this size knowledge is to choose at best the constraints to apply in order to get a workable result. For an aided calibrating of constraints, we propose a sound characterization, based on inclusion-exclusion principle, of the exact size of convex cube as long as an upper bound which can be very quickly yielded. Moreover we adapt the nearly optimal algorithm HyperLogLog in order to provide a very good approximation of the exact size of convex cubes. Our analytical results are confirmed by experiments: the approximated size of convex cubes is really close to their exact size and can be computed quasi immediately.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Beyer, K.S., Ramakrishnan, R.: Bottom-up computation of sparse and iceberg cubes. In: Delis, A., Faloutsos, C., Ghandeharizadeh, S. (eds.) SIGMOD Conference, pp. 359–370. ACM Press, New York (1999)
Google Scholar
Casali, A., Cicchetti, R., Lakhal, L.: Cube lattices: A framework for multidimensional data mining. In: Barbará, D., Kamath, C. (eds.) SDM. SIAM, Philadelphia (2003)
Google Scholar
Casali, A.: Mining borders of the difference of two datacubes. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds.) DaWaK 2004. LNCS, vol. 3181, pp. 391–400. Springer, Heidelberg (2004)
Chapter Google Scholar
Nedjar, S., Casali, A., Cicchetti, R., Lakhal, L.: Emerging cubes for trends analysis in olap databases. In: Song, I.-Y., Eder, J., Nguyen, T.M. (eds.) DaWaK 2007. LNCS, vol. 4654, pp. 135–144. Springer, Heidelberg (2007)
Chapter Google Scholar
Han, J., Chen, Y., Dong, G., Pei, J., Wah, B.W., Wang, J., Cai, Y.D.: Stream cube: An architecture for multi-dimensional analysis of data streams. Distributed and Parallel Databases 18(2), 173–197 (2005)
Article Google Scholar
Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., Pirahesh, H.: Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub totals. Data Min. Knowl. Discov. 1(1), 29–53 (1997)
Article Google Scholar
Casali, A., Nedjar, S., Cicchetti, R., Lakhal, L.: Convex cube: Towards a unified structure for multidimensional databases. In: Wagner, R., Revell, N., Pernul, G. (eds.) DEXA 2007. LNCS, vol. 4653, pp. 572–581. Springer, Heidelberg (2007)
Chapter Google Scholar
Flajolet, P., Fusy, E., Gandouet, O., Meunier, F.: Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In: Proceedings of the Conference on Analysis of Algorithms, AofA 2007, pp. 127–146 (2007)
Google Scholar
Lakshmanan, L.V.S., Pei, J., Han, J.: Quotient cube: How to summarize the semantics of a data cube. In: VLDB, pp. 778–789. Morgan Kaufmann, San Francisco (2002)
Google Scholar
Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2006)
MATH Google Scholar
Pei, J., Han, J., Lakshmanan, L.V.S.: Pushing convertible constraints in frequent itemset mining. Data Min. Knowl. Discov. 8(3), 227–252 (2004)
Article MathSciNet Google Scholar
Vel, M.: Theory of Convex Structures, vol. (50). North-Holland, Amsterdam (1993)
MATH Google Scholar
Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing iceberg concept lattices with titanic. Data Knowl. Eng. 42(2), 189–222 (2002)
Article MATH Google Scholar
Feller, W.: An Introduction to Probability Theory and Its Applications, vol. 2. John Wiley & Sons, Chichester (1971)
MATH Google Scholar
Shukla, A., Deshpande, P., Naughton, J.F., Ramasamy, K.: Storage estimation for multidimensional aggregates in the presence of hierarchies. In: Vijayaraman, T.M., Buchmann, A.P., Mohan, C., Sarda, N.L. (eds.) VLDB, pp. 522–531. Morgan Kaufmann, San Francisco (1996)
Google Scholar
Xin, D., Han, J., Li, X., Shao, Z., Wah, B.W.: Computing iceberg cubes by top-down and bottom-up integration: The starcubing approach. IEEE Trans. Knowl. Data Eng. 19(1), 111–126 (2007)
Article Google Scholar
Zaki, M.J., Hsiao, C.J.: Charm: An efficient algorithm for closed itemset mining. In: Grossman, R.L., Han, J., Kumar, V., Mannila, H., Motwani, R. (eds.) SDM. SIAM, Philadelphia (2002)
Google Scholar
Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. Information Systems 24(1), 25–46 (1999)
Article MathSciNet MATH Google Scholar
Xin, D., Shao, Z., Han, J., Liu, H.: C-cubing: Efficient computation of closed cubes by aggregation-based checking. In: Liu, L., Reuter, A., Whang, K.Y., Zhang, J. (eds.) ICDE, p. 4. IEEE Computer Society, Los Alamitos (2006)
Google Scholar

Download references

Author information

Authors and Affiliations

Laboratoire d’Informatique Fondamentale de Marseille (LIF), Aix-Marseille Université - CNRS, Case 901, 163 Avenue de Luminy, 13288, Marseille Cedex 9, France
Sébastien Nedjar

Authors

Sébastien Nedjar
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Science, Aalborg University, Selma Lagerlöfsvej 300, 9220, Aalborg Ø, Denmark
Torben Bach Pedersen
IBM India Research Lab, Plot No. 4, Block C, Institutional Area, Vasant Kunj, 110 070, New Delhi, India
Mukesh K. Mohania
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, 1040, Wien, Austria
A Min Tjoa

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Nedjar, S. (2009). Exact and Approximate Sizes of Convex Datacubes. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2009. Lecture Notes in Computer Science, vol 5691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03730-6_17

Download citation

DOI: https://doi.org/10.1007/978-3-642-03730-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03729-0
Online ISBN: 978-3-642-03730-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics