Theoretical Bounds on the Size of Condensed Representations

Dexters, Nele; Calders, Toon

doi:10.1007/978-3-540-31841-5_4

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3377)

Abstract

Recent studies demonstrate the usefulness of condensed representations as a semantic compression technique for frequent itemsets. Especially in inductive databases, condensed representations are useful as an intermediate format that supports exploration of the itemset space. In this paper we establish theoretical upper bounds on the maximal size of an itemset in different condensed representations. A central notion in the development of these bounds is that of l-free sets, which form the basis of many well-known representations. We bound the maximal cardinality of an l-free set in terms of the size of the database. More concretely, we compute a lower bound on the size of the database in terms of the size of the l-free set; when the database is smaller than this lower bound, the set cannot be l-free. We present an efficient method for calculating the exact value of the bound, based on combinatorial identities for partial row sums. We also present preliminary results on a statistical approximation of the bound and illustrate the results with simulations.
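To make the counting idea in the abstract concrete, here is a minimal Python sketch. It rests on an assumed definition that the abstract does not state: an itemset I is taken to be l-free when no disjunctive rule "I minus J => OR(J)" holds exactly for any nonempty J contained in I with |J| <= l. Under that assumption, each such J needs its own witness transaction whose intersection with I is exactly I minus J, and different J need different transactions, so an l-free set of size n requires at least the partial row sum C(n,1) + ... + C(n,l) transactions. This simple bound only illustrates the idea; it is not claimed to be the exact bound derived in the paper. The function names and the toy database are hypothetical.

```python
from itertools import combinations
from math import comb


def partial_row_sum(n, l):
    """C(n, 1) + C(n, 2) + ... + C(n, min(l, n)): a truncated row of Pascal's triangle."""
    return sum(comb(n, j) for j in range(1, min(l, n) + 1))


def rule_holds(db, body, head):
    """True if every transaction containing all of `body` also contains at least
    one item of `head`, i.e. the disjunctive rule body => OR(head) holds exactly."""
    return all(t & head for t in db if body <= t)


def is_l_free(db, itemset, l):
    """Brute-force check under an ASSUMED definition: `itemset` is l-free if no rule
    I minus J => OR(J) holds for any nonempty J contained in I with |J| <= l."""
    items = frozenset(itemset)
    for size in range(1, min(l, len(items)) + 1):
        for j in combinations(items, size):
            head = frozenset(j)
            if rule_holds(db, items - head, head):
                return False  # no witness transaction exists for this J
    return True


if __name__ == "__main__":
    # Hypothetical toy database: each transaction is a frozenset of items.
    db = [frozenset(s) for s in ("bc", "ac", "ab", "c", "b", "a")]
    print(partial_row_sum(3, 2))         # 6: witnesses needed for a 2-free set of size 3
    print(is_l_free(db, "abc", 2))       # True: every J has its own witness
    print(is_l_free(db[:-1], "abc", 2))  # False: 5 transactions, fewer than 6
```

Frozensets keep the subset test (`<=`) and the intersection (`&`) direct. The brute-force loop enumerates every J up to size l, so it is only practical for small itemsets; avoiding that kind of enumeration is the point of a closed-form bound.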

Author information

Authors and Affiliations

University of Antwerp, Belgium
Nele Dexters & Toon Calders

Editor information

Editors and Affiliations

Mathematics and Computer Science Department, University of Antwerp, Middelheimlaan 1, 2020 Antwerp, Belgium
Bart Goethals
Department of Computer Science, Universiteit Utrecht,
Arno Siebes

About this paper

Cite this paper

Dexters, N., Calders, T. (2005). Theoretical Bounds on the Size of Condensed Representations. In: Goethals, B., Siebes, A. (eds) Knowledge Discovery in Inductive Databases. KDID 2004. Lecture Notes in Computer Science, vol 3377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31841-5_4

DOI: https://doi.org/10.1007/978-3-540-31841-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25082-1
Online ISBN: 978-3-540-31841-5
eBook Packages: Computer Science, Computer Science (R0)