Abstract
Recent studies demonstrate the usefulness of condensed representations as a semantic compression technique for frequent itemsets. In inductive databases especially, condensed representations are useful as an intermediate format that supports exploration of the itemset space. In this paper we establish theoretical upper bounds on the maximal size of an itemset in different condensed representations. Central to the development of the bounds is the notion of l-free sets, which form the basis of many well-known representations. We bound the maximal cardinality of an l-free set in terms of the size of the database. More concretely, we compute a lower bound on the size of the database in terms of the size of the l-free set; when the database is smaller than this lower bound, the set cannot be l-free. We present an efficient method for calculating the exact value of the bound, based on combinatorial identities of partial row sums. We also present preliminary results on a statistical approximation of the bound and illustrate the results with some simulations.
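The abstract does not state the bound itself, so the following is only a sketch of the building block it names, under the assumption that "partial row sums" refers to partial sums of a row of Pascal's triangle, that is, C(n,0) + C(n,1) + ... + C(n,m). The function names are hypothetical and are not taken from the paper; the paper's actual bound and its identities are not reproduced here.

```python
from math import comb


def partial_row_sum(n: int, m: int) -> int:
    """Return C(n,0) + C(n,1) + ... + C(n,m), a partial sum of row n
    of Pascal's triangle. Terms with index above n are zero, so the
    sum is cut off at min(m, n)."""
    if n < 0 or m < 0:
        raise ValueError("n and m must be non-negative")
    return sum(comb(n, i) for i in range(min(m, n) + 1))


def partial_row_sum_incremental(n: int, m: int) -> int:
    """Same value as partial_row_sum, but each binomial coefficient is
    derived from the previous one, avoiding repeated factorial work."""
    if n < 0 or m < 0:
        raise ValueError("n and m must be non-negative")
    total = 0
    term = 1  # C(n, 0)
    for i in range(min(m, n) + 1):
        total += term
        # C(n, i+1) = C(n, i) * (n - i) / (i + 1); the division is exact.
        term = term * (n - i) // (i + 1)
    return total
```

Both functions return the same value; the incremental variant illustrates the kind of term-by-term recurrence that makes such sums cheap to evaluate for large n.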
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dexters, N., Calders, T. (2005). Theoretical Bounds on the Size of Condensed Representations. In: Goethals, B., Siebes, A. (eds) Knowledge Discovery in Inductive Databases. KDID 2004. Lecture Notes in Computer Science, vol 3377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31841-5_4
DOI: https://doi.org/10.1007/978-3-540-31841-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25082-1
Online ISBN: 978-3-540-31841-5
eBook Packages: Computer Science, Computer Science (R0)