
Theoretical Bounds on the Size of Condensed Representations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3377)

Abstract

Recent studies demonstrate the usefulness of condensed representations as a semantic compression technique for frequent itemsets. In inductive databases especially, condensed representations are useful as an intermediate format that supports exploration of the itemset space. In this paper we establish theoretical upper bounds on the maximal size of an itemset in different condensed representations. A central notion in developing these bounds is that of l-free sets, which form the basis of many well-known representations. We bound the maximal cardinality of an l-free set in terms of the size of the database. More concretely, we compute a lower bound on the size of the database in terms of the size of the l-free set; whenever the database is smaller than this lower bound, the set cannot be l-free. We present an efficient method for calculating the exact value of the bound, based on combinatorial identities of partial row sums. We also present preliminary results on a statistical approximation of the bound and illustrate the results with simulations.
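The counting idea behind a bound of this kind can be sketched as follows. Assume (this is our simplifying assumption, not necessarily the paper's exact definition) that a set I is l-free when, for every nonempty Y ⊆ I with |Y| ≤ l, some transaction contains all items of I \ Y and none of the items of Y. Such a transaction restricted to I equals exactly I \ Y, so different choices of Y require different transactions. A database D containing an l-free set of size n must then satisfy |D| ≥ C(n,1) + C(n,2) + ... + C(n,l), a partial row sum of Pascal's triangle. The Python sketch below checks the assumed condition on a toy database and inverts the bound to get the largest possible l-free set for a given database size, using the standard Pascal identity S(n+1, l) = 2·S(n, l) − C(n, l) for S(n, l) = C(n,0) + ... + C(n,l). The function names are hypothetical, and the paper's exact bound and algorithm may differ (for example, by also accounting for a minimum support threshold).

```python
from itertools import combinations
from math import comb


def is_l_free(itemset, database, l):
    """Hypothetical check of the assumed l-freeness condition: for every
    nonempty Y within itemset with |Y| <= l, some transaction restricted
    to itemset equals exactly itemset minus Y."""
    items = frozenset(itemset)
    # Projection of each transaction onto the candidate itemset.
    projections = {frozenset(t) & items for t in database}
    for size in range(1, min(l, len(items)) + 1):
        for y in combinations(items, size):
            if items - frozenset(y) not in projections:
                return False
    return True


def min_db_size(n, l):
    """Lower bound on |D| for an l-free set of size n under the assumption
    above: one distinct transaction per nonempty Y with |Y| <= l."""
    return sum(comb(n, j) for j in range(1, min(l, n) + 1))


def max_l_free_size(db_size, l):
    """Largest n with min_db_size(n, l) <= db_size, computed incrementally.

    With T(n) = min_db_size(n, l) = S(n, l) - 1, the Pascal identity
    S(n+1, l) = 2*S(n, l) - C(n, l) becomes T(n+1) = 2*T(n) + 1 - C(n, l).
    """
    if l < 1:
        raise ValueError("l must be at least 1")
    n, bound = 0, 0  # T(0) = 0
    while True:
        next_bound = 2 * bound + 1 - comb(n, l)  # comb returns 0 when l > n
        if next_bound > db_size:
            return n
        n, bound = n + 1, next_bound


# Toy database: each transaction misses exactly one item of {a, b, c}.
db = [{"a", "b"}, {"a", "c"}, {"b", "c"}]
print(is_l_free({"a", "b", "c"}, db, 1))  # free (1-free) under the assumption
print(is_l_free({"a", "b", "c"}, db, 2))  # not 2-free: no transaction projects to {a}
print(max_l_free_size(10, 2))
```

Under these assumptions, a database of 10 transactions cannot contain a 2-free set of more than 4 items, since C(5,1) + C(5,2) = 15 > 10. With l ≥ n, the bound becomes 2^n − 1, so the maximal size grows only logarithmically in |D|. The incremental loop avoids recomputing each partial row sum from scratch.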

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dexters, N., Calders, T. (2005). Theoretical Bounds on the Size of Condensed Representations. In: Goethals, B., Siebes, A. (eds) Knowledge Discovery in Inductive Databases. KDID 2004. Lecture Notes in Computer Science, vol 3377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31841-5_4

  • DOI: https://doi.org/10.1007/978-3-540-31841-5_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25082-1

  • Online ISBN: 978-3-540-31841-5

  • eBook Packages: Computer Science, Computer Science (R0)
