Memory-Aware Frequent k-Itemset Mining

  • Conference paper
Knowledge Discovery in Inductive Databases (KDID 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3933)

Abstract

In this paper we show that the well-known problem of computing frequent k-itemsets (i.e., itemsets of cardinality k) in a given dataset can be reduced to the problem of finding iceberg queries over a stream of queries suitably constructed from the original dataset. Hence, algorithms for computing frequent k-itemsets can be obtained by adapting algorithms for computing iceberg queries. We show that, for sparse datasets, this can be done directly, i.e., without generating the frequent x-itemsets for each x < k, as the most common level-wise algorithms do. We exploit a recent algorithm for finding iceberg queries and define an algorithm that requires only three sequential passes over the dataset to compute the frequent k-itemsets (even for k > 3). An important feature of the algorithm is that the amount of main memory required can be determined in advance, and it is shown to be very low for sparse datasets. Experiments show that, for very large datasets with millions of small transactions, our proposal outperforms state-of-the-art algorithms. Furthermore, we sketch a first extension of our algorithm that works over data streams.
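The reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the absolute support threshold `min_count`, and the way the three passes are divided are assumptions made for illustration. The candidate pass uses a counter-based frequent-elements scheme in the style of Karp, Shenker and Papadimitriou, which may differ from the exact iceberg-query algorithm the paper adapts. Each transaction is expanded into its k-subsets, and that derived stream is treated as the input of the iceberg query.

```python
from collections import Counter
from itertools import combinations
from math import comb


def k_subsets(transactions, k):
    """Yield every k-subset of every transaction as a sorted tuple."""
    for t in transactions:
        yield from combinations(sorted(set(t)), k)


def frequent_k_itemsets(transactions, k, min_count):
    """Return {itemset: count} for k-itemsets occurring >= min_count times.

    Hypothetical helper: `transactions` must be re-iterable (e.g. a list),
    because it is scanned three times, and `min_count` must be >= 1.
    """
    # Pass 1: length of the derived stream, which fixes the number of
    # counters (and hence the memory budget) before any counting starts.
    n = sum(comb(len(set(t)), k) for t in transactions)
    capacity = n // min_count

    # Pass 2: counter-based candidate selection. Whenever more than
    # `capacity` counters are live, every counter is decremented and
    # zeroed counters are dropped. Any itemset with at least `min_count`
    # occurrences is guaranteed to survive; false positives may remain.
    counters = {}
    for itemset in k_subsets(transactions, k):
        if itemset in counters:
            counters[itemset] += 1
        else:
            counters[itemset] = 1
            if len(counters) > capacity:
                for key in list(counters):
                    counters[key] -= 1
                    if counters[key] == 0:
                        del counters[key]

    # Pass 3: exact counts for the surviving candidates only.
    exact = Counter(s for s in k_subsets(transactions, k) if s in counters)
    return {s: c for s, c in exact.items() if c >= min_count}


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "b", "d"], ["c", "d"], ["b", "c"]]
    print(frequent_k_itemsets(data, k=2, min_count=2))
```

In this sketch the second pass never holds more than `capacity + 1` counters, so the memory bound is known as soon as the first pass ends. No frequent x-itemsets with x < k are generated along the way. The number of k-subsets per transaction grows quickly with transaction length, which is consistent with the abstract's focus on sparse datasets with small transactions.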

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Atzori, M., Mancarella, P., Turini, F. (2006). Memory-Aware Frequent k-Itemset Mining. In: Bonchi, F., Boulicaut, JF. (eds) Knowledge Discovery in Inductive Databases. KDID 2005. Lecture Notes in Computer Science, vol 3933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733492_3

  • DOI: https://doi.org/10.1007/11733492_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33292-3

  • Online ISBN: 978-3-540-33293-0

  • eBook Packages: Computer Science, Computer Science (R0)
