Memory-Aware Frequent k-Itemset Mining

Atzori, Maurizio; Mancarella, Paolo; Turini, Franco

doi:10.1007/11733492_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3933)

Included in the following conference series:

International Workshop on Knowledge Discovery in Inductive Databases

Abstract

In this paper we show that the well-known problem of computing frequent k-itemsets (i.e., itemsets of cardinality k) in a given dataset can be reduced to the problem of finding iceberg queries from a stream of queries suitably constructed from the original dataset. Hence, algorithms for computing frequent k-itemsets can be obtained by adapting algorithms for computing iceberg queries. We show that, for sparse datasets, this can be done directly, i.e., without generating the frequent x-itemsets for each x < k, as the most common level-wise algorithms do. We exploit a recent algorithm for finding iceberg queries and define an algorithm that requires only three sequential passes over the dataset to compute the frequent k-itemsets (even for k > 3). An important feature of the algorithm is that the amount of main memory required can be determined in advance, and it is shown to be very low for sparse datasets. Experiments show that, for very large datasets with millions of small transactions, our proposal outperforms the state-of-the-art algorithms. Furthermore, we sketch a first extension of our algorithm that works over data streams.
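The reduction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it is one plausible way to treat every k-subset of every transaction as an element of a stream and to apply a counter-based frequent-elements scheme in the style of Karp, Shenker and Papadimitriou. The function names, the absolute threshold `min_count`, and the assignment of work to the three passes are assumptions made for illustration, and `transactions` is assumed to be re-iterable (for example, a list).

```python
from itertools import combinations
from math import comb


def k_subsets(transaction, k):
    """Yield every k-itemset of a transaction as a sorted tuple."""
    return combinations(sorted(set(transaction)), k)


def frequent_k_itemsets(transactions, k, min_count):
    """Return {itemset: count} for k-itemsets occurring in >= min_count transactions.

    Hypothetical three-pass sketch; min_count is an absolute count >= 1.
    """
    # Pass 1: length of the derived stream of k-subsets. This fixes the
    # number of counters (and hence the memory bound) before any counting.
    stream_length = sum(comb(len(set(t)), k) for t in transactions)
    if stream_length == 0:
        return {}
    limit = stream_length // min_count

    # Pass 2: counter-based candidate search. Whenever more than `limit`
    # counters exist, every counter is decremented and zeros are dropped.
    # Fewer than min_count such rounds can occur, so every itemset with
    # count >= min_count survives; infrequent itemsets may survive too.
    counters = {}
    for t in transactions:
        for itemset in k_subsets(t, k):
            counters[itemset] = counters.get(itemset, 0) + 1
            if len(counters) > limit:
                counters = {s: c - 1 for s, c in counters.items() if c > 1}

    # Pass 3: exact counts for the surviving candidates only.
    exact = dict.fromkeys(counters, 0)
    for t in transactions:
        for itemset in k_subsets(t, k):
            if itemset in exact:
                exact[itemset] += 1
    return {s: c for s, c in exact.items() if c >= min_count}


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "b", "d"], ["c", "d"], ["a", "c"]]
    print(frequent_k_itemsets(data, k=2, min_count=3))
```

In this sketch no frequent x-itemsets with x < k are ever generated, and at most `limit + 1` counters are held at any time. Because the stream contains C(|t|, k) subsets per transaction t, the approach is only practical when transactions are small, which is consistent with the sparse-dataset setting the abstract targets.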

Author information

Authors and Affiliations

Dipartimento di Informatica, University of Pisa, Italy
Maurizio Atzori, Paolo Mancarella & Franco Turini
ISTI-CNR, Area della Ricerca di Pisa, Italy
Maurizio Atzori

Editor information

Editors and Affiliations

Pisa KDD Laboratory, ISTI - C.N.R, Area della Ricerca di Pisa, Via Giuseppe Moruzzi 1, Pisa, Italy
Francesco Bonchi
INSA-Lyon, LIRIS CNRS UMR5205, F-69621, Villeurbanne, France
Jean-François Boulicaut

About this paper

Cite this paper

Atzori, M., Mancarella, P., Turini, F. (2006). Memory-Aware Frequent k-Itemset Mining. In: Bonchi, F., Boulicaut, JF. (eds) Knowledge Discovery in Inductive Databases. KDID 2005. Lecture Notes in Computer Science, vol 3933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733492_3

DOI: https://doi.org/10.1007/11733492_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33292-3
Online ISBN: 978-3-540-33293-0
eBook Packages: Computer Science (R0)
