Abstract
In this paper we show that the well-known problem of computing frequent k-itemsets (i.e., itemsets of cardinality k) in a given dataset can be reduced to the problem of finding iceberg queries over a stream of queries suitably constructed from the original dataset. Hence, algorithms for computing frequent k-itemsets can be obtained by adapting algorithms for computing iceberg queries. We show that, for sparse datasets, this can be done directly, i.e., without first generating the frequent x-itemsets for each x < k, as the most common level-wise algorithms do. Building on a recent algorithm for finding iceberg queries, we define an algorithm that requires only three sequential passes over the dataset to compute the frequent k-itemsets (even for k > 3). An important feature of the algorithm is that the amount of main memory it requires can be determined in advance, and this amount is shown to be very low for sparse datasets. Experiments show that, for very large datasets with millions of small transactions, our proposal outperforms state-of-the-art algorithms. Finally, we sketch a first extension of the algorithm that works over data streams.
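The reduction described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not spell out the three passes, so the pass structure below is an assumption, as are all function names (k_subsets, ksp_candidates, frequent_k_itemsets). For the iceberg step the sketch assumes the frequent-elements counting scheme of Karp, Shenker and Papadimitriou, applied to the stream of k-subsets of each transaction. An itemset is treated as frequent when its support count is strictly greater than min_support times the number of transactions.

```python
from itertools import combinations
from math import comb


def k_subsets(transactions, k):
    """Yield every k-itemset contained in each transaction (the derived stream)."""
    for t in transactions:
        yield from combinations(sorted(set(t)), k)


def ksp_candidates(stream, theta):
    """One pass of Karp-Shenker-Papadimitriou counting.

    Returns a superset of the elements whose frequency exceeds theta * N,
    where N is the stream length, keeping at most about 1/theta counters.
    """
    limit = 1.0 / theta
    counters = {}
    for x in stream:
        counters[x] = counters.get(x, 0) + 1
        if len(counters) > limit:
            # Too many counters: decrement all and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)


def frequent_k_itemsets(transactions, k, min_support):
    """Hypothetical three-pass driver; `transactions` must be re-iterable."""
    if not 0 < min_support <= 1:
        raise ValueError("min_support must be in (0, 1]")

    # Pass 1 (assumed): count transactions and the length of the k-subset stream.
    n_trans = 0
    stream_len = 0
    for t in transactions:
        n_trans += 1
        stream_len += comb(len(set(t)), k)
    if stream_len == 0:
        return {}
    threshold = min_support * n_trans   # absolute support threshold
    theta = threshold / stream_len      # same threshold, relative to the stream

    # Pass 2: iceberg candidates over the stream of k-subsets.
    candidates = ksp_candidates(k_subsets(transactions, k), theta)

    # Pass 3: exact counts for the candidates only, then filter.
    counts = dict.fromkeys(candidates, 0)
    for s in k_subsets(transactions, k):
        if s in counts:
            counts[s] += 1
    return {s: c for s, c in counts.items() if c > threshold}


transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "b", "d"],
    ["c", "d"],
    ["a", "b", "c"],
]
print(frequent_k_itemsets(transactions, 2, 0.5))  # {('a', 'b'): 4}
```

In this sketch the number of counters held in the second pass is bounded by 1/theta, which is known after the first pass. For short transactions the stream of k-subsets is short relative to the number of transactions, so the bound stays small, which is consistent with the abstract's remark about sparse datasets.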
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Atzori, M., Mancarella, P., Turini, F. (2006). Memory-Aware Frequent k-Itemset Mining. In: Bonchi, F., Boulicaut, JF. (eds) Knowledge Discovery in Inductive Databases. KDID 2005. Lecture Notes in Computer Science, vol 3933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733492_3
DOI: https://doi.org/10.1007/11733492_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33292-3
Online ISBN: 978-3-540-33293-0
eBook Packages: Computer Science, Computer Science (R0)