Abstract
To mine frequent itemsets on data streams efficiently, this paper proposes a new approach. The memory-efficient and accurate one-pass algorithm divides all frequent itemsets into frequent equivalence classes and prunes all redundant itemsets except those that represent the GLB (Greatest Lower Bound) and LUB (Least Upper Bound) of each frequent equivalence class; the number of GLBs and LUBs is much smaller than the number of frequent itemsets. To maintain these equivalence classes, the paper proposes a compact data structure, the frequent itemset enumerate tree (FIET). A detailed experimental evaluation on synthetic and real datasets shows that the algorithm is very accurate in practice and requires significantly less memory than Jin and Agrawal’s one-pass algorithm.
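The idea of keeping only the bounds of each equivalence class can be illustrated with a small Python sketch. The abstract does not define the equivalence relation, so this sketch assumes that two frequent itemsets are equivalent when exactly the same transactions contain them; under that assumption the minimal members of a class play the role of the GLBs and the maximal member plays the role of the LUB. The function names, the toy transactions, and the brute-force enumeration are all hypothetical; this is not the paper's one-pass algorithm and it does not build the FIET.

```python
from collections import defaultdict
from itertools import combinations


def frequent_equivalence_classes(transactions, min_support):
    """Group frequent itemsets by the set of transaction ids that contain them.

    Assumption (not from the paper): itemsets sharing the same transaction-id
    set belong to the same equivalence class. Brute force, for illustration.
    """
    items = sorted(set().union(*transactions))
    classes = defaultdict(list)  # transaction-id set -> itemsets in that class
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            tids = frozenset(
                tid for tid, t in enumerate(transactions) if set(candidate) <= t
            )
            if len(tids) >= min_support:
                classes[tids].append(frozenset(candidate))
    return classes


def bounds(itemsets):
    """Return (minimal members, maximal members) of a class under set inclusion."""
    lower = [s for s in itemsets if not any(other < s for other in itemsets)]
    upper = [s for s in itemsets if not any(s < other for other in itemsets)]
    return lower, upper


# Hypothetical toy data: a, b and c always occur together.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "c", "d"}, {"d"}]
classes = frequent_equivalence_classes(transactions, min_support=2)

total_itemsets = sum(len(members) for members in classes.values())
kept = set()
for members in classes.values():
    lower, upper = bounds(members)
    kept.update(lower)
    kept.update(upper)

print(f"{total_itemsets} frequent itemsets, {len(classes)} classes, "
      f"{len(kept)} itemsets kept as bounds")
```

On this toy input the seven itemsets built from a, b and c collapse into one class whose bounds are the three singletons and {a, b, c}, so the middle itemsets {a, b}, {a, c} and {b, c} need not be stored. How much is saved on real streams depends on the data and on the paper's actual definitions.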
References
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast discovery of association rules. In: Fayyad, U., et al. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 307–328. AAAI Press, Menlo Park (1996)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. 1994 Int. Conf. Very Large Data Bases (VLDB 1994), Santiago, Chile, September 1994, pp. 487–499 (1994)
Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and Issues in Data Stream Systems. In: Proceedings of the 2002 ACM Symposium on Principles of Database Systems (PODS 2002). ACM Press, New York (2002)
Borgelt, C.: Apriori implementation, http://fuzzy.cs.UniMagdeburg.de/borgelt/Software
Dobra, A., Gehrke, J., Garofalakis, M., Rastogi, R.: Processing complex aggregate queries over data streams. In: Proc. of the 2002 ACM SIGMOD Intl. Conf. on Management of Data (June 2002)
Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (SIGKDD) (2000)
Gehrke, J., Korn, F., Srivastava, D.: On computing correlated aggregates over continual data streams. In: Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pp. 13–24. ACM Press, New York (2001)
Giannella, C., Han, J., Pei, J., Yan, X., Yu, P.S.: Mining Frequent Patterns in Data Streams at Multiple Time Granularities. In: Proceedings of the NSF Workshop on Next Generation Data Mining (November 2002)
Gibbons, P.B., Tirthapura, S.: Estimating simple functions on the union of data streams. In: Proc. of the 2001 ACM Symp. on Parallel Algorithms and Architectures, pp. 281–291. ACM Press, New York (2001)
Goethals, B.: Fp-tree implementation, http://www.cs.helsinki.fi/u/goethals/software/index/html
Jin, R., Agrawal, G.: An algorithm for in-core frequent itemset mining on streaming data (submitted, 2004)
Zaki, M.J., Hsiao, C.: Charm: An efficient algorithm for closed itemset mining. In: 2nd SIAM Int’l. Conf. on Data Mining (2002)
Han, J., Pei, J., Yin, Y.: Mining Frequent Patterns without Candidate Generation. In: Proceedings of the ACM SIGMOD Conference on Management of Data (2000)
Li, C., Cong, G., Tung, A.K.H., Wang, S.: Incremental Maintenance of Quotient Cube for Sum and Median. In: Proceedings of SIGKDD, Seattle, WA, USA, pp. 226–235 (August 2004)
Manku, G.S., Motwani, R.: Approximate Frequency Counts Over Data Streams. In: Proceedings of Conference on Very Large Data Bases (VLDB), pp. 346–357 (2002)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhi-jun, X., Hong, C., Li, C. (2006). An Efficient Algorithm for Frequent Itemset Mining on Data Streams. In: Perner, P. (ed.) Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. ICDM 2006. Lecture Notes in Computer Science, vol 4065. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11790853_37
DOI: https://doi.org/10.1007/11790853_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36036-0
Online ISBN: 978-3-540-36037-7
eBook Packages: Computer Science, Computer Science (R0)