An Efficient Algorithm for Frequent Itemset Mining on Data Streams

Zhi-jun, Xie; Hong, Chen; Li, Cuiping

doi:10.1007/11790853_37

Xie Zhi-jun, Chen Hong & Cuiping Li

Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4065)

Abstract

This paper proposes a new approach to mining frequent itemsets on data streams efficiently. The memory-efficient and accurate one-pass algorithm divides all frequent itemsets into frequent equivalence classes and prunes every redundant itemset, keeping only those that represent the GLB (Greatest Lower Bound) and LUB (Least Upper Bound) of each frequent equivalence class; the number of GLBs and LUBs is much smaller than the number of frequent itemsets. To maintain these equivalence classes, the paper proposes a compact data structure, the frequent itemset enumerate tree (FIET). A detailed experimental evaluation on synthetic and real datasets shows that the algorithm is very accurate in practice and requires significantly less memory than Jin and Agrawal's one-pass algorithm.
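
The pruning idea can be illustrated with a small sketch. It is not the paper's algorithm: it is a brute-force, offline pass rather than a one-pass stream method, and it does not build the FIET. It also assumes a common definition that the abstract does not state, namely that two itemsets belong to the same equivalence class when exactly the same transactions contain them. The function names `frequent_equivalence_classes` and `bounds` are hypothetical, and how the paper assigns the labels GLB and LUB to the minimal and maximal members of a class is not visible from the abstract.

```python
from collections import defaultdict
from itertools import combinations


def frequent_equivalence_classes(transactions, min_support):
    """Group frequent itemsets by the exact set of transactions containing them.

    Assumption: "equivalence" means identical supporting transaction sets.
    Brute force over all item combinations, so only suitable for toy inputs.
    """
    items = sorted(set().union(*transactions))
    classes = defaultdict(list)  # frozenset of transaction ids -> member itemsets
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            tids = frozenset(
                i for i, t in enumerate(transactions) if itemset <= t
            )
            if len(tids) >= min_support:
                classes[tids].append(itemset)
    return classes


def bounds(members):
    """Return (minimal members, maximal members) of a class under set inclusion."""
    lower = [m for m in members if not any(o < m for o in members)]
    upper = [m for m in members if not any(m < o for o in members)]
    return lower, upper


# Toy data: items a, b, c always occur together; d occurs on its own pattern.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("abcd"),
    frozenset("d"),
]
classes = frequent_equivalence_classes(transactions, min_support=2)

total_frequent = sum(len(members) for members in classes.values())
kept = set()
for members in classes.values():
    lower, upper = bounds(members)
    kept.update(lower)
    kept.update(upper)

print(len(classes), total_frequent, len(kept))
```

On this toy input the 8 frequent itemsets fall into 2 classes, and only 5 itemsets remain once the interior members ({a,b}, {a,c}, {b,c}) are pruned. The support of any pruned itemset is still recoverable, because it equals the support of the class whose bounds enclose it.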

References

Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast discovery of association rules. In: Fayyad, U., et al. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 307–328. AAAI Press, Menlo Park (1996)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. 1994 Int. Conf. Very Large Data Bases (VLDB 1994), Santiago, Chile, September 1994, pp. 487–499 (1994)
Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and Issues in Data Stream Systems. In: Proceedings of the 2002 ACM Symposium on Principles of Database Systems (PODS 2002). ACM Press, New York (2002)
Borgelt, C.: Apriori implementation, http://fuzzy.cs.UniMagdeburg.de/borgelt/Software
Dobra, A., Gehrke, J., Garofalakis, M., Rastogi, R.: Processing complex aggregate queries over data streams. In: Proc. of the 2002 ACM SIGMOD Intl. Conf. on Management of Data (June 2002)
Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the ACM Conference on Knowledge and Data Discovery (SIGKDD) (2000)
Gehrke, J., Korn, F., Srivastava, D.: On computing correlated aggregates over continual data streams. In: Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pp. 13–24. ACM Press, New York (2001)
Giannella, C., Han, J., Pei, J., Yan, X., Yu, P.S.: Mining Frequent Patterns in Data Streams at Multiple Time Granularities. In: Proceedings of the NSF Workshop on Next Generation Data Mining (November 2002)
Gibbons, P.B., Tirthapura, S.: Estimating simple functions on the union of data streams. In: Proc. of the 2001 ACM Symp. on Parallel Algorithms and Architectures, pp. 281–291. ACM Press, New York (2001)
Goethals, B.: Fp-tree implementation, http://www.cs.helsinki.fi/u/goethals/software/index/html
Jin, R., Agrawal, G.: An algorithm for in-core frequent itemset mining on streaming data (submitted, 2004)
Zaki, M.J., Hsiao, C.: Charm: An efficient algorithm for closed itemset mining. In: 2nd SIAM Int’l. Conf. on Data Mining (2002)
Han, J., Pei, J., Yin, Y.: Mining Frequent Patterns without Candidate Generation. In: Proceedings of the ACM SIGMOD Conference on Management of Data (2000)
Li, C., Cong, G., Tung, A.K.H., Wang, S.: Incremental Maintenance of Quotient Cube for Sum and Median. In: Proceedings of SIGKDD, Seattle, WA, USA, pp. 226–235 (August 2004)
Manku, G.S., Motwani, R.: Approximate Frequency Counts Over Data Streams. In: Proceedings of Conference on Very Large Data Bases (VLDB), pp. 346–357 (2002)


Author information

Authors and Affiliations

School of Information, Renmin University, Beijing, 100872, P.R. China
Xie Zhi-jun, Chen Hong & Cuiping Li


Editor information

Editors and Affiliations

Institute of Computer Vision and applied Computer Sciences, IBaI, Germany
Petra Perner

About this paper

Cite this paper

Zhi-jun, X., Hong, C., Li, C. (2006). An Efficient Algorithm for Frequent Itemset Mining on Data Streams. In: Perner, P. (eds) Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. ICDM 2006. Lecture Notes in Computer Science, vol 4065. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11790853_37

DOI: https://doi.org/10.1007/11790853_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36036-0
Online ISBN: 978-3-540-36037-7
eBook Packages: Computer Science, Computer Science (R0)
