Approximate Frequent Itemset Discovery from Data Stream

Ciampi, Anna; Fumarola, Fabio; Appice, Annalisa; Malerba, Donato

doi:10.1007/978-3-642-10291-2_16


Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5883)

Abstract

Traditional algorithms for frequent itemset discovery are designed for static data. They cannot be straightforwardly applied to data streams, which are continuous, unbounded, usually arrive at high speed, and often have a data distribution that changes over time. The main challenges of frequent pattern mining in data streams are avoiding multiple scans of the entire dataset, optimizing memory usage, and capturing distribution drift. To address these challenges, we propose a novel algorithm based on a sliding window model, which handles efficiency issues and keeps up with distribution change. Each window consists of several slides. Itemsets are generated locally within each slide, while their approximate support is estimated over the whole window. Efficient itemset generation is ensured by a synopsis structure called the SE-tree. Experiments demonstrate the effectiveness of the proposed algorithm.
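The slide and window scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The names `local_frequent_itemsets` and `SlidingWindowMiner`, the `max_len` cap, and the rule that window support is the sum of counts retained from each slide are all assumptions made for illustration. Brute-force enumeration with `itertools.combinations` stands in for the SE-tree, whose details the abstract does not give.

```python
from collections import Counter, deque
from itertools import combinations


def local_frequent_itemsets(slide, min_support, max_len=3):
    """Return {itemset: count} for itemsets that are frequent within one slide.

    Brute-force enumeration is a stand-in for the SE-tree (assumption).
    """
    counts = Counter()
    for transaction in slide:
        items = sorted(set(transaction))
        for size in range(1, max_len + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    threshold = min_support * len(slide)
    return {itemset: c for itemset, c in counts.items() if c >= threshold}


class SlidingWindowMiner:
    """Hypothetical miner: a window is the most recent `slides_per_window` slides."""

    def __init__(self, slides_per_window, min_support, max_len=3):
        self.min_support = min_support
        self.max_len = max_len
        # deque(maxlen=...) drops the oldest slide automatically,
        # which lets the window follow distribution drift.
        self.slides = deque(maxlen=slides_per_window)

    def add_slide(self, slide):
        """Generate itemsets locally for the new slide; keep only the summary."""
        local = local_frequent_itemsets(slide, self.min_support, self.max_len)
        self.slides.append((len(slide), local))

    def approximate_frequent_itemsets(self):
        """Estimate window-level support from the per-slide summaries.

        Counts from slides where an itemset was locally infrequent were
        discarded, so each estimate is a lower bound on the true support.
        """
        total = sum(size for size, _ in self.slides)
        if total == 0:
            return {}
        window_counts = Counter()
        for _, local in self.slides:
            window_counts.update(local)  # adds counts itemset by itemset
        return {
            itemset: c / total
            for itemset, c in window_counts.items()
            if c >= self.min_support * total
        }


miner = SlidingWindowMiner(slides_per_window=2, min_support=0.5, max_len=2)
miner.add_slide([["a", "b"], ["a", "c"], ["a", "b"], ["b"]])
miner.add_slide([["c"], ["c", "a"], ["c"], ["c", "b"]])
result = miner.approximate_frequent_itemsets()
```

Each transaction is read once and only per-slide summaries are kept, which reflects the single-scan and bounded-memory goals stated in the abstract. In this run the item `c` occurs five times in the window but only the four occurrences from the second slide are retained, which shows why the window-level support is approximate.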



Author information

Authors and Affiliations

Dipartimento di Informatica, Università degli Studi di Bari, via Orabona, 4, 70126, Bari, Italy
Anna Ciampi, Fabio Fumarola, Annalisa Appice & Donato Malerba


Editor information

Editors and Affiliations

Dipartimento di Scienze Sociali, Cognitive e Quantitative, University of Modena and Reggio Emilia, Via Allegri 9, 42100, Reggio Emilia, Italy
Roberto Serra
Dipartimento di Ingegneria dell'Informazione, Università degli Studi di Modena, Via Vignolese 905, 41100, Modena, Italy
Rita Cucchiara


About this paper

Cite this paper

Ciampi, A., Fumarola, F., Appice, A., Malerba, D. (2009). Approximate Frequent Itemset Discovery from Data Stream. In: Serra, R., Cucchiara, R. (eds) AI*IA 2009: Emergent Perspectives in Artificial Intelligence. AI*IA 2009. Lecture Notes in Computer Science, vol 5883. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10291-2_16


DOI: https://doi.org/10.1007/978-3-642-10291-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10290-5
Online ISBN: 978-3-642-10291-2
eBook Packages: Computer Science, Computer Science (R0)
