Efficiently Discovering Recent Frequent Items in Data Streams

Tantono, Ferry Irawan; Manerikar, Nishad; Palpanas, Themis

doi:10.1007/978-3-540-69497-7_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5069)

Included in the following conference series:

International Conference on Scientific and Statistical Database Management

Abstract

Frequent item discovery in streaming data has attracted considerable attention, and several techniques have been proposed to solve it. These approaches, however, treat all values of the data stream equally, even though not all values are equally important: in many situations, we are more interested in the new values that have appeared in the stream than in the older ones.

In this paper, we address the problem of finding recent frequent items in a data stream given a small bounded memory, and present novel algorithms for this purpose. We propose a basic algorithm that extends the functionality of existing approaches by monitoring item frequencies in recent windows. We then present an improved version of the algorithm that is significantly more accurate at no extra memory cost. Finally, we perform an extensive experimental evaluation and show that the proposed algorithms can efficiently identify the frequent items in ad hoc recent windows of a data stream.
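
To make the idea of monitoring item frequencies in recent windows concrete, the following is a minimal Python sketch under stated assumptions. It is not the algorithm proposed in the paper, whose details are not given in this abstract. The sketch assumes the stream is cut into fixed-size blocks, keeps one Misra-Gries counter summary per block, and retains only the most recent blocks so that memory stays bounded. The class name RecentFrequentItems and the parameters block_size, counters_per_block, and max_blocks are hypothetical names chosen for illustration.

```python
from collections import deque


class RecentFrequentItems:
    """Illustrative only: block-wise Misra-Gries summaries of a stream.

    Assumption (not from the paper): the stream is split into blocks of
    `block_size` items, and only the last `max_blocks` closed blocks are kept.
    """

    def __init__(self, block_size, counters_per_block, max_blocks):
        self.block_size = block_size
        self.k = counters_per_block
        # A bounded deque discards the oldest block summary automatically.
        self.blocks = deque(maxlen=max_blocks)
        self.current = {}          # summary of the block being filled
        self.seen_in_current = 0   # items consumed by the current block

    def add(self, item):
        counts = self.current
        if item in counts:
            counts[item] += 1
        elif len(counts) < self.k:
            counts[item] = 1
        else:
            # Misra-Gries step: no free counter, so decrement every counter
            # and drop those that reach zero.
            for key in list(counts):
                counts[key] -= 1
                if counts[key] == 0:
                    del counts[key]
        self.seen_in_current += 1
        if self.seen_in_current == self.block_size:
            # Close the block and start a new, empty summary.
            self.blocks.append(counts)
            self.current = {}
            self.seen_in_current = 0

    def estimate(self, recent_blocks):
        """Sum the counters of the unfinished block and the last
        `recent_blocks` closed blocks. Counts never overestimate."""
        totals = dict(self.current)
        closed = list(self.blocks)[-recent_blocks:] if recent_blocks > 0 else []
        for summary in closed:
            for item, count in summary.items():
                totals[item] = totals.get(item, 0) + count
        return totals

    def frequent(self, recent_blocks, min_count):
        """Items whose estimated count in the chosen window is >= min_count."""
        totals = self.estimate(recent_blocks)
        return sorted(item for item, c in totals.items() if c >= min_count)


tracker = RecentFrequentItems(block_size=10, counters_per_block=3, max_blocks=2)
for value in ["old"] * 20 + ["new"] * 20:
    tracker.add(value)
print(tracker.frequent(recent_blocks=2, min_count=15))  # prints ['new']
```

In this sketch the window length is chosen at query time by passing recent_blocks, which loosely mirrors the notion of ad hoc recent windows. Each block summary can undercount an item by at most block_size / (counters_per_block + 1), so more counters per block trade memory for accuracy.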

This work was partially supported by the FP7 EU Large-scale Integrating Project OKKAM – Enabling a Web of Entities (contract no. ICT-215032). For more details, visit http://www.okkam.org

Author information

Authors and Affiliations

University of Trento
Ferry Irawan Tantono, Nishad Manerikar & Themis Palpanas

Editor information

Bertram Ludäscher, Nikos Mamoulis

About this paper

Cite this paper

Tantono, F.I., Manerikar, N., Palpanas, T. (2008). Efficiently Discovering Recent Frequent Items in Data Streams. In: Ludäscher, B., Mamoulis, N. (eds) Scientific and Statistical Database Management. SSDBM 2008. Lecture Notes in Computer Science, vol 5069. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69497-7_16

DOI: https://doi.org/10.1007/978-3-540-69497-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69476-2
Online ISBN: 978-3-540-69497-7
eBook Packages: Computer Science, Computer Science (R0)
