Efficiently Discovering Recent Frequent Items in Data Streams

  • Conference paper
Scientific and Statistical Database Management (SSDBM 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5069)

Abstract

The problem of frequent item discovery in streaming data has attracted considerable attention in recent years. Although this problem has been studied extensively and several techniques have been proposed for its solution, these approaches treat all values of the data stream equally. However, not all values are of equal importance: in many situations, we are more interested in the values that have appeared recently in the stream than in older ones.
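For background, the following Python sketch shows a standard bounded-memory frequent-items technique, the Misra-Gries summary (also known as the Frequent algorithm). It is not the algorithm proposed in this paper; it only illustrates the kind of existing approach described above, in which every stream element contributes equally to the counts no matter how long ago it arrived. The function name `misra_gries` and the parameter `k` are illustrative choices.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Misra-Gries summary using at most k - 1 counters.

    Each returned count underestimates the true frequency by at most n / k,
    where n is the stream length, so every item that occurs more than n / k
    times is guaranteed to appear in the result. Old and new elements are
    weighted identically.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available for the new item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For example, on the stream `abacabadaa` with `k = 3`, the summary keeps only `a`, with an estimated count of 4 against a true count of 6, which is within the n / k bound of about 3.3.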

In this paper, we address the problem of finding recent frequent items in a data stream using a small, bounded amount of memory, and present novel algorithms for this task. We propose a basic algorithm that extends the functionality of existing approaches by monitoring item frequencies in recent windows. We then present an improved version of this algorithm that achieves significantly higher accuracy at no extra memory cost. Finally, we perform an extensive experimental evaluation and show that the proposed algorithms can efficiently identify the frequent items in ad hoc recent windows of a data stream.
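The abstract does not spell out the proposed algorithms, so the following is only a hypothetical sketch of one way to answer frequent-item queries over ad hoc recent windows in bounded memory; it should not be read as the authors' method. It assumes the stream is cut into fixed-size blocks, keeps one Misra-Gries summary per block for the last `max_blocks` blocks, and answers a query over the `num_blocks` most recent blocks by merging their summaries with the mergeable-summary rule (add the counters, subtract the k-th largest count, keep the positive remainders). The class `RecentFrequentItems`, the helper `merge_summaries`, and all parameters are illustrative names.

```python
from collections import deque
from typing import Deque, Dict, Hashable


def merge_summaries(a: Dict[Hashable, int], b: Dict[Hashable, int], k: int) -> Dict[Hashable, int]:
    """Merge two Misra-Gries summaries so the result keeps at most k - 1 counters."""
    combined: Dict[Hashable, int] = dict(a)
    for item, count in b.items():
        combined[item] = combined.get(item, 0) + count
    if len(combined) <= k - 1:
        return combined
    # Subtract the k-th largest count; at most k - 1 counters stay positive.
    kth = sorted(combined.values(), reverse=True)[k - 1]
    return {item: c - kth for item, c in combined.items() if c > kth}


class RecentFrequentItems:
    """Hypothetical block-based structure: one Misra-Gries summary per block of
    `block_size` items, retaining only the `max_blocks` most recent blocks."""

    def __init__(self, k: int, block_size: int, max_blocks: int) -> None:
        if k < 2 or block_size < 1 or max_blocks < 1:
            raise ValueError("k >= 2, block_size >= 1 and max_blocks >= 1 are required")
        self.k = k
        self.block_size = block_size
        # deque(maxlen=...) silently evicts the oldest block summary.
        self.blocks: Deque[Dict[Hashable, int]] = deque(maxlen=max_blocks)
        self.current: Dict[Hashable, int] = {}
        self.current_len = 0

    def update(self, item: Hashable) -> None:
        # Standard Misra-Gries update on the block that is being filled.
        c = self.current
        if item in c:
            c[item] += 1
        elif len(c) < self.k - 1:
            c[item] = 1
        else:
            for key in list(c):
                c[key] -= 1
                if c[key] == 0:
                    del c[key]
        self.current_len += 1
        if self.current_len == self.block_size:
            # Seal the block; a fresh dict is started so the stored one is not mutated.
            self.blocks.append(self.current)
            self.current = {}
            self.current_len = 0

    def query(self, num_blocks: int, threshold: int) -> Dict[Hashable, int]:
        """Items whose estimated (lower-bound) count over the partial current block
        plus the `num_blocks` most recent completed blocks is at least `threshold`."""
        merged: Dict[Hashable, int] = dict(self.current)
        recent = list(self.blocks)[-num_blocks:] if num_blocks > 0 else []
        for summary in recent:
            merged = merge_summaries(merged, summary, self.k)
        return {item: c for item, c in merged.items() if c >= threshold}
```

Because each summary only underestimates counts, an item close to the threshold can be missed; lowering the threshold by the accumulated error bound trades extra false positives for no false negatives. Window boundaries are also restricted to block boundaries (plus the partially filled current block), which is the price of keeping memory at roughly `max_blocks * (k - 1)` counters.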

This work was partially supported by the FP7 EU Large-scale Integrating Project OKKAM – Enabling a Web of Entities (contract no. ICT-215032). For more details, visit http://www.okkam.org

Author information

F.I. Tantono, N. Manerikar, T. Palpanas

Editor information

Bertram Ludäscher, Nikos Mamoulis

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tantono, F.I., Manerikar, N., Palpanas, T. (2008). Efficiently Discovering Recent Frequent Items in Data Streams. In: Ludäscher, B., Mamoulis, N. (eds) Scientific and Statistical Database Management. SSDBM 2008. Lecture Notes in Computer Science, vol 5069. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69497-7_16

  • DOI: https://doi.org/10.1007/978-3-540-69497-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69476-2

  • Online ISBN: 978-3-540-69497-7

  • eBook Packages: Computer Science, Computer Science (R0)
