Skip to main content

Finding Frequent Items in Data Streams Using ESBF

  • Conference paper
Emerging Technologies in Knowledge Discovery and Data Mining (PAKDD 2007)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4819))

Included in the following conference series:

Abstract

In this paper, we introduce a novel data structure, ESBF (Ex- tensible and Scalable Bloom Filter), and the algorithm FI-ESBF (Finding frequent Items using ESBF) for estimating the frequent items in data streams. FI-ESBF can work with high precision while using much less memory than those of the best reported algorithm does considering the large number of distinct items in the stream. ESBF is the extension of counting Bloom Filter(CBF), By using it, we are allowed to adjust the size of memory used dynamically according to the different data distribution and the number of distinct items in the data streams, therefore the priori knowledge about the data distribution of the streams and the number of distinct elements to be stored is not required.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bloom, B.: Space/time tradeoffs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)

    Article  MATH  Google Scholar 

  2. Fang, M., et al.: Computing iceberg queries efficiently. In VLDB (August 1998)

    Google Scholar 

  3. Charikar, M., Chen, K., Farach-Colton, M.: Finding Frequent Items in Data Streams. In: Widmayer, P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 693–703. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  4. Manku, G., Motwani, R.: Approximate Frequency Counts over Data Streams. In: Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357 (2002)

    Google Scholar 

  5. Xu Yu, J., Chong, Z., Lu, H., Zhou, A.: False Positive or False Negative:Mining Frequent Itemsets form High Speed Transactional Data Streams. In: Proceedings of the 30th International Conference on Very Large Data Bases, pp. 204–215 (2004)

    Google Scholar 

  6. Cormode, G., Muthukrishnan, S.: Whats Hot and Whats Not: Tracking Most Frequent Items Dynamically. In: Proceedings of the 22nd Symposium on Principles of Databse Systems, pp. 296–306 (June 2003)

    Google Scholar 

  7. Garofalakis, M., Gehrke, J., Rastogi, R.: Querying and mining data streams: you only get one look. In: the tutorial notes of the 28th Int’l Conference on Very Large Databases, Hong Kong, China (August 2002)

    Google Scholar 

  8. Demaine, E.D., Lopez-Ortiz, A., Munro, J.I.: Frequency Estimation of Internet Packet Streams with Limited Space. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 348–360. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  9. Estan, C., Varghese, G.: New Directions in Traffic Measurement and Accounting: Focusing on the Elephants, Ignoring the Mice. ACM Trans. Comput. Syst. 21(3), 270–313 (2003)

    Article  Google Scholar 

  10. Jin, C., Qian, W., Sha, C., Yu, J.X., Zhou, A.: Dynamically Maintaining Frequent Items over A Data Stream. In: Proceedings of the Twelfth International Conference on Information and Knowledge Management, pp. 287–294. ACM Press, New York (2003)

    Chapter  Google Scholar 

  11. Karp, R., Shenker, S., Papadimitriou, C.: A Simple Algorithm for Finding Frequent Elements in Streams and Bags. ACM Transactions on Database Systems 28(1), 51–55 (2003)

    Article  Google Scholar 

  12. Metwally, A., Agrawal, D., El Abbadi, A.: Efficient Computation of Frequent and Top-k Elements in Data Streams. Technical Report 2005-23, University of California, Santa Barbara (September 2005)

    Google Scholar 

  13. Fan, L., Cao, P., Almeida, J., Broder, A.Z.: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol. IEEE/ACM Transactons on networking 8(3) (June 2000)

    Google Scholar 

  14. Aguilar-Saborit, J., Trancoso, P., Muntes-Mulero, V., Larriba-Pey, J.L.: Dynamic Count Filters. SIGMOD Record 35(1) (March 2006)

    Google Scholar 

  15. Cohen, S., Matias, Y.: Spectral Bloom Filters. In: SIGMOD 2003, June 912 , San Diego, CA (2003)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Takashi Washio Zhi-Hua Zhou Joshua Zhexue Huang Xiaohua Hu Jinyan Li Chao Xie Jieyue He Deqing Zou Kuan-Ching Li Mário M. Freire

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, S., Hao, X., Xu, H., Hu, Y. (2007). Finding Frequent Items in Data Streams Using ESBF. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science(), vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_26

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-77018-3_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77016-9

  • Online ISBN: 978-3-540-77018-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics