Finding Frequent Items in Data Streams Using ESBF

Wang, ShuYun; Hao, XiuLan; Xu, HeXiang; Hu, YunFa

doi:10.1007/978-3-540-77018-3_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4819)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

In this paper, we introduce a novel data structure, ESBF (Extensible and Scalable Bloom Filter), and the algorithm FI-ESBF (Finding frequent Items using ESBF) for estimating the frequent items in data streams. FI-ESBF achieves high precision while using much less memory than the best reported algorithm, given the large number of distinct items in the stream. ESBF is an extension of the counting Bloom filter (CBF). It lets us adjust the amount of memory used dynamically, according to the data distribution and the number of distinct items in the data stream, so prior knowledge of the data distribution and of the number of distinct elements to be stored is not required.
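
For readers unfamiliar with the counting Bloom filter (CBF) that ESBF extends, the following is a minimal Python sketch of how a plain, fixed-size CBF can flag frequent items in one pass. It is not the authors' ESBF or FI-ESBF, whose details are not given in the abstract. The class name, the function `frequent_items`, the parameters `num_counters`, `num_hashes`, and `min_count`, and the use of salted SHA-256 hashing are all assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Plain fixed-size counting Bloom filter (illustrative; NOT the paper's ESBF)."""

    def __init__(self, num_counters, num_hashes):
        self.m = num_counters          # number of counters (fixed in this sketch)
        self.k = num_hashes            # number of hash functions
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Assumption: derive k positions by salting SHA-256 with the hash index.
        positions = set()
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.add(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item):
        # Increment each distinct counter position once per occurrence.
        for pos in self._positions(item):
            self.counters[pos] += 1

    def estimate(self, item):
        # The minimum counter never undercounts; hash collisions can only inflate it.
        return min(self.counters[pos] for pos in self._positions(item))


def frequent_items(stream, min_count, num_counters=1024, num_hashes=4):
    """Return items whose estimated count reaches min_count (single pass)."""
    cbf = CountingBloomFilter(num_counters, num_hashes)
    candidates = set()
    for item in stream:
        cbf.add(item)
        if cbf.estimate(item) >= min_count:
            candidates.add(item)
    return candidates


if __name__ == "__main__":
    stream = ["a"] * 5 + ["b"] * 2 + ["c"]
    print(frequent_items(stream, min_count=3))
```

Because the estimate can only overcount, every truly frequent item is reported, while collisions may add false positives. The counter array here has a fixed size chosen up front, which is the kind of advance sizing the abstract says ESBF avoids by adjusting memory dynamically.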

Author information

Authors and Affiliations

Department of Computing and Information Technology, Fudan University, P.R.C.
ShuYun Wang, XiuLan Hao, HeXiang Xu & YunFa Hu

Editor information

Takashi Washio, Zhi-Hua Zhou, Joshua Zhexue Huang, Xiaohua Hu, Jinyan Li, Chao Xie, Jieyue He, Deqing Zou, Kuan-Ching Li, Mário M. Freire

About this paper

Cite this paper

Wang, S., Hao, X., Xu, H., Hu, Y. (2007). Finding Frequent Items in Data Streams Using ESBF. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_26

DOI: https://doi.org/10.1007/978-3-540-77018-3_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77016-9
Online ISBN: 978-3-540-77018-3
eBook Packages: Computer Science, Computer Science (R0)
