FIDS: Monitoring Frequent Items over Distributed Data Streams

Fuller, Robert; Kantardzic, Mehmed

doi:10.1007/978-3-540-73499-4_35

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4571)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

Many applications require the discovery of items that occur frequently across multiple distributed data streams. Past solutions to this problem either require a high degree of error tolerance or can provide results only periodically. In this paper we introduce a new algorithm for continuously tracking frequent items over distributed data streams that provides either exact or approximate answers. We tested the efficiency of our method on two real-world data sets. The results indicate a significant reduction in communication cost compared with naïve approaches and with an existing efficient algorithm called Top-K Monitoring. Because our method does not rely on approximations to reduce communication overhead and is explicitly designed for tracking frequent items, it also produces higher-quality tracking results.
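
To make the problem setting concrete, the following is a minimal sketch of the kind of naïve baseline the abstract alludes to, in which every stream forwards every item to a central site. It is not the FIDS algorithm, whose details are not given in this abstract. The class and function names, the `support` parameter, and the definition of "frequent" as a count of at least `support` times the total number of items seen are all assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Set


class NaiveCoordinator:
    """Hypothetical central site that receives every item from every stream."""

    def __init__(self, support: float) -> None:
        if not 0.0 < support <= 1.0:
            raise ValueError("support must be in (0, 1]")
        # Assumed threshold: fraction of all items an item needs to be "frequent".
        self.support = support
        self.counts: Counter = Counter()
        self.total = 0
        # Communication cost: one message per forwarded item.
        self.messages = 0

    def receive(self, item: Hashable) -> None:
        """Record one item forwarded by a remote stream."""
        self.counts[item] += 1
        self.total += 1
        self.messages += 1

    def frequent_items(self) -> Set[Hashable]:
        """Return items whose global count is at least support * total."""
        threshold = self.support * self.total
        return {item for item, c in self.counts.items() if c >= threshold}


def run_naive(streams: List[Iterable[Hashable]], support: float) -> NaiveCoordinator:
    """Forward every item of every stream to a single coordinator.

    Streams are consumed one after another here for simplicity; the final
    counts do not depend on arrival order.
    """
    coord = NaiveCoordinator(support)
    for stream in streams:
        for item in stream:
            coord.receive(item)
    return coord


if __name__ == "__main__":
    streams = [["a", "b", "a"], ["a", "c"], ["b", "a"]]
    coord = run_naive(streams, support=0.25)
    print(sorted(coord.frequent_items()), coord.messages)
```

In this sketch the answer is always exact and always current, but the number of messages equals the number of items observed. That communication cost is the quantity the abstract reports reducing.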

Author information

Authors and Affiliations

Computer Engineering and Computer Science Department, University of Louisville, Louisville, KY 40292
Robert Fuller & Mehmed Kantardzic

Editor information

Petra Perner

About this paper

Cite this paper

Fuller, R., Kantardzic, M. (2007). FIDS: Monitoring Frequent Items over Distributed Data Streams. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_35

DOI: https://doi.org/10.1007/978-3-540-73499-4_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science, Computer Science (R0)
