A Geometric Approach to Monitoring Threshold Functions over Distributed Data Streams

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6202)

Abstract

Monitoring data streams in a distributed system has been the focus of much research in recent years. Most of the proposed schemes, however, deal with monitoring simple aggregated values, such as the frequency of appearance of items in the streams. More involved tasks, such as feature selection (e.g., by monitoring the information gain of various features), still incur very high communication overhead when handled by naive, centralized algorithms.

We present a novel geometric approach by which an arbitrary global monitoring task can be split into a set of constraints applied locally on each of the streams. The constraints are used to locally filter out data increments that do not affect the monitoring outcome, thus avoiding unnecessary communication. As a result, our approach enables monitoring of arbitrary threshold functions over distributed data streams in an efficient manner.
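As a rough illustration of what a locally checked constraint might look like, the sketch below has each node test whether its own data drift could move a monitored function across the threshold, and report to the coordinator only if so. The abstract does not specify the construction; the ball spanned by the shared estimate and the node's drift vector, the choice of f(x) = ||x|| as the monitored function, and all names (`local_ball`, `must_report`) are assumptions made for this example only.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def local_ball(estimate, last_reported, current):
    """Ball whose diameter runs from the shared estimate to this node's drift vector.

    Assumption: drift = estimate + (current local vector - last reported local vector).
    """
    drift = [e + (c - l) for e, c, l in zip(estimate, current, last_reported)]
    center = [(e + d) / 2 for e, d in zip(estimate, drift)]
    radius = norm([d - e for e, d in zip(estimate, drift)]) / 2
    return center, radius

def must_report(estimate, last_reported, current, threshold):
    """True when f(x) = ||x|| takes values on both sides of the threshold inside the ball.

    For this particular f, the minimum and maximum over a ball are known in closed form.
    """
    center, radius = local_ball(estimate, last_reported, current)
    lowest = max(0.0, norm(center) - radius)
    highest = norm(center) + radius
    return lowest <= threshold < highest

# Hypothetical node state: a small local change stays silent, a large one triggers a report.
estimate = [1.0, 1.0]
last_reported = [1.0, 0.0]
small_change = must_report(estimate, last_reported, [1.2, 0.1], threshold=3.0)
large_change = must_report(estimate, last_reported, [4.0, 0.0], threshold=3.0)
```

In this toy setting the node stays silent for the small update and communicates only for the large one, which is the filtering effect the paragraph above describes. A general threshold function would need its own test for whether the ball lies entirely on one side of the threshold.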

We present experimental results on real-world data which demonstrate that our algorithms are highly scalable, and considerably reduce communication load in comparison to centralized algorithms.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sharfman, I., Schuster, A., Keren, D. (2010). A Geometric Approach to Monitoring Threshold Functions over Distributed Data Streams. In: May, M., Saitta, L. (eds) Ubiquitous Knowledge Discovery. Lecture Notes in Computer Science, vol 6202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16392-0_10

  • DOI: https://doi.org/10.1007/978-3-642-16392-0_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16391-3

  • Online ISBN: 978-3-642-16392-0

  • eBook Packages: Computer Science, Computer Science (R0)
