Skip to main content

Abstract

Distributed stream networks continuously track the global score of the data and alert whenever a given threshold is crossed. The global score is computed by applying a scoring function over the aggregated streams. However, the sheer volume and dynamic nature of the streams impose excessive communication overhead.

Most recent approaches eliminate the need for continuous communication, by using local constraints assigned at the individual streams. These constraints guarantee that as long as no constraint is violated, the threshold is not crossed, and therefore no communication is necessary. Regrettably, local constraint violations become more and more frequent as the network grows and, in the presence of such violations, communication is inevitable.

In this paper, we show that in most cases the violations can be resolved efficiently. Although our solution requires only a reduced subset of the network streams, finding the minimum resolving set is NP-hard. Through analysis of the probability for resolution, we suggest methods to select the resolving set so as to minimize the expected communication overhead and the expected latency of the process. Experimental results with both synthetic and real-life data sets demonstrate that our methods yield considerable improvements over existing approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. The European air quality database. http://dataservice.eea.europa.eu/dataservice/

  2. Artikis, A., et al.: Scalable proactive event-driven decision making. IEEE Technol. Soc. Magaz. 33, 35–41 (2014)

    Google Scholar 

  3. Babcock, B., Olston, C.: Distributed top-k monitoring. In: SIGMOD Conference, New York, NY, USA, pp. 28–39. ACM Press (2003)

    Google Scholar 

  4. Bar-Or, A., Keren, D., Schuster, A., Wolff, R.: Hierarchical decision tree induction in distributed genomic databases. IEEE Trans. Knowl. Data Eng. 17(8), 1138–1151 (2005)

    Google Scholar 

  5. Bar-Or, A., Schuster, A., Wolff, R., Keren, D.: Decision tree induction in high dimensional, hierarchically distributed databases. In: Proceedings of the 2005 SIAM International Conference on Data Mining, pp. 466–470 (2005)

    Google Scholar 

  6. Ben-David, D.: Scalable monitoring of distributed streams. Master Thesis, Technion (2013)

    Google Scholar 

  7. Ben-Yehuda, O.A., Ben-Yehuda, M., Schuster, A., Tsafrir, D.: The rise of RaaS: the resource as a service cloud. Commun. ACM 57, 76–84 (2014)

    Google Scholar 

  8. Ben-Yehuda, O.A., Posener, E., Ben-Yehuda, M., Schuster, A., Mualem, A.: Ginseng: market-driven memory allocation. ACM SIGPLAN Not. 49, 41–52 (2014)

    Google Scholar 

  9. Ben-Yehuda, O.A., Schuster, A., Sharov, A., Silberstein, M., Iosup, A.: Expert: pareto-efficient task replication on grids and a cloud. In: Parallel and Distributed Processing Symposium (IPDPS) (2012)

    Google Scholar 

  10. Birk, Y., Keidar, I., Liss, L., Schuster, A.: Efficient dynamic aggregation. In: Dolev, S. (ed.) DISC 2006. LNCS, vol. 4167, pp. 90–104. Springer, Heidelberg (2006). https://doi.org/10.1007/11864219_7

    Chapter  MATH  Google Scholar 

  11. Birk, Y., Keidar, I., Liss, L., Schuster, A., Wolff, R.: Veracity radius: capturing the locality of distributed computations. In: Proceedings of the Twenty-Fifth Annual ACM Symposium on Principles of Distributed Computing, pp. 102–111 (2006)

    Google Scholar 

  12. Birk, Y., Liss, L., Schuster, A., Wolff, R.: A local algorithm for ad hoc majority voting via charge fusion. In: Guerraoui, R. (ed.) DISC 2004. LNCS, vol. 3274, pp. 275–289. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30186-8_20

    Chapter  Google Scholar 

  13. Boley, M., Kamp, M., Keren, D., Schuster, A., Sharfman, I.: Communication-efficient distributed online prediction using dynamic model synchronizations. In: BD3@VLDB (2013)

    Google Scholar 

  14. Cormode, G.: Continuous distributed monitoring: a short survey. In: Proceedings of the First International Workshop on Algorithms and Models for Distributed Event Processing, AlMoDEP 2011, New York, NY, USA, pp. 1–10. ACM (2011)

    Google Scholar 

  15. Cormode, G., Garofalakis, M.N..: Sketching streams through the net: distributed approximate query tracking. In: VLDB Conference, pp. 13–24 (2005)

    Google Scholar 

  16. Cormode, G., Muthukrishnan, S., Yi, K.: Algorithms for distributed functional monitoring. ACM Trans. Algorithms 7(2), 21 (2011)

    Google Scholar 

  17. Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In: PODS Conference, pp. 77–86 (2010)

    Google Scholar 

  18. Edmonds, J.: Paths, trees, and flowers. Canad. J. Math. 17(3), 449–467 (1965)

    Article  MathSciNet  Google Scholar 

  19. Friedman, A., Schuster, A.: Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 493–502 (2010)

    Google Scholar 

  20. Friedman, A., Schuster, A., Wolff, R.: k-anonymous decision tree induction. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 151–162. Springer, Heidelberg (2006). https://doi.org/10.1007/11871637_18

    Chapter  Google Scholar 

  21. Friedman, A., Sharfman, I., Keren, D., Schuster, A.: Privacy-preserving distributed stream monitoring. In: NDSS (2014)

    Google Scholar 

  22. Friedman, A., Wolff, R., Schuster, A.: Providing k-anonymity in data mining. VLDB J. 17(4), 789–804 (2008)

    Article  Google Scholar 

  23. Funaro, L., Ben-Yehuda, O.A., Schuster, A.: Ginseng: market-driven LLC allocation. In: 2016 USENIX Annual Technical Conference (2016)

    Google Scholar 

  24. Gabel, M., Keren, D., Schuster, A.: Communication-efficient outlier detection for scale-out systems. In: BD3@VLDB, pp. 19–24 (2013)

    Google Scholar 

  25. Gabel, M., Keren, D., Schuster, A.: Monitoring least squares models of distributed streams. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015)

    Google Scholar 

  26. Gabel, M., Keren, D., Schuster, A.: Anarchists, unite: practical entropy approximation for distributed streams. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 837–846 (2017)

    Google Scholar 

  27. Gabel, M., Schuster, A., Keren, D.: Communication-efficient distributed variance monitoring and outlier detection for multivariate time series. In: Parallel and Distributed Processing Symposium (IPDPS), pp. 37–47 (2014)

    Google Scholar 

  28. Giatrakos, N., Deligiannakis, I.S.A., Garofalakis, M., Schuster, A.: Prediction-based geometric monitoring over distributed data streams. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 265–276 (2012)

    Google Scholar 

  29. Giatrakos, N., Deligiannakis, A., Garofalakis, M., Sharfman, I., Schuster, A.: Distributed geometric query monitoring using prediction models. ACM Trans. Database Syst. (TODS) 39(2), 16 (2014)

    Article  MathSciNet  Google Scholar 

  30. Gilburd, B., Schuster, A., Wolff, R.: k-TTP: a new privacy model for large-scale distributed environments. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 563–568 (2004)

    Google Scholar 

  31. Grumberg, O., Heyman, T., Ifergan, N., Schuster, A.: Achieving speedups in distributed symbolic reachability analysis through asynchronous computation. In: Borrione, D., Paul, W. (eds.) CHARME 2005. LNCS, vol. 3725, pp. 129–145. Springer, Heidelberg (2005). https://doi.org/10.1007/11560548_12

    Chapter  Google Scholar 

  32. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)

    Google Scholar 

  33. Huang, L., Nguyen, X., Garofalakis, M.N., Jordan, M.I., Joseph, A.D., Taft, N.: In-network PCA and anomaly detection. In: NIPS, pp. 617–624 (2006)

    Google Scholar 

  34. Jain, N., Dahlin, M., Zhang, Y., Kit, D., Mahajan, P., Yalagandula, P.: Star: self-tuning aggregation for scalable monitoring. In: VLDB, pp. 962–973 (2007)

    Google Scholar 

  35. Kamp, M., Boley, M., Keren, D., Schuster, A., Sharfman, I.: Communication-efficient distributed online prediction by dynamic model synchronization. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8724, pp. 623–639. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44848-9_40

    Chapter  Google Scholar 

  36. Karmon, K., Liss, L., Schuster, A.: GWiQ-P: an efficient decentralized grid-wide quota enforcement protocol. In: High Performance, Distributed Computing, pp. 222–232 (2005)

    Google Scholar 

  37. Keralapura, R., Cormode, G., Ramamirtham, J.: Communication-efficient distributed monitoring of thresholded counts. In: SIGMOD Conference, pp. 289–300 (2006)

    Google Scholar 

  38. Keren, D., Sagy, G., Abboud, A., Ben-David, D., Schuster, A., Sharfman, I., Deligiannakis, A.: Geometric monitoring of heterogeneous streams. IEEE Trans. Knowl. Data Eng. 26, 1890–1903 (2014)

    Google Scholar 

  39. Keren, D., Sagy, G., Abboud, A., Ben-David, D., Sharfman, I., Schuster, A.: Safe-zones for monitoring distributed streams. In: BD3@VLDB (2013)

    Google Scholar 

  40. Keren, D., Sharfman, I., Schuster, A., Livne, A.: Shape sensitive geometric monitoring. IEEE Trans. Knowl. Data Eng. 24, 1520–1535 (2012)

    Article  Google Scholar 

  41. Kiciman, E., Livshits, V.B.: Ajaxscope: a platform for remotely monitoring the client-side behavior of web 2.0 applications. TWEB 4(4), 13 (2010)

    Google Scholar 

  42. Kolchinsky, I., Schuster, A., Keren, D.: Efficient detection of complex event patterns using lazy chain automata. arXiv preprint arXiv:1612.05110 (2016)

  43. Kolchinsky, I., Sharfman, I., Schuster, A.: Lazy evaluation methods for detecting complex events. In: Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, pp. 34–45 (2015)

    Google Scholar 

  44. Krivitski, D., Schuster, A., Wolff, R.: A local facility location algorithm for sensor networks. In: Prasanna, V.K., Iyengar, S.S., Spirakis, P.G., Welsh, M. (eds.) DCOSS 2005. LNCS, vol. 3560, pp. 368–375. Springer, Heidelberg (2005). https://doi.org/10.1007/11502593_28

    Chapter  Google Scholar 

  45. Lazerson, A., Gabel, M., Keren, D., Schuster, A.: One for all and all for one: simultaneous approximation of multiple functions over distributed streams. In: Proceedings of the 11th ACM International Conference on Distributed and Event-Based Systems, pp. 203–214 (2017)

    Google Scholar 

  46. Lazerson, A., Keren, D., Schuster, A.: Lightweight monitoring of distributed streams. In: KDD, pp. 1685–1694 (2016)

    Google Scholar 

  47. Lazerson, A., Sharfman, I., Keren, D., Schuster, A., Garofalakis, M., Samoladas, V.: Monitoring distributed streams using convex decompositions. In: Proceedings of the VLDB Endowment, vol. 8 (2015)

    Google Scholar 

  48. Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: Rcv1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5(4), 361–397 (2004)

    Google Scholar 

  49. Madden, S., Franklin, M.J.: Fjording the stream: an architecture for queries over streaming sensor data. In: ICDE, pp. 555–566 (2002)

    Google Scholar 

  50. Olston, C., Jiang, J., Widom, J.: Adaptive filters for continuous queries over distributed data streams. In: SIGMOD Conference, pp. 563–574 (2003)

    Google Scholar 

  51. Palatin, N., Leizarowitz, A., Schuster, A., Wolff, R.: Mining for misconfigured machines in grid systems. In: Data Mining Techniques in Grid Computing Environments (2008)

    Google Scholar 

  52. Rabbat, M., Nowak, R.D.: Distributed optimization in sensor networks. In: IPSN, pp. 20–27 (2004)

    Google Scholar 

  53. Rose, T., Stevenson, M., Whitehead, M.: The reuters corpus volume 1 - from yesterday’s news to tomorrow’s language resources. In: Proceedings of the Third International Conference on Language Resources and Evaluation, Las Palmas de Gran Canaria, May 2002

    Google Scholar 

  54. Sagy, G., Keren, D., Sharfman, I., Schuster, A.: Distributed threshold querying of general functions by a difference of monotonic representation. PVLDB 4(2), 46–57 (2010)

    Google Scholar 

  55. Sagy, G., Sharfman, I., Keren, D., Schuster, A.: Top-k vectorial aggregation queries in a distributed environment. J. Parallel Distrib. Comput. 71(2), 302–315 (2011)

    Article  Google Scholar 

  56. Schuster, A., Keren, D., Lazerson, A.: Distributed processing using convex bounding functions. US Patent App. 15/208,721 (2016)

    Google Scholar 

  57. Schuster, A., Keren, D., Sagy, G., Sherfman, I.: Method and system of managing and/or monitoring distributed computing based on geometric constraints. Patent number 8949409, application number 12/817,242 (2015)

    Google Scholar 

  58. Schuster, A., Wolff, R., Gilburd, B.: Privacy-preserving association rule mining in large-scale distributed systems. In: CCGrid Cluster Computing and the Grid (2004)

    Google Scholar 

  59. Sharfman, I., Schuster, A., Keren, D.: Aggregate threshold queries in sensor networks. In: IPDPS, pp. 1–10 (2007)

    Google Scholar 

  60. Sharfman, I., Schuster, A., Keren, D.: A geometric approach to monitoring threshold functions over distributed data streams. ACM Trans. Database Syst. 32(4), 23 (2007)

    Google Scholar 

  61. Sharfman, I., Schuster, A., Keren, D.: Aggregate threshold queries in sensor networks. In: Parallel and Distributed Processing Symposium (IPDPS) (2007)

    Google Scholar 

  62. Sharfman, I., Schuster, A., Keren, D.: Shape sensitive geometric monitoring. In: Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 301–310 (2008)

    Google Scholar 

  63. Sharfman, I., Schuster, A., Keren, D.: A geometric approach to monitoring threshold functions over distributed data streams. In: Ubiquitous Knowledge Discovery, vol. 6202, pp. 163–186 (2010)

    Google Scholar 

  64. Ramamritham, K., Shah, S.: Handling non-linear polynomial queries over dynamic data. In: ICDE Conference (2008)

    Google Scholar 

  65. Tropp, J.A.: User-friendly tail bounds for sums of random matrices. ArXiv e-prints, April 2010

    Google Scholar 

  66. Verner, U., Schuster, A., Silberstein, M.: Processing data streams with hard real-time constraints on heterogeneous systems. In: Proceedings of the International Conference on Supercomputing, pp. 120–129 (2011)

    Google Scholar 

  67. Verner, U., Schuster, A., Silberstein, M., Mendelson, A.: Scheduling processing of real-time data streams on heterogeneous multi-GPU systems. In: Proceedings of the 5th Annual International Systems and Storage Conference (2012)

    Google Scholar 

  68. Wolff, R., Bhaduri, K., Kargupta, H.: A generic local algorithm for mining data streams in large distributed systems. IEEE Trans. Knowl. Data Eng. 21(4), 465–478 (2009)

    Article  Google Scholar 

  69. Wolff, R., Schuster, A.: Association rule mining in peer-to-peer systems. In: ICDM Conference, pp. 363–374 (2003)

    Google Scholar 

  70. Yi, B.-K., Sidiropoulos, N., Johnson, T., Jagadish, H.V.., Faloutsos, C., Biliris, A.: Online data mining for co-evolving time sequences. In: ICDE, pp. 13–22 (2000)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Assaf Schuster .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Ben-David, D., Sagy, G., Abboud, A., Keren, D., Schuster, A. (2018). Violation Resolution in Distributed Stream Networks. In: Ganapathi, G., Subramaniam, A., Graña, M., Balusamy, S., Natarajan, R., Ramanathan, P. (eds) Computational Intelligence, Cyber Security and Computational Models. Models and Techniques for Intelligent Systems and Automation. ICC3 2017. Communications in Computer and Information Science, vol 844. Springer, Singapore. https://doi.org/10.1007/978-981-13-0716-4_13

Download citation

  • DOI: https://doi.org/10.1007/978-981-13-0716-4_13

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-0715-7

  • Online ISBN: 978-981-13-0716-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics