Abstract
Distributed stream networks continuously track the global score of the data and alert whenever a given threshold is crossed. The global score is computed by applying a scoring function over the aggregated streams. However, the sheer volume and dynamic nature of the streams impose excessive communication overhead.
Most recent approaches eliminate the need for continuous communication, by using local constraints assigned at the individual streams. These constraints guarantee that as long as no constraint is violated, the threshold is not crossed, and therefore no communication is necessary. Regrettably, local constraint violations become more and more frequent as the network grows and, in the presence of such violations, communication is inevitable.
In this paper, we show that in most cases the violations can be resolved efficiently. Although our solution requires only a reduced subset of the network streams, finding the minimum resolving set is NP-hard. Through analysis of the probability for resolution, we suggest methods to select the resolving set so as to minimize the expected communication overhead and the expected latency of the process. Experimental results with both synthetic and real-life data sets demonstrate that our methods yield considerable improvements over existing approaches.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
The European air quality database. http://dataservice.eea.europa.eu/dataservice/
Artikis, A., et al.: Scalable proactive event-driven decision making. IEEE Technol. Soc. Magaz. 33, 35–41 (2014)
Babcock, B., Olston, C.: Distributed top-k monitoring. In: SIGMOD Conference, New York, NY, USA, pp. 28–39. ACM Press (2003)
Bar-Or, A., Keren, D., Schuster, A., Wolff, R.: Hierarchical decision tree induction in distributed genomic databases. IEEE Trans. Knowl. Data Eng. 17(8), 1138–1151 (2005)
Bar-Or, A., Schuster, A., Wolff, R., Keren, D.: Decision tree induction in high dimensional, hierarchically distributed databases. In: Proceedings of the 2005 SIAM International Conference on Data Mining, pp. 466–470 (2005)
Ben-David, D.: Scalable monitoring of distributed streams. Master Thesis, Technion (2013)
Ben-Yehuda, O.A., Ben-Yehuda, M., Schuster, A., Tsafrir, D.: The rise of RaaS: the resource as a service cloud. Commun. ACM 57, 76–84 (2014)
Ben-Yehuda, O.A., Posener, E., Ben-Yehuda, M., Schuster, A., Mualem, A.: Ginseng: market-driven memory allocation. ACM SIGPLAN Not. 49, 41–52 (2014)
Ben-Yehuda, O.A., Schuster, A., Sharov, A., Silberstein, M., Iosup, A.: Expert: pareto-efficient task replication on grids and a cloud. In: Parallel and Distributed Processing Symposium (IPDPS) (2012)
Birk, Y., Keidar, I., Liss, L., Schuster, A.: Efficient dynamic aggregation. In: Dolev, S. (ed.) DISC 2006. LNCS, vol. 4167, pp. 90–104. Springer, Heidelberg (2006). https://doi.org/10.1007/11864219_7
Birk, Y., Keidar, I., Liss, L., Schuster, A., Wolff, R.: Veracity radius: capturing the locality of distributed computations. In: Proceedings of the Twenty-Fifth Annual ACM Symposium on Principles of Distributed Computing, pp. 102–111 (2006)
Birk, Y., Liss, L., Schuster, A., Wolff, R.: A local algorithm for ad hoc majority voting via charge fusion. In: Guerraoui, R. (ed.) DISC 2004. LNCS, vol. 3274, pp. 275–289. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30186-8_20
Boley, M., Kamp, M., Keren, D., Schuster, A., Sharfman, I.: Communication-efficient distributed online prediction using dynamic model synchronizations. In: BD3@VLDB (2013)
Cormode, G.: Continuous distributed monitoring: a short survey. In: Proceedings of the First International Workshop on Algorithms and Models for Distributed Event Processing, AlMoDEP 2011, New York, NY, USA, pp. 1–10. ACM (2011)
Cormode, G., Garofalakis, M.N..: Sketching streams through the net: distributed approximate query tracking. In: VLDB Conference, pp. 13–24 (2005)
Cormode, G., Muthukrishnan, S., Yi, K.: Algorithms for distributed functional monitoring. ACM Trans. Algorithms 7(2), 21 (2011)
Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In: PODS Conference, pp. 77–86 (2010)
Edmonds, J.: Paths, trees, and flowers. Canad. J. Math. 17(3), 449–467 (1965)
Friedman, A., Schuster, A.: Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 493–502 (2010)
Friedman, A., Schuster, A., Wolff, R.: k-anonymous decision tree induction. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 151–162. Springer, Heidelberg (2006). https://doi.org/10.1007/11871637_18
Friedman, A., Sharfman, I., Keren, D., Schuster, A.: Privacy-preserving distributed stream monitoring. In: NDSS (2014)
Friedman, A., Wolff, R., Schuster, A.: Providing k-anonymity in data mining. VLDB J. 17(4), 789–804 (2008)
Funaro, L., Ben-Yehuda, O.A., Schuster, A.: Ginseng: market-driven LLC allocation. In: 2016 USENIX Annual Technical Conference (2016)
Gabel, M., Keren, D., Schuster, A.: Communication-efficient outlier detection for scale-out systems. In: BD3@VLDB, pp. 19–24 (2013)
Gabel, M., Keren, D., Schuster, A.: Monitoring least squares models of distributed streams. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015)
Gabel, M., Keren, D., Schuster, A.: Anarchists, unite: practical entropy approximation for distributed streams. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 837–846 (2017)
Gabel, M., Schuster, A., Keren, D.: Communication-efficient distributed variance monitoring and outlier detection for multivariate time series. In: Parallel and Distributed Processing Symposium (IPDPS), pp. 37–47 (2014)
Giatrakos, N., Deligiannakis, I.S.A., Garofalakis, M., Schuster, A.: Prediction-based geometric monitoring over distributed data streams. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 265–276 (2012)
Giatrakos, N., Deligiannakis, A., Garofalakis, M., Sharfman, I., Schuster, A.: Distributed geometric query monitoring using prediction models. ACM Trans. Database Syst. (TODS) 39(2), 16 (2014)
Gilburd, B., Schuster, A., Wolff, R.: k-TTP: a new privacy model for large-scale distributed environments. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 563–568 (2004)
Grumberg, O., Heyman, T., Ifergan, N., Schuster, A.: Achieving speedups in distributed symbolic reachability analysis through asynchronous computation. In: Borrione, D., Paul, W. (eds.) CHARME 2005. LNCS, vol. 3725, pp. 129–145. Springer, Heidelberg (2005). https://doi.org/10.1007/11560548_12
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)
Huang, L., Nguyen, X., Garofalakis, M.N., Jordan, M.I., Joseph, A.D., Taft, N.: In-network PCA and anomaly detection. In: NIPS, pp. 617–624 (2006)
Jain, N., Dahlin, M., Zhang, Y., Kit, D., Mahajan, P., Yalagandula, P.: Star: self-tuning aggregation for scalable monitoring. In: VLDB, pp. 962–973 (2007)
Kamp, M., Boley, M., Keren, D., Schuster, A., Sharfman, I.: Communication-efficient distributed online prediction by dynamic model synchronization. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8724, pp. 623–639. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44848-9_40
Karmon, K., Liss, L., Schuster, A.: GWiQ-P: an efficient decentralized grid-wide quota enforcement protocol. In: High Performance, Distributed Computing, pp. 222–232 (2005)
Keralapura, R., Cormode, G., Ramamirtham, J.: Communication-efficient distributed monitoring of thresholded counts. In: SIGMOD Conference, pp. 289–300 (2006)
Keren, D., Sagy, G., Abboud, A., Ben-David, D., Schuster, A., Sharfman, I., Deligiannakis, A.: Geometric monitoring of heterogeneous streams. IEEE Trans. Knowl. Data Eng. 26, 1890–1903 (2014)
Keren, D., Sagy, G., Abboud, A., Ben-David, D., Sharfman, I., Schuster, A.: Safe-zones for monitoring distributed streams. In: BD3@VLDB (2013)
Keren, D., Sharfman, I., Schuster, A., Livne, A.: Shape sensitive geometric monitoring. IEEE Trans. Knowl. Data Eng. 24, 1520–1535 (2012)
Kiciman, E., Livshits, V.B.: Ajaxscope: a platform for remotely monitoring the client-side behavior of web 2.0 applications. TWEB 4(4), 13 (2010)
Kolchinsky, I., Schuster, A., Keren, D.: Efficient detection of complex event patterns using lazy chain automata. arXiv preprint arXiv:1612.05110 (2016)
Kolchinsky, I., Sharfman, I., Schuster, A.: Lazy evaluation methods for detecting complex events. In: Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, pp. 34–45 (2015)
Krivitski, D., Schuster, A., Wolff, R.: A local facility location algorithm for sensor networks. In: Prasanna, V.K., Iyengar, S.S., Spirakis, P.G., Welsh, M. (eds.) DCOSS 2005. LNCS, vol. 3560, pp. 368–375. Springer, Heidelberg (2005). https://doi.org/10.1007/11502593_28
Lazerson, A., Gabel, M., Keren, D., Schuster, A.: One for all and all for one: simultaneous approximation of multiple functions over distributed streams. In: Proceedings of the 11th ACM International Conference on Distributed and Event-Based Systems, pp. 203–214 (2017)
Lazerson, A., Keren, D., Schuster, A.: Lightweight monitoring of distributed streams. In: KDD, pp. 1685–1694 (2016)
Lazerson, A., Sharfman, I., Keren, D., Schuster, A., Garofalakis, M., Samoladas, V.: Monitoring distributed streams using convex decompositions. In: Proceedings of the VLDB Endowment, vol. 8 (2015)
Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: Rcv1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5(4), 361–397 (2004)
Madden, S., Franklin, M.J.: Fjording the stream: an architecture for queries over streaming sensor data. In: ICDE, pp. 555–566 (2002)
Olston, C., Jiang, J., Widom, J.: Adaptive filters for continuous queries over distributed data streams. In: SIGMOD Conference, pp. 563–574 (2003)
Palatin, N., Leizarowitz, A., Schuster, A., Wolff, R.: Mining for misconfigured machines in grid systems. In: Data Mining Techniques in Grid Computing Environments (2008)
Rabbat, M., Nowak, R.D.: Distributed optimization in sensor networks. In: IPSN, pp. 20–27 (2004)
Rose, T., Stevenson, M., Whitehead, M.: The reuters corpus volume 1 - from yesterday’s news to tomorrow’s language resources. In: Proceedings of the Third International Conference on Language Resources and Evaluation, Las Palmas de Gran Canaria, May 2002
Sagy, G., Keren, D., Sharfman, I., Schuster, A.: Distributed threshold querying of general functions by a difference of monotonic representation. PVLDB 4(2), 46–57 (2010)
Sagy, G., Sharfman, I., Keren, D., Schuster, A.: Top-k vectorial aggregation queries in a distributed environment. J. Parallel Distrib. Comput. 71(2), 302–315 (2011)
Schuster, A., Keren, D., Lazerson, A.: Distributed processing using convex bounding functions. US Patent App. 15/208,721 (2016)
Schuster, A., Keren, D., Sagy, G., Sherfman, I.: Method and system of managing and/or monitoring distributed computing based on geometric constraints. Patent number 8949409, application number 12/817,242 (2015)
Schuster, A., Wolff, R., Gilburd, B.: Privacy-preserving association rule mining in large-scale distributed systems. In: CCGrid Cluster Computing and the Grid (2004)
Sharfman, I., Schuster, A., Keren, D.: Aggregate threshold queries in sensor networks. In: IPDPS, pp. 1–10 (2007)
Sharfman, I., Schuster, A., Keren, D.: A geometric approach to monitoring threshold functions over distributed data streams. ACM Trans. Database Syst. 32(4), 23 (2007)
Sharfman, I., Schuster, A., Keren, D.: Aggregate threshold queries in sensor networks. In: Parallel and Distributed Processing Symposium (IPDPS) (2007)
Sharfman, I., Schuster, A., Keren, D.: Shape sensitive geometric monitoring. In: Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 301–310 (2008)
Sharfman, I., Schuster, A., Keren, D.: A geometric approach to monitoring threshold functions over distributed data streams. In: Ubiquitous Knowledge Discovery, vol. 6202, pp. 163–186 (2010)
Ramamritham, K., Shah, S.: Handling non-linear polynomial queries over dynamic data. In: ICDE Conference (2008)
Tropp, J.A.: User-friendly tail bounds for sums of random matrices. ArXiv e-prints, April 2010
Verner, U., Schuster, A., Silberstein, M.: Processing data streams with hard real-time constraints on heterogeneous systems. In: Proceedings of the International Conference on Supercomputing, pp. 120–129 (2011)
Verner, U., Schuster, A., Silberstein, M., Mendelson, A.: Scheduling processing of real-time data streams on heterogeneous multi-GPU systems. In: Proceedings of the 5th Annual International Systems and Storage Conference (2012)
Wolff, R., Bhaduri, K., Kargupta, H.: A generic local algorithm for mining data streams in large distributed systems. IEEE Trans. Knowl. Data Eng. 21(4), 465–478 (2009)
Wolff, R., Schuster, A.: Association rule mining in peer-to-peer systems. In: ICDM Conference, pp. 363–374 (2003)
Yi, B.-K., Sidiropoulos, N., Johnson, T., Jagadish, H.V.., Faloutsos, C., Biliris, A.: Online data mining for co-evolving time sequences. In: ICDE, pp. 13–22 (2000)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ben-David, D., Sagy, G., Abboud, A., Keren, D., Schuster, A. (2018). Violation Resolution in Distributed Stream Networks. In: Ganapathi, G., Subramaniam, A., Graña, M., Balusamy, S., Natarajan, R., Ramanathan, P. (eds) Computational Intelligence, Cyber Security and Computational Models. Models and Techniques for Intelligent Systems and Automation. ICC3 2017. Communications in Computer and Information Science, vol 844. Springer, Singapore. https://doi.org/10.1007/978-981-13-0716-4_13
Download citation
DOI: https://doi.org/10.1007/978-981-13-0716-4_13
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0715-7
Online ISBN: 978-981-13-0716-4
eBook Packages: Computer ScienceComputer Science (R0)