Skip to main content

Sliding Window Top-K Monitoring over Distributed Data Streams

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10366))

Abstract

The problem of distributed monitoring has been intensively investigated recently. This paper studies monitoring the top k data objects with the largest aggregate numeric values from distributed data streams within a fixed-size monitoring window W, while minimizing communication cost across the network. We propose a novel algorithm, which reallocates numeric values of data objects among distributed monitoring nodes by assigning revision factors when local constraints are violated, and keeps the local top-k result at distributed nodes in line with the global top-k result. Extensive experiments are conducted on top of Apache Storm to demonstrate the efficiency and scalability of our algorithm.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Twitter storm. http://storm.apache.org/

  2. Amagata, D., Hara, T., Nishio, S.: Sliding window top-k dominating query processing over distributed data streams. Distrib. Parallel Databases 34(4), 535–566 (2016). http://dx.doi.org/10.1007/s10619-015-7187-9

    Article  Google Scholar 

  3. Babcock, B., Olston, C.: Distributed top-k monitoring. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, 9–12 June 2003, pp. 28–39 (2003). http://doi.acm.org/10.1145/872757.872764

  4. Bruno, N., Gravano, L., Marian, A.: Evaluating top-k queries over web-accessible databases. In: Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, 26 February–1 March 2002, pp. 369–380 (2002). http://dx.doi.org/10.1109/ICDE.2002.994751

  5. Cao, P., Wang, Z.: Efficient top-k query calculation in distributed networks. In: Proceedings of the Twenty-Third Annual ACM Symposium on Principles of Distributed Computing, PODC 2004, St. John’s, Newfoundland, Canada, 25–28 July 2004, pp. 206–215 (2004). http://doi.acm.org/10.1145/1011767.1011798

  6. Cormode, G.: The continuous distributed monitoring model. SIGMOD Rec. 42(1), 5–14 (2013). http://doi.acm.org/10.1145/2481528.2481530

    Article  Google Scholar 

  7. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: Proceedings of the Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Santa Barbara, California, USA, 21–23 May 2001 (2001). http://doi.acm.org/10.1145/375551.375567

  8. Giatrakos, N., Deligiannakis, A., Garofalakis, M.N., Sharfman, I., Schuster, A.: Prediction-based geometric monitoring over distributed data streams. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2012, Scottsdale, AZ, USA, 20–24 May 2012, pp. 265–276 (2012). http://doi.acm.org/10.1145/2213836.2213867

  9. Gupta, R., Ramamritham, K., Mohania, M.K.: Ratio threshold queries over distributed data sources. In: Proceedings of the 26th International Conference on Data Engineering, ICDE 2010, Long Beach, California, USA, 1–6 March 2010, pp. 581–584 (2010). http://dx.doi.org/10.1109/ICDE.2010.5447920

  10. Kashyap, S.R., Ramamirtham, J., Rastogi, R., Shukla, P.: Efficient constraint monitoring using adaptive thresholds. In: Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, 7–12 April 2008, Cancún, México, pp. 526–535 (2008). http://dx.doi.org/10.1109/ICDE.2008.4497461

  11. Keren, D., Sharfman, I., Schuster, A., Livne, A.: Shape sensitive geometric monitoring. IEEE Trans. Knowl. Data Eng. 24(8), 1520–1535 (2012). http://dx.doi.org/10.1109/TKDE.2011.102

    Article  Google Scholar 

  12. Lazerson, A., Keren, D., Schuster, A.: Lightweight monitoring of distributed streams. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016, pp. 1685–1694 (2016). http://doi.acm.org/10.1145/2939672.2939820

  13. Lazerson, A., Sharfman, I., Keren, D., Schuster, A., Garofalakis, M.N., Samoladas, V.: Monitoring distributed streams using convex decompositions. PVLDB 8(5), 545–556 (2015). http://www.vldb.org/pvldb/vol8/p545-lazerson.pdf

    Google Scholar 

  14. Mouratidis, K., Bakiras, S., Papadias, D.: Continuous monitoring of top-k queries over sliding windows. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, 27–29 June 2006, pp. 635–646 (2006). http://doi.acm.org/10.1145/1142473.1142544

  15. Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. ACM Trans. Database Syst. 30(1), 41–82 (2005). http://doi.acm.org/10.1145/1061318.1061320

    Article  Google Scholar 

  16. Sharfman, I., Schuster, A., Keren, D.: A geometric approach to monitoring threshold functions over distributed data streams. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, 27–29 June 2006, pp. 301–312 (2006). http://doi.acm.org/10.1145/1142473.1142508

  17. Yang, D., Shastri, A., Rundensteiner, E.A., Ward, M.O.: An optimal strategy for monitoring top-k queries in streaming windows. In: Proceedings of 14th International Conference on Extending Database Technology, EDBT 2011, Uppsala, Sweden, 21–24 March 2011, pp. 57–68 (2011). http://doi.acm.org/10.1145/1951365.1951375

  18. Yi, K., Zhang, Q.: Optimal tracking of distributed heavy hitters and quantiles. In: Proceedings of the Twenty-Eigth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2009, Providence, Rhode Island, USA, 19 June–1 July 2009, pp. 167–174 (2009). http://doi.acm.org/10.1145/1559795.1559820

  19. Zipf, G.K.: Selected studies of the principle of relative frequency in language. Language 9(1), 89–92 (1932)

    Google Scholar 

Download references

Acknowledgment

This work was supported in part by the National Basic Research 973 Program of China under Grant No. 2015CB352502, the National Natural Science Foundation of China under Grant Nos. 61272092 and 61572289, the Natural Science Foundation of Shandong Province of China under Grant Nos. ZR2012FZ004 and ZR2015FM002, the Science and Technology Development Program of Shandong Province of China under Grant No. 2014GGE27178, and the NSERC Discovery Grants.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xiaohui Yu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lv, Z., Chen, B., Yu, X. (2017). Sliding Window Top-K Monitoring over Distributed Data Streams. In: Chen, L., Jensen, C., Shahabi, C., Yang, X., Lian, X. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science(), vol 10366. Springer, Cham. https://doi.org/10.1007/978-3-319-63579-8_40

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-63579-8_40

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63578-1

  • Online ISBN: 978-3-319-63579-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics