Time-slide window join over data streams

Journal of Intelligent Information Systems

Abstract

The join is an important operator in processing data streams. To produce outputs continuously over unbounded data streams, sliding windows are generally used to limit the scope of the join at any given time. Existing join algorithms consider only a simple type of window, which is updated whenever a new data item arrives on any input stream. A more common type of window, whose interval is updated periodically, i.e., slid by a predefined time interval, has not yet been addressed. In this paper, we consider such time-slide windows in joining multiple data streams. The algorithm for the time-slide window join can vary according to (i) how frequently the join is evaluated and (ii) which structure is used for windowing. We discuss the possible algorithms and provide experimental results that compare their performance.
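The difference between the two window types can be illustrated with a minimal sketch. It assumes integer timestamps, a window size W, and a slide interval T; the function names and the half-open interval convention are illustrative assumptions and are not taken from the paper, whose exact window semantics follow Li et al. (2005).

```python
def tuple_slide_window(now, W):
    """Window that moves with every arrival: it always ends at the
    current time (illustrative convention: interval (start, end])."""
    return (now - W, now)


def time_slide_window(now, W, T):
    """Window that moves only every T time units: its end is the most
    recent multiple of T (assumed alignment, for illustration only)."""
    end = (now // T) * T  # last slide boundary at or before `now`
    return (end - W, end)
```

Under these assumptions, with W = 10 and T = 5, the time-slide window stays at (5, 15) for every timestamp from 15 through 19 and jumps to (10, 20) at time 20, whereas the tuple-slide window shifts with each new timestamp.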

Notes

  1. Refer to Li et al. (2005) for the exact semantics of window queries.

  2. The term "basic window" has been more commonly used in the literature. Following this usage, we use basic windows to denote disjoint subwindows within a window.

  3. In this paper, time-based windows are considered for generality. Tuple-based windows can also be captured by assuming that a single tuple arrives every time unit.

  4. In general, the size of a basic window can be obtained as GCD(W, T), where GCD denotes the greatest common divisor. A change in the window size does not affect the join results as long as the slide interval T does not change, which will be discussed in the next section.

  5. Golab and Oszu (2003b) have discussed strategies for join evaluation and tuple expiration. However, their work considers only tuple-slide windows.

  6. In the proposed method, given W seconds, the join is executed W/T_e times, while windowing is performed W/T times.
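Note 4 can be made concrete with a small sketch. It assumes W (window size) and T (slide interval) are positive integers in the same time unit; the function name and the returned tuple are hypothetical and only illustrate the GCD relationship stated in the note.

```python
from math import gcd


def basic_window_layout(W, T):
    """Return (basic window size, basic windows per window,
    basic windows replaced at each slide) for integer W and T."""
    b = gcd(W, T)              # size of one basic window, as in Note 4
    return b, W // b, T // b   # both divisions are exact by definition of GCD
```

For example, W = 10 and T = 4 give basic windows of size 2, so a window spans five basic windows and each slide replaces two of them.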

References

  • Abadi, D.J. et al. (2003). Aurora: a new model and architecture for data stream management. VLDB Journal, 12(2), 120–139.

  • Ananthanarayanan, R. et al. (2013). Photon: fault-tolerant and scalable joining of continuous data streams. In: Proceedings of the 2013 ACM SIGMOD international conference on Management of data, SIGMOD’13. ACM (pp. 577–588).

  • Arasu, A., Babu, S., Widom, J. (2006). The CQL continuous query language: semantic foundations and query execution. VLDB Journal, 15(2), 121–142.

  • Babcock, B. et al. (2002). Models and issues in data stream systems. In: Proceedings of the 21st ACM Symposium on Principles of Database Systems, PODS’02. ACM (pp. 1–16).

  • Babu, S., Srivastava, U., Widom, J. (2004). Exploiting k-constraints to reduce memory overhead in continuous queries over data streams. ACM Transactions on Database Systems, 29(3), 545–580.

  • Chakraborty, A., & Singh, A. (2013). Parallelizing windowed stream joins in a shared-nothing cluster. In: Proceedings of Cluster 2013. IEEE (pp. 1–5).

  • Cranor, C., Johnson, T., Spatscheck, O., Shkapenyuk, V. (2003). Gigascope: a stream database for network applications. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of data, SIGMOD’03. ACM (pp. 647–651).

  • Das, A., Gehrke, J., Riedewald, M. (2003). Approximate join processing over data streams. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of data, SIGMOD’03. ACM (pp. 40–51).

  • Dash, R., & Fegaras, L. (2012). Delivering QoS in XML data stream processing using load shedding. International Journal of Database Management Systems (IJDMS), 4(3), 49–71.

  • Ding, L., Mehta, N., Rundensteiner, E.A., Heineman, G.T. (2004). Joining punctuated streams. In: Advances in Database Technology. EDBT.

  • Gedik, B., Wu, K.L., Yu, P.S., Liu, L. (2005). Adaptive load shedding for windowed stream joins. In: Proceedings of the 14th ACM international conference on Information and knowledge management, CIKM’05. ACM (pp. 171–178).

  • Gedik, B., Wu, K.L., Yu, P.S., Liu, L. (2007a). A load shedding framework and optimizations for m-way windowed stream joins. In: Proceedings of the IEEE 23rd International Conference on Data Engineering, ICDE’07. IEEE (pp. 536–545).

  • Gedik, B., Wu, K.L., Yu, P.S., Liu, L. (2007b). GrubJoin: an adaptive, multi-way, windowed stream join with time correlation-aware CPU load shedding. IEEE Transactions on Knowledge and Data Engineering, 19(10), 1363–1380.

  • Ghanem, T.M., Aref, W.G., Elmagarmid, A.K. (2006). Exploiting predicate-window semantics over data streams. Sigmod Record, 35(1), 3–8.

  • Golab, L., & Oszu, M.T. (2003a). Issues in data stream management. Sigmod Record, 32(2), 5–14.

  • Golab, L., & Oszu, M.T. (2003b). Processing sliding window multi-joins in continuous queries over data streams. In: Proceedings of the 29th VLDB conference. VLDB (pp. 500–511).

  • Gu, X., Yu, P.S., Wang, H. (2007). Adaptive load diffusion for multiway windowed stream joins. In: Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering, ICDE’07. IEEE (pp. 146–155).

  • Hammad, M.A., Aref, W.G., Elmagarmid, A.K. (2003a). Stream window join: tracking moving objects in sensor-network databases. In: Proceedings of the 15th Scientific and Statistical Database Management, SSDBM’03 (pp. 75–84).

  • Hammad, M.A., Franklin, M.J., Aref, W.G., Elmagarmid, A.K. (2003b). Scheduling for shared window joins over data streams. In: Proceedings of the 29th VLDB conference. VLDB (pp. 297–308).

  • Hong, M. et al. (2007). Massively multi-query join processing in publish/subscribe systems. In: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, SIGMOD’07. ACM (pp. 761–772).

  • Johnson, T., Muthukrishnan, S., Shkapenyuk, V., Spatscheck, O. (2005). A heartbeat mechanism and its application in Gigascope. In: Proceedings of the 31st VLDB conference. VLDB (pp. 1079–1088).

  • Kamel, I., Aghbari, Z.A., Awad, T. (2010). MG-join: detecting phenomena and their correlation in high dimensional data streams. Distributed and Parallel Databases, 28, 67–92.

  • Kang, J., Naughton, J.F., Viglas, S.D. (2003). Evaluating window joins over unbounded streams. In: Proceedings of the 19th International Conference on Data Engineering, ICDE’03. IEEE (pp. 341–352).

  • Karnagel, T., Schlegel, B., Habich, D., Lehner, W. (2013). Stream join processing on heterogeneous processors. In: BTW Workshops (pp. 17–26).

  • Kim, H.G., Kim, C., Kim, M.H. (2012). Adaptive disorder control in data stream processing. Computing and Informatics, 31, 1001–1018.

  • Kwon, T.H., Kim, H.G., Kim, M.H., Son, J.H. (2009). AMJoin: an advanced join algorithm for multiple data streams using a bit-vector hash table. IEICE Transactions on Information and Systems, E92-D(7), 1429–1434.

  • Kwon, T.H., Lee, K.Y., Kim, M.H. (2011). Load shedding for multi-way stream joins based on arrival order patterns. Journal of Intelligent Information Systems, 37, 245–265.

  • Law, Y.N., & Zaniolo, C. (2007). Load shedding for window joins on multiple data streams. In: Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering Workshop, ICDEW’07. IEEE (pp. 674–683).

  • Li, J., Maier, D., Tufte, K., Papadimos, V., Tucker, P.A. (2005). Semantics and evaluation techniques for window aggregates in data streams. In: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, SIGMOD’05. ACM (pp. 311–322).

  • Nagendra, M., & Candan, K.S. (2013a). Layered processing of skyline-window-join (SWJ) queries using iteration-fabric. In: Proceedings of the 29th International Conference on Data Engineering, ICDE’13. IEEE (pp. 985–996).

  • Nagendra, M., & Candan, K.S. (2013b). SkySuite - A framework of skyline-join operators for static and stream environments. In: Proceedings of the VLDB Endowment 6(12). VLDB (pp. 1266–1269).

  • Oge, Y., Miyoshi, T., Kawashima, H., Yoshinaga, T. (2011). An implementation of Handshake join on FPGA. In: Proceedings of the Second International Conference on Networking and Computing, ICNC’11. IEEE (pp. 95–104).

  • Ojewole, A., Zhu, Q., Hou, W.C. (2006). Window join approximation over data streams with importance semantics. In: Proceedings of the 15th ACM international conference on Information and knowledge management, CIKM’06. ACM (pp. 112–121).

  • Palma, W., Akbarinia, R., Pacitti, E., Valduriez, P. (2009). DHTJoin: processing continuous join queries using DHT networks. Distributed and Parallel Databases, 26(2–3), 291–317.

  • Srivastava, U., & Widom, J. (2004a). Flexible time management in data stream systems. In: Proceedings of the 23rd symposium on Principles of database systems, PODS’04. ACM (pp. 263–274).

  • Srivastava, U., & Widom, J. (2004b). Memory-limited execution of windowed stream joins. In: Proceedings of the 30th VLDB conference. VLDB (pp. 324–335).

  • Teubner, J., & Mueller, R. (2011). How soccer players would do stream joins. In: Proceedings of the 2011 ACM SIGMOD international conference on Management of data, SIGMOD’11. ACM (pp. 625–636).

  • Tucker, P.A., Maier, D., Sheard, T., Fegaras, L. (2003). Exploiting punctuation semantics in continuous data streams. IEEE Transactions on Knowledge and Data Engineering, 15(3), 555–568.

  • Urhan, T., & Franklin, M.J. (2000). XJoin: a reactively-scheduled pipelined join operator. IEEE Data Engineering Bulletin.

  • Valsomatzis, E., & Gounaris, A. (2013). Driver input selection for main-memory multi-way joins. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC’13. ACM (pp. 818–825).

  • Viglas, S.D., Naughton, J.F., Burger, J. (2003). Maximizing the output rate of multi-way join queries over streaming information sources. In: Proceedings of the 29th VLDB conference. VLDB (pp. 285–296).

  • Xie, K., Yang, J., Chen, Y. (2005). On joining and caching stochastic streams. In: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, SIGMOD’05. ACM (pp. 359–370).

  • Yang, X., Lim, H.B., Özsu, T.M., Tan, K.L. (2007). In-network execution of monitoring queries in sensor networks. In: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, SIGMOD’07. ACM (pp. 521–532).

  • Zhang, R., Qi, Z., Lin, D., Wang, W., Wong, R.C.W. (2012). A highly optimized algorithm for continuous intersection join queries over moving objects. VLDB Journal, 21, 561–586.

  • Zhu, X., Gupta, H., Tang, B. (2009). Join of multiple data streams in sensor networks. IEEE Transactions on Knowledge and Data Engineering, 21(12), 1722–1736.

Acknowledgements

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning) of Korea under the ITRC support program (NIPA-2013-H0301-13-4009), and the National Research Foundation of Korea grant funded by the Korea government (MEST) (No. 2012R1A2A2A01046694). This paper was also supported by the Sahmyook University Research Fund in 2013.

Author information

Corresponding author

Correspondence to Yoo Hyun Park.

Cite this article

Kim, H.G., Park, Y.H., Cho, Y.H. et al. Time-slide window join over data streams. J Intell Inf Syst 43, 323–347 (2014). https://doi.org/10.1007/s10844-014-0325-4
