Load shedding for multi-way stream joins based on arrival order patterns

Journal of Intelligent Information Systems

Abstract

We address the problem of load shedding for continuous multi-way join queries over multiple data streams. When the arrival rates of tuples from data streams exceed the system capacity, a load shedding algorithm drops some subset of input tuples to avoid system overloads. To decide which tuples to drop among the input tuples, most existing load shedding algorithms determine the priority of each input tuple based on the frequency or some historical statistics of its join attribute value, and then drop tuples with the lowest priority. However, those value-based algorithms cannot determine the priorities of tuples properly in environments where join attribute values are unique and each join attribute value occurs at most once in each data stream. In this paper, we propose a load shedding algorithm specifically designed for such environments. The proposed load shedding algorithm determines the priority of each tuple based on the order of streams in which its join attribute value appears, rather than its join attribute value itself. Consequently, the priorities of tuples can be determined effectively in environments where join attribute values are unique and do not repeat. The experimental results show that the proposed algorithm outperforms the existing algorithms in such environments in terms of effectiveness and efficiency.
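The core idea above, ranking a tuple by the order of streams in which its join attribute value has appeared rather than by the value itself, can be illustrated with a small sketch. This is not the algorithm from the paper, whose details are not given in the abstract. It is a minimal illustration under stated assumptions: the class `OrderPatternShedder`, its methods, and the choice of "fraction of past values with this arrival prefix that went on to appear in every stream" as the priority are all hypothetical.

```python
from collections import defaultdict


class OrderPatternShedder:
    """Hypothetical sketch: prioritize join values by arrival-order prefix."""

    def __init__(self, num_streams):
        self.num_streams = num_streams
        # prefix -> number of historical values whose arrival order began with it
        self.seen = defaultdict(int)
        # prefix -> number of those values that eventually appeared in all streams
        self.completed = defaultdict(int)

    def train(self, history):
        """history: iterable of stream-id tuples, one per join value,
        listing the streams in the order the value arrived."""
        for order in history:
            is_complete = len(set(order)) == self.num_streams
            for i in range(1, len(order) + 1):
                prefix = tuple(order[:i])
                self.seen[prefix] += 1
                if is_complete:
                    self.completed[prefix] += 1

    def priority(self, prefix):
        """Estimated chance that a value with this arrival prefix completes.
        Prefixes never seen in the history get priority 0.0 (an assumption)."""
        prefix = tuple(prefix)
        n = self.seen.get(prefix, 0)
        return self.completed.get(prefix, 0) / n if n else 0.0

    def shed(self, pending, capacity):
        """pending: dict mapping join value -> arrival prefix so far.
        Returns the set of join values to keep; the rest are dropped."""
        ranked = sorted(pending, key=lambda k: self.priority(pending[k]),
                        reverse=True)
        return set(ranked[:capacity])


# Example with three streams (ids 0, 1, 2) and a toy history.
shedder = OrderPatternShedder(num_streams=3)
shedder.train([(0, 1, 2), (0, 1, 2), (0, 1), (1, 0), (1, 0, 2), (1,), (1,)])
kept = shedder.shed({"a": (0,), "b": (1,), "c": (1, 0)}, capacity=2)
```

In this toy history, values that arrive first on stream 0 complete more often than values that arrive first on stream 1, so the tuple with prefix `(1,)` is the one dropped. Because the priority depends only on the pattern of stream identifiers, it remains usable even when every join attribute value is unique.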

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2010-0018865).

Author information

Corresponding author

Correspondence to Ki Yong Lee.

About this article

Cite this article

Kwon, TH., Lee, K.Y. & Kim, M.H. Load shedding for multi-way stream joins based on arrival order patterns. J Intell Inf Syst 37, 245–265 (2011). https://doi.org/10.1007/s10844-010-0138-z

  • DOI: https://doi.org/10.1007/s10844-010-0138-z
