SSCJ: A Semi-Stream Cache Join Using a Front-Stage Cache Module

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8057)

Abstract

Semi-stream processing has emerged as an active area of research in data stream management. A common operation in semi-stream processing is joining a stream with disk-based master data using a join operator. This operator typically runs under limited main memory, which is generally not large enough to hold the entire disk-based master data. Several semi-stream join algorithms have recently been proposed to achieve optimal performance, but there is still room for improvement. In this paper we propose a novel Semi-Stream Cache Join (SSCJ) that uses a front-stage cache module. The algorithm takes advantage of skewed distributions, and we present results for Zipfian distributions of the type that appear in many applications. We compare the performance of SSCJ with that of a well-known related join algorithm, HYBRIDJOIN (Hybrid Join). We also provide a cost model for our approach and validate it with experiments.
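
The general idea of a front-stage cache can be illustrated with a small sketch. This is not the SSCJ algorithm from the paper. It is a minimal illustration under stated assumptions. The disk-based master data is simulated by an in-memory dict, and the back-end join stage (HYBRIDJOIN in the paper's comparison) is replaced by a plain lookup that is only counted. The promotion policy is also an assumption: a key enters the cache after `promote_after` joins until `cache_size` entries are held, with no eviction. The names `FrontStageCacheJoin`, `zipf_stream`, `cache_size` and `promote_after` are all hypothetical.

```python
import random
from collections import Counter
from itertools import accumulate
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical stand-in for the disk-based master data: key -> attribute.
MasterData = Dict[int, str]


class FrontStageCacheJoin:
    """Sketch of a semi-stream join with a small front-stage cache.

    Assumed policy (illustrative only, not from the paper): a master-data
    tuple is copied into the cache once its key has been joined
    `promote_after` times, until the cache holds `cache_size` tuples.
    There is no eviction.
    """

    def __init__(self, master: MasterData, cache_size: int, promote_after: int = 2):
        self.master = master
        self.cache: Dict[int, str] = {}
        self.cache_size = cache_size
        self.promote_after = promote_after
        self.freq: Counter = Counter()
        self.cache_hits = 0
        self.backend_lookups = 0

    def _backend_join(self, key: int) -> Optional[str]:
        # Simulates the disk-based join stage with a dict lookup and counts
        # how often it is needed (a rough proxy for disk I/O in this sketch).
        self.backend_lookups += 1
        return self.master.get(key)

    def process(self, stream: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, str, str]]:
        for key, payload in stream:
            # Front stage: keys already cached are joined in memory.
            if key in self.cache:
                self.cache_hits += 1
                yield key, payload, self.cache[key]
                continue
            # Back stage: every other tuple goes to the (simulated) disk join.
            value = self._backend_join(key)
            if value is None:
                continue  # no matching master-data tuple
            yield key, payload, value
            # Track frequency and promote hot keys while there is room.
            self.freq[key] += 1
            if self.freq[key] >= self.promote_after and len(self.cache) < self.cache_size:
                self.cache[key] = value


def zipf_stream(n_keys: int, n_tuples: int, s: float = 1.0, seed: int = 0):
    """Yield (key, payload) pairs whose keys follow a Zipf law with exponent s."""
    rng = random.Random(seed)
    keys = list(range(1, n_keys + 1))
    cum = list(accumulate(1.0 / k ** s for k in keys))
    for i, key in enumerate(rng.choices(keys, cum_weights=cum, k=n_tuples)):
        yield key, f"t{i}"


# Usage: 10,000 master-data keys, a 50,000-tuple skewed stream, 100-entry cache.
master = {k: f"m{k}" for k in range(1, 10_001)}
join = FrontStageCacheJoin(master, cache_size=100)
out = list(join.process(zipf_stream(n_keys=10_000, n_tuples=50_000)))
print(len(out), join.cache_hits, join.backend_lookups)
```

Under a Zipfian key distribution, a small number of keys account for a large share of the stream. Even a small cache of frequent master-data tuples can therefore absorb many lookups that would otherwise need the disk-based stage. The sketch only demonstrates this effect. It does not reproduce SSCJ's actual cache management or its cost model.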

Copyright information

© 2013 Springer-Verlag GmbH Berlin Heidelberg

About this paper

Cite this paper

Naeem, M.A., Weber, G., Dobbie, G., Lutteroth, C. (2013). SSCJ: A Semi-Stream Cache Join Using a Front-Stage Cache Module. In: Bellatreche, L., Mohania, M.K. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2013. Lecture Notes in Computer Science, vol 8057. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40131-2_20

  • DOI: https://doi.org/10.1007/978-3-642-40131-2_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40130-5

  • Online ISBN: 978-3-642-40131-2

  • eBook Packages: Computer Science, Computer Science (R0)