A Multi-way Semi-stream Join for a Near-Real-Time Data Warehouse

  • Conference paper
Databases Theory and Applications (ADC 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10538)

Abstract

Semi-stream processing, the operation of joining a stream of data with non-stream, disk-based master data, is a crucial component of near-real-time data warehousing. Semi-stream joins must process data quickly and accurately, and must function well with limited memory. Semi-stream algorithms currently presented in the literature, such as MeshJoin, Semi-Stream Index Join, and CacheJoin, can join only one foreign key in the stream data with one table in the master data. However, stream data are quite likely to have multiple foreign keys that need to join with multiple tables in the master data. We extend CacheJoin to form three new possibilities for multi-way semi-stream joins, namely Sequential, Semi-concurrent, and Concurrent joins. Initially, the new algorithms join two foreign keys in the stream data with two tables in the master data; however, they can be easily generalized to join with any number of tables in the master data. We evaluated the performance of all three algorithms, and our results show that, under the same scenario, the semi-concurrent architecture performs best.
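
To make the idea of a multi-way semi-stream join concrete, the following is a minimal sketch of joining each stream tuple with two master tables one after the other, with a small cache in front of each table. It is an illustration only and not the Sequential, Semi-concurrent, or Concurrent algorithm from the paper: the class `CachedMasterTable`, the function `sequential_join`, the LRU eviction policy, and the field names (`product_id`, `store_id`) are all assumptions made for this example, and an in-memory dictionary stands in for the disk-based master data.

```python
from collections import OrderedDict


class CachedMasterTable:
    """Hypothetical stand-in for a disk-based master table with an LRU cache."""

    def __init__(self, rows, cache_size):
        self._disk = dict(rows)        # assumption: a dict simulates the disk table
        self._cache = OrderedDict()    # most recently used keys are kept at the end
        self._cache_size = cache_size
        self.disk_reads = 0            # counts simulated disk accesses

    def lookup(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)   # cache hit: refresh recency
            return self._cache[key]
        self.disk_reads += 1               # cache miss: go to "disk"
        row = self._disk.get(key)
        if row is not None:
            self._cache[key] = row
            if len(self._cache) > self._cache_size:
                self._cache.popitem(last=False)  # evict least recently used
        return row


def sequential_join(stream, tables):
    """Join each stream tuple with every master table in turn.

    tables is a list of (foreign_key_field, CachedMasterTable) pairs.
    A tuple is emitted only if every foreign key finds a match.
    """
    for tup in stream:
        joined = dict(tup)
        for fk_field, table in tables:
            row = table.lookup(tup[fk_field])
            if row is None:
                joined = None
                break
            joined.update(row)
        if joined is not None:
            yield joined


# Usage with made-up data: two foreign keys, two master tables.
products = CachedMasterTable(
    {1: {"product_name": "pen"}, 2: {"product_name": "ink"}}, cache_size=1
)
stores = CachedMasterTable({10: {"store_city": "Auckland"}}, cache_size=1)
stream = [
    {"product_id": 1, "store_id": 10, "qty": 3},
    {"product_id": 1, "store_id": 10, "qty": 5},
    {"product_id": 9, "store_id": 10, "qty": 1},  # no matching product
]
result = list(
    sequential_join(stream, [("product_id", products), ("store_id", stores)])
)
```

In this sketch the repeated key in the second stream tuple is served from the cache, which is the effect a front-stage cache is meant to exploit when stream keys are skewed. Generalizing to more master tables only requires adding further pairs to the `tables` list.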

Author information

Corresponding author

Correspondence to Kim Tung Nguyen.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Naeem, M.A., Nguyen, K.T., Weber, G. (2017). A Multi-way Semi-stream Join for a Near-Real-Time Data Warehouse. In: Huang, Z., Xiao, X., Cao, X. (eds) Databases Theory and Applications. ADC 2017. Lecture Notes in Computer Science, vol 10538. Springer, Cham. https://doi.org/10.1007/978-3-319-68155-9_5

  • DOI: https://doi.org/10.1007/978-3-319-68155-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68154-2

  • Online ISBN: 978-3-319-68155-9

  • eBook Packages: Computer Science, Computer Science (R0)
