Abstract
Semi-stream processing, the operation of joining a stream of data with non-stream, disk-based master data, is a crucial component of near-real-time data warehousing. Semi-stream joins must process data quickly and accurately, and must function well with limited memory. The semi-stream algorithms currently presented in the literature, such as MeshJoin, Semi-Stream Index Join and CacheJoin, can join only one foreign key in the stream data with one table in the master data. However, stream data are quite likely to have multiple foreign keys that need to be joined with multiple tables in the master data. We extend CacheJoin to form three new possibilities for multi-way semi-stream joins, namely Sequential, Semi-concurrent, and Concurrent joins. As presented, the new algorithms join two foreign keys in the stream data with two tables in the master data, but they can be easily generalized to join with any number of tables in the master data. We evaluated the performance of all three algorithms, and our results show that the Semi-concurrent architecture performs best when all three are run under the same scenario.
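To make the idea of a two-key semi-stream join concrete, the following is a minimal Python sketch of a sequential arrangement: each stream tuple is first matched on one foreign key and then on the other. It is an illustration only and not the authors' algorithm. The names `CachedLookup`, `sequential_join`, `fk_a` and `fk_b` are hypothetical, the master tables are plain dictionaries standing in for disk-based tables, and a simple LRU cache is swapped in for CacheJoin's cache module, whose design the abstract does not describe.

```python
from collections import OrderedDict


class CachedLookup:
    """Hypothetical lookup: a small LRU cache in front of a master table.

    Assumption: the master table is a dict standing in for a disk-based
    table, and every cache miss is counted as one "disk read".
    """

    def __init__(self, master_table, cache_size):
        self.master_table = master_table
        self.cache_size = cache_size
        self.cache = OrderedDict()
        self.disk_reads = 0

    def get(self, key):
        if key in self.cache:
            # Cache hit: mark the key as most recently used.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: fall back to the (simulated) disk-based table.
        self.disk_reads += 1
        row = self.master_table.get(key)
        if row is not None:
            self.cache[key] = row
            if len(self.cache) > self.cache_size:
                # Evict the least recently used entry.
                self.cache.popitem(last=False)
        return row


def sequential_join(stream, lookup_a, lookup_b):
    """Join each stream tuple on fk_a first, then on fk_b."""
    for tup in stream:
        row_a = lookup_a.get(tup["fk_a"])
        if row_a is None:
            continue  # no match in the first master table
        row_b = lookup_b.get(tup["fk_b"])
        if row_b is None:
            continue  # no match in the second master table
        yield {**tup, **row_a, **row_b}


# Illustrative data only; these values are not from the paper.
products = {1: {"product": "pen"}, 2: {"product": "ink"}}
stores = {10: {"store": "north"}, 20: {"store": "south"}}
stream = [
    {"fk_a": 1, "fk_b": 10, "qty": 3},
    {"fk_a": 1, "fk_b": 20, "qty": 1},
    {"fk_a": 9, "fk_b": 10, "qty": 5},  # fk_a has no match, tuple is dropped
    {"fk_a": 2, "fk_b": 10, "qty": 2},
]

lookup_a = CachedLookup(products, cache_size=1)
lookup_b = CachedLookup(stores, cache_size=1)
joined = list(sequential_join(stream, lookup_a, lookup_b))
```

In this sketch the second lookup is skipped whenever the first finds no match, which is one natural property of a sequential arrangement. How the Semi-concurrent and Concurrent variants divide the work between the two joins is not specified in the abstract, so they are not sketched here.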
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Naeem, M.A., Nguyen, K.T., Weber, G. (2017). A Multi-way Semi-stream Join for a Near-Real-Time Data Warehouse. In: Huang, Z., Xiao, X., Cao, X. (eds) Databases Theory and Applications. ADC 2017. Lecture Notes in Computer Science, vol 10538. Springer, Cham. https://doi.org/10.1007/978-3-319-68155-9_5
DOI: https://doi.org/10.1007/978-3-319-68155-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68154-2
Online ISBN: 978-3-319-68155-9
eBook Packages: Computer Science, Computer Science (R0)