Abstract
Extract-Transform-Load (ETL) describes the process of moving data from a source to a destination. Source and destination may be physically separated, and transformations may be applied in between. Because data preparation happens regularly, these processes are often run at night to minimize interference with other business processes and to guarantee high data availability. This limited time window is one reason why the demand for shorter ETL processing times is increasing steadily. Further reasons are data availability and timeliness, the transition to real-time or near-real-time analysis of data, and growing data volumes. This article examines several approaches to optimizing ETL processes in detail and takes a closer look at their advantages and disadvantages. Finally, the approaches are compared with one another, and a recommendation is given depending on the use case.
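As a minimal illustration of the three ETL stages named above, the following Python sketch extracts rows from an in-memory CSV source, transforms them, and loads the result into an SQLite destination. The source data, table layout, and function names are hypothetical and chosen only for illustration; the sketch is not one of the optimization approaches examined in the article.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from an operational system.
SOURCE_CSV = """order_id,amount_eur,country
1,19.99,de
2,5.00,fr
3,,de
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read raw rows from the (hypothetical) CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[int, float, str]]:
    """Transform: drop incomplete rows, convert types, normalize country codes."""
    records = []
    for row in rows:
        if not row["amount_eur"]:
            continue  # skip rows with a missing amount
        records.append(
            (int(row["order_id"]), float(row["amount_eur"]), row["country"].upper())
        )
    return records


def load(records: list[tuple[int, float, str]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount_eur REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text: str, conn: sqlite3.Connection) -> None:
    """Run the three stages in sequence: extract -> transform -> load."""
    load(transform(extract(text)), conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_etl(SOURCE_CSV, connection)
    print(connection.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Keeping the three stages as separate functions mirrors the Extract-Transform-Load split and makes the cost of each stage individually measurable, which helps when deciding where optimization effort is worthwhile.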
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hahn, S.M.L. (2019). Analysis of Existing Concepts of Optimization of ETL-Processes. In: Silhavy, R. (eds) Software Engineering Methods in Intelligent Algorithms. CSOC 2019. Advances in Intelligent Systems and Computing, vol 984. Springer, Cham. https://doi.org/10.1007/978-3-030-19807-7_7
DOI: https://doi.org/10.1007/978-3-030-19807-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19806-0
Online ISBN: 978-3-030-19807-7
eBook Packages: Intelligent Technologies and Robotics (R0)