Synonyms
Data warehouse back stage; Data warehouse refreshment; ELT; ETL; ETL process; ETL tool
Definition
Extraction, transformation, and loading (ETL) processes are responsible for the operations taking place in the back stage of a data warehouse architecture. In a high-level description of an ETL process, the data are first extracted from the source data stores, which can be online transaction processing (OLTP) or legacy systems, files in any format, web pages, various kinds of documents (e.g., spreadsheets and text documents), or even data arriving in a streaming fashion. Typically, only the data that have changed since the previous execution of the ETL process (newly inserted, updated, and deleted information) should be extracted from the sources. After this phase, the extracted data are propagated to a special-purpose area of the warehouse, called the data staging area (DSA), where their transformation, homogenization, and cleansing take place. The most frequently used...
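The extraction of changed data and its hand-off to the staging area can be illustrated with a minimal sketch. It assumes that two full snapshots of a source are available as in-memory dictionaries keyed by primary key, which is only one of several ways to detect changes. The names `snapshot_delta` and `stage`, the two-attribute row layout, and the cleansing rules are hypothetical and chosen for illustration only.

```python
from typing import Dict, Tuple

Row = Tuple[str, float]    # assumed row layout: (name, amount)
Snapshot = Dict[int, Row]  # primary key -> row


def snapshot_delta(previous: Snapshot, current: Snapshot):
    """Return the (inserted, updated, deleted) rows between two source snapshots."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    return inserted, updated, deleted


def stage(rows: Snapshot) -> Snapshot:
    """Hypothetical staging step: trim and upper-case names, round amounts."""
    return {k: (name.strip().upper(), round(amount, 2))
            for k, (name, amount) in rows.items()}


# Source contents at the previous and at the current ETL execution.
previous = {1: ("alice ", 10.0), 2: ("bob", 20.0), 3: ("carol", 30.0)}
current = {1: ("alice ", 10.0), 2: ("bob", 25.0), 4: ("dave", 40.0)}

inserted, updated, deleted = snapshot_delta(previous, current)

# Only new and changed rows are cleansed in the staging area;
# deletions are passed on by key.
staged = stage({**inserted, **updated})
```

In this sketch the unchanged row with key 1 is never sent to the staging area, which is the point of extracting only the differences. Comparing full snapshots in memory does not scale to large sources, where log-based or timestamp-based change detection is a common alternative.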
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Simitsis, A., Vassiliadis, P. (2018). Extraction, Transformation, and Loading. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_158
DOI: https://doi.org/10.1007/978-1-4614-8265-9_158
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering