A UML Based Approach for Modeling ETL Processes in Data Warehouses

  • Conference paper
Conceptual Modeling - ER 2003 (ER 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2813)

Abstract

Data warehouses (DWs) are complex computer systems whose main goal is to facilitate the decision-making process of knowledge workers. ETL (Extraction-Transformation-Loading) processes are responsible for extracting data from heterogeneous operational data sources, transforming it (conversion, cleaning, normalization, etc.) and loading it into DWs. ETL processes are a key component of DWs because incorrect or misleading data will produce wrong business decisions; a correct design of these processes at early stages of a DW project is therefore absolutely necessary to improve data quality. However, little research has dealt with the modeling of ETL processes. In this paper, we present our approach, based on the Unified Modeling Language (UML), which allows us to accomplish the conceptual modeling of these ETL processes. We provide the necessary mechanisms for an easy and quick specification of the common operations defined in these ETL processes, such as the integration of different data sources, the transformation between source and target attributes, the generation of surrogate keys, and so on. Further advantages of our proposal are the use of the UML (standardization, ease of use and functionality) and the seamless integration of the design of the ETL processes with the DW conceptual schema.
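To make the operations named in the abstract concrete, here is a minimal Python sketch of what they do at run time: integrating several sources, transforming source attributes into target attributes, and generating surrogate keys before loading. This is an illustration of the operations themselves, not the paper's UML notation. All function names, attribute names and sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Iterable, List

# Hypothetical record type: attribute name -> value.
Row = Dict[str, object]


def merge_sources(*sources: Iterable[Row]) -> List[Row]:
    """Integrate rows from several operational sources into one stream."""
    merged: List[Row] = []
    for source in sources:
        merged.extend(source)
    return merged


def transform(rows: Iterable[Row],
              mapping: Dict[str, Callable[[Row], object]]) -> List[Row]:
    """Derive each target attribute from a source row via a conversion function."""
    return [{target: convert(row) for target, convert in mapping.items()}
            for row in rows]


@dataclass
class SurrogateKeyGenerator:
    """Hand out consecutive integer surrogate keys, one per distinct natural key."""
    next_key: int = 1
    assigned: Dict[Hashable, int] = field(default_factory=dict)

    def key_for(self, natural_key: Hashable) -> int:
        # Reuse the key if this natural key was seen before; otherwise mint a new one.
        if natural_key not in self.assigned:
            self.assigned[natural_key] = self.next_key
            self.next_key += 1
        return self.assigned[natural_key]


def load(rows: Iterable[Row], keygen: SurrogateKeyGenerator,
         natural_key: str, key_column: str = "sk") -> List[Row]:
    """Prefix each row with a surrogate key before it enters the target table."""
    return [{key_column: keygen.key_for(row[natural_key]), **row} for row in rows]


# Hypothetical sources sharing the attribute names cust_id and name.
crm = [{"cust_id": "C1", "name": " ana lopez "}]
erp = [{"cust_id": "C2", "name": "BOB SMITH"},
       {"cust_id": "C1", "name": "Ana Lopez"}]

# Source-to-target attribute mapping with a simple cleaning conversion.
mapping = {
    "customer_id": lambda r: r["cust_id"],
    "customer_name": lambda r: str(r["name"]).strip().title(),
}

keygen = SurrogateKeyGenerator()
dw_customers = load(transform(merge_sources(crm, erp), mapping),
                    keygen, natural_key="customer_id")
```

Both occurrences of `C1` receive the same surrogate key, which is the point of keeping the natural-to-surrogate mapping in one generator. Deduplication and error handling are deliberately left out of this sketch.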

This paper has been partially supported by the Spanish Ministry of Science and Technology, project number TIC2001-3530-C02-02.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Trujillo, J., Luján-Mora, S. (2003). A UML Based Approach for Modeling ETL Processes in Data Warehouses. In: Song, IY., Liddle, S.W., Ling, TW., Scheuermann, P. (eds) Conceptual Modeling - ER 2003. ER 2003. Lecture Notes in Computer Science, vol 2813. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39648-2_25

  • DOI: https://doi.org/10.1007/978-3-540-39648-2_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20299-8

  • Online ISBN: 978-3-540-39648-2

  • eBook Packages: Springer Book Archive
