
Performance Tests in Data Warehousing ETLM Process for Detection of Changes in Data Origin

  • Rosana L. A. Rocha
  • Leonardo Figueiredo Cardoso
  • Jano Moreira de Souza
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2737)

Abstract

In a data warehouse (DW) environment, when the operational environment does not possess, or does not want to provide, information about the changes that have occurred in its data, controls must be implemented to detect these changes and reflect them in the DW environment. The main scenarios are: i) the impossibility of instrumenting the DBMS (triggers, transaction logs, stored procedures, replication, materialized views, old and new versions of data, etc.) because of security policies, data ownership, or performance issues; ii) the lack of instrumentation resources in the DBMS; iii) the use of legacy technologies such as file systems or semi-structured data; iv) application-proprietary databases and ERP systems. In another article [1], we presented the development and implementation of a technique based on the comparison of database snapshots, in which signatures are used to mark and detect changes. The technique is simple and can be applied in all four scenarios above. To demonstrate the efficiency of our technique, in this article we perform comparative performance tests of these approaches. We ran two benchmarks: the first using synthetic data and the second using real data from a case study, the data warehouse project developed for Rio Sul Airlines, a regional airline belonging to the Brazil-based Varig group. We also describe the main approaches to detecting changes in data origin.
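The abstract does not detail the signature technique itself (it is presented in [1]). As an illustration of the general idea of signature-based snapshot comparison only, the following is a minimal Python sketch under stated assumptions: each source row has a single-column primary key, and the signature is an MD5 digest of the remaining columns. The function names `row_signature`, `build_signatures`, and `detect_changes` are hypothetical, and MD5 is an assumed choice of signature function, not necessarily the one used by the authors.

```python
import hashlib
from typing import Dict, Hashable, Iterable, Set, Tuple

Row = Tuple[object, ...]


def row_signature(values: Row) -> str:
    """Return a fixed-size signature (MD5 hex digest, an assumed choice) of a row's non-key values."""
    # repr() quotes and escapes each value, so joining on "\x1f" makes accidental
    # collisions between rows such as ("ab", "c") and ("a", "bc") unlikely.
    payload = "\x1f".join(repr(v) for v in values)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def build_signatures(rows: Iterable[Row], key_index: int = 0) -> Dict[Hashable, str]:
    """Map each primary-key value to the signature of the row's other columns."""
    signatures: Dict[Hashable, str] = {}
    for row in rows:
        key = row[key_index]
        non_key = row[:key_index] + row[key_index + 1:]
        signatures[key] = row_signature(non_key)
    return signatures


def detect_changes(
    old: Dict[Hashable, str], new: Dict[Hashable, str]
) -> Tuple[Set[Hashable], Set[Hashable], Set[Hashable]]:
    """Classify keys as inserted, deleted, or updated between two snapshots."""
    inserted = {k for k in new if k not in old}
    deleted = {k for k in old if k not in new}
    # A key present in both snapshots changed if its signature differs.
    updated = {k for k in new if k in old and new[k] != old[k]}
    return inserted, deleted, updated


# Usage with small synthetic snapshots (hypothetical data):
old_sigs = build_signatures([(1, "GIG", 100), (2, "CGH", 200), (3, "REC", 300)])
new_sigs = build_signatures([(1, "GIG", 100), (2, "CGH", 250), (4, "NAT", 400)])
print(detect_changes(old_sigs, new_sigs))  # ({4}, {3}, {2})
```

In this sketch, only the key-to-signature map from the previous load has to be kept between runs rather than a full copy of the source snapshot, which is one plausible reason a signature-based comparison can be cheaper than comparing complete rows.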

Keywords

Performance Test, Data Warehouse, Origin Area, Cleaning Management, View Maintenance


References

  1. Rocha, R.L.A., Cardoso, L.F., Souza, J.M.: An Improved Approach in Data Warehousing ETLM Process for Detection of Changes in Data Origin. Technical Report ES-593/03, COPPE/UFRJ (2003), http://www.cos.ufrj.br/publicacoes/reltec/es59303.pdf
  2. Do, L., Drew, P., Jin, W., et al.: Issues in Developing Very Large Databases. In: Proceedings of the 24th VLDB Conference, New York, USA, pp. 633–636 (August 1998)
  3. Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems, 1st edn. Prentice Hall Inc., New Jersey (1991)
  4. Zhuge, Y., Garcia-Molina, H., Hammer, J., et al.: View Maintenance in a Warehousing Environment. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, San Jose, California, USA, pp. 316–327 (June 1995)
  5. Zhuge, Y., Garcia-Molina, H., Wiener, J.L.: The Strobe Algorithms for Multi-Source Warehouse Consistency. In: Proceedings of the International Conference on Parallel and Distributed Information Systems, Miami Beach, Florida, USA, pp. 146–157 (December 1996)
  6. Quass, D., Widom, J.: On-Line Warehouse View Maintenance. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, USA, pp. 405–416 (May 1997)
  7. Hull, R., Zhou, G.: Towards the Study of Performance Trade-offs Between Materialized and Virtual Integrated Views. In: Proceedings of the Workshop on Materialized Views: Techniques and Applications (VIEWS 1996), Canada, pp. 91–102 (June 1996)
  8. Quass, D., Gupta, A., Mumick, I.S., et al.: Making Views Self-Maintainable for Data Warehousing. In: Proceedings of the International Conference on Parallel and Distributed Information Systems, Miami Beach, Florida, USA, pp. 158–169 (December 1996)
  9. Inmon, W.H., Kelley, C.: Rdb/VMS: Developing the Data Warehouse. QED Pub. Group, Boston (1993)
  10. Labio, W.J., Yerneni, R., Garcia-Molina, H.: Shrinking the Warehouse Update Window. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Philadelphia, USA, pp. 383–394 (June 1999)
  11. Widom, J., Ceri, S.: Active Database Systems: Triggers and Rules for Advanced Database Processing. San Francisco, California, USA (1996)
  12. Craig, R.S., Vivona, J.A., Berkovitch, D.: Microsoft Data Warehousing: Building Distributed Decision Support Systems. Wiley, New York (1999)
  13. Widom, J.: Research Problems in Data Warehousing. In: Proceedings of the ACM CIKM International Conference on Information and Knowledge Management, USA, pp. 25–30 (November 1995)
  14. Hammer, J., Garcia-Molina, H., Widom, J., et al.: The Stanford Data Warehousing Project. IEEE Quarterly Bulletin on Data Engineering, Special Issue on Materialized Views and Data Warehousing 18(2), 41–48 (1995)
  15. Chawathe, S.S., Garcia-Molina, H.: Meaningful Change Detection in Structured Data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Arizona, USA, pp. 26–37 (May 1997)
  16. Kimball, R.: The Data Warehouse Toolkit. John Wiley & Sons, Inc., New York (1996)
  17. Kimball, R.: The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses. John Wiley & Sons, Inc., New York (1998)

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Rosana L. A. Rocha (1)
  • Leonardo Figueiredo Cardoso (1)
  • Jano Moreira de Souza (1)
  1. COPPE/UFRJ, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
