Automating the Schema Matching Process for Heterogeneous Data Warehouses

Banek, Marko; Vrdoljak, Boris; Tjoa, A. Min; Skočir, Zoran

doi:10.1007/978-3-540-74553-2_5

Marko Banek¹, Boris Vrdoljak¹, A. Min Tjoa² & Zoran Skočir¹

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4654)

Abstract

A federated data warehouse is a logical integration of data warehouses, applicable when physical integration is impossible due to privacy policy or legal restrictions. To enable the translation of queries in a federated approach, the schemas of the federated warehouse and the local warehouses must be matched. In this paper we present a procedure that enables the matching process for schema structures specific to the multidimensional model of data warehouses: facts, measures, dimensions, aggregation levels and dimensional attributes. Similarities between warehouse-specific structures are computed using linguistic and structural comparison, and the calculated values are used to create the necessary mappings. We present restriction rules and recommendations for aggregation level matching, which forms the most complex part of the process. A software implementation of the entire process is provided in order to verify it and to determine the proper selection metric for mapping different multidimensional structures.
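
As a rough illustration of the idea in the abstract, the following minimal sketch combines a name-based score with a structure-based score and then picks mappings from the resulting values. It is not the authors' procedure: the string comparison from Python's difflib stands in for a real linguistic comparison, the Jaccard overlap of aggregation level names stands in for structural comparison, and the function names, the weight w_ling, the threshold and the greedy selection are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def linguistic_sim(a: str, b: str) -> float:
    """Character-based stand-in for a linguistic comparison of two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def structural_sim(levels_a: set, levels_b: set) -> float:
    """Jaccard overlap of lower-cased aggregation level names."""
    a = {x.lower() for x in levels_a}
    b = {x.lower() for x in levels_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_sim(name_a, levels_a, name_b, levels_b, w_ling=0.5):
    """Weighted sum of both scores; w_ling is a hypothetical tuning weight."""
    return (w_ling * linguistic_sim(name_a, name_b)
            + (1.0 - w_ling) * structural_sim(levels_a, levels_b))


def match_dimensions(schema_a: dict, schema_b: dict, threshold=0.3) -> dict:
    """Greedy one-to-one mapping: take the best-scoring pairs first and
    skip any pair whose score is below the (assumed) threshold."""
    scored = sorted(
        ((combined_sim(na, la, nb, lb), na, nb)
         for na, la in schema_a.items()
         for nb, lb in schema_b.items()),
        reverse=True,
    )
    mapping, used_b = {}, set()
    for score, na, nb in scored:
        if score < threshold:
            break  # remaining pairs score even lower
        if na in mapping or nb in used_b:
            continue  # each dimension is mapped at most once
        mapping[na] = nb
        used_b.add(nb)
    return mapping


# Toy schemas: dimension name -> set of aggregation level names.
federated = {"time": {"day", "month", "year"},
             "customer": {"name", "city", "country"}}
local = {"date": {"day", "month", "quarter", "year"},
         "client": {"name", "city", "region"}}

print(match_dimensions(federated, local))
```

With these toy inputs the sketch pairs time with date and customer with client. The second pair is carried mostly by the shared level names, which shows why a structural signal helps when names alone differ. A greedy choice is only one possible selection metric and is used here for brevity.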

Author information

Authors and Affiliations

Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000 Zagreb, Croatia
Marko Banek, Boris Vrdoljak & Zoran Skočir
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, A-1040 Wien, Austria
A. Min Tjoa

Editor information

Il Yeal Song, Johann Eder, Tho Manh Nguyen

Cite this paper

Banek, M., Vrdoljak, B., Tjoa, A.M., Skočir, Z. (2007). Automating the Schema Matching Process for Heterogeneous Data Warehouses. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_5

DOI: https://doi.org/10.1007/978-3-540-74553-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74552-5
Online ISBN: 978-3-540-74553-2
eBook Packages: Computer Science, Computer Science (R0)
