Abstract
Data Integration (DI) is the problem of combining a set of heterogeneous, autonomous data sources and providing the user with a unified view of these data. Integrating data raises several challenges, since the designer usually encounters incompatible data models characterized by differences in structure and semantics. One of the hardest challenges is to define correspondences between schema elements (e.g., attributes) to determine how they relate to each other. Since most business data is currently stored in relational databases, we present here a declarative and formal approach to specify 1-to-1, 1-to-m, and m-to-n correspondences between relational schema components. Unlike usual approaches, our Correspondence Assertions (CAs) have semantics and can deal with outer-joins and data-metadata relationships. Finally, we demonstrate how to use the CAs to generate mapping expressions in the form of SQL queries, and we present some preliminary tests to verify the performance of the generated queries.
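To make the idea of generating SQL from a correspondence more concrete, the following is a minimal sketch, not the notation or algorithm of the paper. It assumes a hypothetical dictionary-based description of a correspondence (the `CORRESPONDENCE` structure, the `to_sql` and `run_demo` helpers, the relation and attribute names, and the toy rows are all invented for illustration) and turns it into a query with an outer join.

```python
import sqlite3

# Hypothetical correspondence description (not the paper's CA notation):
# each target attribute is mapped to a source attribute, and a second
# source relation is attached with an outer join.
CORRESPONDENCE = {
    "source": "IES",
    "outer_join": ("RANKING", "IES.name = RANKING.name"),
    "attributes": {
        "uni_name": "IES.name",
        "city": "IES.city",
        "rank_pos": "RANKING.pos",
    },
}


def to_sql(ca):
    """Build a SELECT statement from the correspondence description."""
    columns = ", ".join(f"{src} AS {tgt}" for tgt, src in ca["attributes"].items())
    relation, condition = ca["outer_join"]
    return (
        f"SELECT {columns} FROM {ca['source']} "
        f"LEFT OUTER JOIN {relation} ON {condition}"
    )


def run_demo():
    """Run the generated query against toy in-memory data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE IES (name TEXT, city TEXT);
        CREATE TABLE RANKING (name TEXT, pos INTEGER);
        INSERT INTO IES VALUES ('Alpha', 'City A'), ('Beta', 'City B');
        INSERT INTO RANKING VALUES ('Alpha', 1);
        """
    )
    rows = conn.execute(to_sql(CORRESPONDENCE)).fetchall()
    conn.close()
    return sorted(rows, key=lambda r: r[0])
```

In this sketch the outer join keeps source tuples that have no partner in the second relation, so `run_demo()` returns a row for 'Beta' with a null ranking position rather than dropping it.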
Notes
1. We use bold to represent attribute names and uppercase to represent relation names.
2. We use a path representation: an attribute A of a given relation R in a given database schema D is referred to as D.R.A. For simplicity, we omit the database schema when the context is clear.
3. IES data was extracted from http://www.dados.gov.br/dataset/instituicoes-de-ensino-superior.
4. FSP data was extracted from http://ruf.folha.uol.com.br/2014/rankingdeuniversidades/.
5. CDV data was extracted from http://wwwcustodevida.com.br/brasil.
6. Match functions are functions that determine if two different instances represent the same concept in the real world.
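As an illustration of what a match function might look like, here is a minimal sketch that treats two name strings as the same real-world entity when they agree after normalization. The `normalize` and `match` helpers and the normalization rule (case, accents, whitespace) are assumptions made for this example; the source does not say which match functions are actually used.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip accents, and collapse whitespace (assumed rule)."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def match(a, b):
    """Return True if two names are considered the same entity."""
    return normalize(a) == normalize(b)
```

A real integration scenario would likely need a more tolerant comparison (abbreviations, typos), which this exact-equality sketch does not attempt.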
Acknowledgements
This work was partially supported by national funds through FCT - Fundação para a Ciência e a Tecnologia, under the project PEst-OE/EEI/LA0021/2013, DataStorm Research Line of Excellency funding (EXCL/EEI-ESS/0257/2012) and the grant SFRH/BPD/76024/2011. We are especially grateful to Diego Cardoso (UFC, Brazil) for the implementation of the algorithms.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Pequeno, V., Galhardas, H., Vidal, V.M.P. (2015). Using Complex Correspondences for Integrating Relational Data Sources. In: Cordeiro, J., Hammoudi, S., Maciaszek, L., Camp, O., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2014. Lecture Notes in Business Information Processing, vol 227. Springer, Cham. https://doi.org/10.1007/978-3-319-22348-3_4
DOI: https://doi.org/10.1007/978-3-319-22348-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22347-6
Online ISBN: 978-3-319-22348-3
eBook Packages: Computer Science (R0)