Using Complex Correspondences for Integrating Relational Data Sources

  • Conference paper
Enterprise Information Systems (ICEIS 2014)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 227)

Abstract

Data Integration (DI) is the problem of combining a set of heterogeneous, autonomous data sources and providing the user with a unified view of these data. Integrating data raises several challenges, since the designer usually encounters incompatible data models characterized by differences in structure and semantics. One of the hardest challenges is to define correspondences between schema elements (e.g., attributes) to determine how they relate to each other. Since most business data is currently stored in relational databases, we present here a declarative and formal approach to specify 1-to-1, 1-to-m, and m-to-n correspondences between relational schema components. Unlike usual approaches, our Correspondence Assertions (CAs) have semantics and can deal with outer-joins and data-metadata relationships. Finally, we demonstrate how to use the CAs to generate mapping expressions in the form of SQL queries, and we present some preliminary tests to verify the performance of the generated queries.
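
As a rough illustration of the last step, the sketch below turns simple 1-to-1 attribute correspondences into a SQL query that uses a left outer join. It is not the paper's CA notation or its generation algorithm: the `AttributeCorrespondence` class, the `build_query` helper, and all relation and attribute names are hypothetical, chosen only to show the general idea.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeCorrespondence:
    """Hypothetical 1-to-1 correspondence: target attribute <- relation.attribute."""
    target_attr: str
    source_relation: str
    source_attr: str


def build_query(base, key, correspondences):
    """Build a SELECT that fills the target attributes from the source relations.

    Relations other than `base` are attached with LEFT OUTER JOIN on `key`,
    so base tuples without a match are kept, with NULLs in the missing columns.
    """
    columns = [f"{c.source_relation}.{c.source_attr} AS {c.target_attr}"
               for c in correspondences]
    joined = []
    for c in correspondences:
        if c.source_relation != base and c.source_relation not in joined:
            joined.append(c.source_relation)
    joins = [f"LEFT OUTER JOIN {r} ON {base}.{key} = {r}.{key}" for r in joined]
    return " ".join([f"SELECT {', '.join(columns)}", f"FROM {base}", *joins])


# Toy source relations (assumed schema, not taken from the paper).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UNIVERSITY (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE RANKING (id INTEGER, pos INTEGER);
    INSERT INTO UNIVERSITY VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO RANKING VALUES (1, 10);
""")

cas = [
    AttributeCorrespondence("uni_name", "UNIVERSITY", "name"),
    AttributeCorrespondence("uni_rank", "RANKING", "pos"),
]
sql = build_query("UNIVERSITY", "id", cas)
rows = conn.execute(sql + " ORDER BY UNIVERSITY.id").fetchall()
print(sql)
print(rows)
```

In this toy setup the unranked university is still returned, with a NULL rank, which is the behaviour an inner join would lose.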

Notes

  1. We use bold to represent attribute names and uppercase to represent relation names.

  2. We use a path representation: an attribute A of a given relation R in a given database schema D is referred to as D.R.A. For simplicity, we omit the database schema when the context is clear.

  3. IES data was extracted from http://www.dados.gov.br/dataset/instituicoes-de-ensino-superior.

  4. FSP data was extracted from http://ruf.folha.uol.com.br/2014/rankingdeuniversidades/.

  5. CDV data was extracted from http://wwwcustodevida.com.br/brasil.

  6. Match functions are functions that determine whether two different instances represent the same real-world concept.
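
The notion of a match function in the last note can be made concrete with a small sketch. The `same_institution` function below is a hypothetical match function that treats two institution names as the same real-world entity when they agree after accent, case, and whitespace normalization. The paper may well use different criteria, and the sample names are chosen only for illustration.

```python
import unicodedata


def normalize(name: str) -> str:
    """Strip accents, collapse runs of whitespace, and lowercase a name."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def same_institution(a: str, b: str) -> bool:
    """Hypothetical match function: equal names after normalization."""
    return normalize(a) == normalize(b)


print(same_institution("Universidade Federal do Ceará", "UNIVERSIDADE  FEDERAL DO CEARA"))
print(same_institution("Universidade Federal do Ceará", "Universidade de São Paulo"))
```

Exact comparison after normalization is the simplest possible choice; a practical match function would more likely combine several attributes or use a similarity threshold.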

Acknowledgements

This work was partially supported by national funds through FCT - Fundação para a Ciência e a Tecnologia, under the project PEst-OE/EEI/LA0021/2013, DataStorm Research Line of Excellency funding (EXCL/EEI-ESS/0257/2012) and the grant SFRH/BPD/76024/2011. We are especially grateful to Diego Cardoso (UFC, Brazil) for the implementation of the algorithms.

Author information

Corresponding author

Correspondence to Valéria Pequeno.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Pequeno, V., Galhardas, H., Vidal, V.M.P. (2015). Using Complex Correspondences for Integrating Relational Data Sources. In: Cordeiro, J., Hammoudi, S., Maciaszek, L., Camp, O., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2014. Lecture Notes in Business Information Processing, vol 227. Springer, Cham. https://doi.org/10.1007/978-3-319-22348-3_4

  • DOI: https://doi.org/10.1007/978-3-319-22348-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22347-6

  • Online ISBN: 978-3-319-22348-3

  • eBook Packages: Computer Science, Computer Science (R0)
