Schema Normalization for Improving Schema Matching

Sorrentino, Serena; Bergamaschi, Sonia; Gawinecki, Maciej; Po, Laura

doi:10.1007/978-3-642-04840-1_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5829)

Included in the following conference series:

International Conference on Conceptual Modeling

Abstract

Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and structure. Starting from the “hidden meaning” associated with schema labels (i.e., class and attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e., annotation with respect to a thesaurus or lexical resource) helps in associating a “meaning” with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a schema label normalization method that increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We show empirically that our normalization method helps identify similarities among schema elements of different data sources, thus improving schema matching accuracy.
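To make the idea of label normalization concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hand-written abbreviation table and a small word set standing in for a lexical resource such as WordNet; the names `ABBREVIATIONS`, `DICTIONARY`, `tokenize`, and `normalize` are hypothetical. It splits a schema label into tokens, expands known abbreviations, and reports tokens that remain non-dictionary words and would need manual attention.

```python
import re

# Hypothetical abbreviation table (an assumption for illustration only).
ABBREVIATIONS = {"qty": "quantity", "cust": "customer", "addr": "address", "no": "number"}

# Hypothetical stand-in for a lexical resource such as WordNet.
DICTIONARY = {"quantity", "customer", "address", "number", "order", "name", "delivery"}


def tokenize(label):
    """Split a label on underscores, hyphens, whitespace, and camelCase boundaries."""
    # Insert a space wherever a lowercase letter or digit is followed by an uppercase letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]


def normalize(label):
    """Return (normalized tokens, tokens that are still non-dictionary words)."""
    expanded, unresolved = [], []
    for tok in tokenize(label):
        if tok in DICTIONARY:
            expanded.append(tok)                 # already a dictionary word
        elif tok in ABBREVIATIONS:
            expanded.append(ABBREVIATIONS[tok])  # expand a known abbreviation
        else:
            expanded.append(tok)                 # keep as-is and flag for review
            unresolved.append(tok)
    return expanded, unresolved


if __name__ == "__main__":
    for label in ["CustAddr", "order_qty", "ORDER_NO", "xyzName"]:
        print(label, normalize(label))
```

Under these assumptions, labels such as `CustAddr` and `order_qty` from two different sources become sequences of dictionary words, which is what makes them comparable by a lexical annotation step. Annotating the resulting compound terms with a meaning is a separate step that this sketch does not cover.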

Acknowledgements: This work was partially supported by MUR FIRB Network Peer for Business project (http://www.dbgroup.unimo.it/nep4b) and by the IST FP6 STREP project 2006 STASIS (http://www.dbgroup.unimo.it/stasis).

Author information

Authors and Affiliations

DII, University of Modena and Reggio Emilia, Italy
Sonia Bergamaschi & Laura Po
ICT Doctorate School, University of Modena and Reggio Emilia, Italy
Serena Sorrentino & Maciej Gawinecki

Editor information

Editors and Affiliations

Department of Computer Science, Federal University of Minas Gerais, Av. Antônio Carlos 6627, Prédio do ICEx, sala 4010, Pampulha, 31270-901, Belo Horizonte, MG, Brazil
Alberto H. F. Laender
Dipartimento di Scienze dell’Informazione, DICo, Università degli Studi di Milano, Comelico, 39/41, 20135, Milano, Italy
Silvana Castano
Hewlett-Packard Laboratories, 1501 Page Mill Rd., 94304, Palo Alto, CA, USA
Umeshwar Dayal
Department of Information Engineering and Computer Science, University of Trento, Via sommarive 14, 38050, Povo (Trento), Italy
Fabio Casati
Instituto de Informática, Universidade Federal do Rio Grande do Sul (UFRGS), Caixa Postal 15.064, 91.501-970, Porto Alegre, RS, Brasil
José Palazzo M. de Oliveira

Cite this paper

Sorrentino, S., Bergamaschi, S., Gawinecki, M., Po, L. (2009). Schema Normalization for Improving Schema Matching. In: Laender, A.H.F., Castano, S., Dayal, U., Casati, F., de Oliveira, J.P.M. (eds) Conceptual Modeling - ER 2009. ER 2009. Lecture Notes in Computer Science, vol 5829. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04840-1_22

DOI: https://doi.org/10.1007/978-3-642-04840-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04839-5
Online ISBN: 978-3-642-04840-1
eBook Packages: Computer Science, Computer Science (R0)
