Schema Normalization for Improving Schema Matching

  • Conference paper
Conceptual Modeling - ER 2009 (ER 2009)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5829))

Abstract

Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the “hidden meaning” associated with schema labels (i.e., class and attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e., annotation with respect to a thesaurus or lexical resource) helps in associating a “meaning” with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words, such as compound nouns and word abbreviations. In this work, we address this problem by proposing a method for schema label normalization, which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We show empirically that our normalization method helps in identifying similarities among schema elements of different data sources, thus improving schema matching accuracy.
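
To make the idea of label normalization concrete, the following is a minimal sketch and not the authors' implementation. It assumes that labels are split on underscores, hyphens, and camelCase boundaries, and that abbreviations are expanded from a small lookup table. The table ABBREVIATIONS and the functions tokenize and normalize are hypothetical names introduced here for illustration; the abstract does not say how expansions are actually obtained.

```python
import re
from typing import List

# Hypothetical abbreviation table (an assumption for this sketch); the
# abstract does not specify where expansions come from.
ABBREVIATIONS = {
    "qty": "quantity",
    "amt": "amount",
    "cust": "customer",
    "addr": "address",
}


def tokenize(label: str) -> List[str]:
    """Split a schema label on separators and camelCase boundaries."""
    tokens: List[str] = []
    for part in re.split(r"[_\-\s]+", label):
        # Acronym runs (e.g. "PO" in "PONumber"), ordinary words, digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


def normalize(label: str) -> str:
    """Expand known abbreviations and return a space-separated label."""
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokenize(label))


if __name__ == "__main__":
    for raw in ("CustAddr", "order_qty", "PONumber"):
        print(raw, "->", normalize(raw))
```

Under these assumptions, labels such as CustAddr and customer_address normalize to the same string, which is the sense in which normalization can increase the number of comparable labels. Annotating the resulting terms against a lexical resource is a separate step that this sketch does not cover.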

Acknowledgements: This work was partially supported by MUR FIRB Network Peer for Business project (http://www.dbgroup.unimo.it/nep4b) and by the IST FP6 STREP project 2006 STASIS (http://www.dbgroup.unimo.it/stasis).

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sorrentino, S., Bergamaschi, S., Gawinecki, M., Po, L. (2009). Schema Normalization for Improving Schema Matching. In: Laender, A.H.F., Castano, S., Dayal, U., Casati, F., de Oliveira, J.P.M. (eds) Conceptual Modeling - ER 2009. ER 2009. Lecture Notes in Computer Science, vol 5829. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04840-1_22

  • DOI: https://doi.org/10.1007/978-3-642-04840-1_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04839-5

  • Online ISBN: 978-3-642-04840-1

  • eBook Packages: Computer Science, Computer Science (R0)
