Deriving Sub-schema Similarities from Semantically Heterogeneous XML Sources

De Meo, Pasquale; Quattrone, Giovanni; Terracina, Giorgio; Ursino, Domenico

doi:10.1007/978-3-540-30468-5_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3290)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"

Abstract

In this paper we propose a semi-automatic technique for deriving similarities between XML sub-schemas. The technique is specific to XML, almost automatic, and lightweight. It consists of two phases: the first selects the most promising pairs of sub-schemas; the second examines them and returns only the similar ones. We discuss some applications that can benefit from the derived sub-schema similarities and describe the experiments we conducted to test the validity of our approach. Finally, we compare the proposed approach with related ones from the literature and present a real example case to clarify it.
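
The abstract does not say how either phase is carried out, so the following Python sketch is only an illustration of the general "select promising pairs, then examine them" structure, not the authors' technique. Everything in it is an assumption made for the example: each sub-schema is reduced to a set of element names, phase 1 uses a cheap Jaccard overlap filter, phase 2 uses a greedy one-to-one name matching, and the function names and thresholds (`min_overlap`, `min_score`) are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def jaccard(a, b):
    """Cheap overlap score between two sets of element names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def name_similarity(x, y):
    """Case-insensitive string similarity in [0, 1] between two element names."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def select_promising(subs1, subs2, min_overlap=0.3):
    """Phase 1 (assumed heuristic): keep pairs whose name sets overlap enough."""
    return [
        (n1, n2)
        for (n1, e1), (n2, e2) in product(subs1.items(), subs2.items())
        if jaccard(e1, e2) >= min_overlap
    ]


def detailed_score(e1, e2):
    """Phase 2 (assumed heuristic): greedy one-to-one matching of element names,
    averaged over the larger sub-schema so unmatched elements lower the score."""
    if not e1 or not e2:
        return 0.0
    candidates = sorted(
        ((name_similarity(x, y), x, y) for x in e1 for y in e2), reverse=True
    )
    used1, used2, total = set(), set(), 0.0
    for score, x, y in candidates:
        if x not in used1 and y not in used2:
            used1.add(x)
            used2.add(y)
            total += score
    return total / max(len(e1), len(e2))


def similar_subschemas(subs1, subs2, min_overlap=0.3, min_score=0.7):
    """Run both phases and return (name1, name2, score) for the similar pairs."""
    result = []
    for n1, n2 in select_promising(subs1, subs2, min_overlap):
        score = detailed_score(subs1[n1], subs2[n2])
        if score >= min_score:
            result.append((n1, n2, score))
    return result


# Hypothetical sub-schemas, each given as a set of element names.
source_a = {"person": {"name", "address", "phone"}, "order": {"id", "date", "total"}}
source_b = {"customer": {"name", "address", "telephone"}, "invoice": {"number", "amount"}}

pairs = similar_subschemas(source_a, source_b)
```

The point of the two-stage layout is cost: the inexpensive filter discards most of the candidate pairs, so the more expensive comparison runs only on the few that survive. A filter based on exact name overlap would miss sub-schemas that use entirely different vocabularies, which is one reason this sketch should not be read as a stand-in for the semantic technique the paper proposes.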

Author information

Authors and Affiliations

DIMET, Università Mediterranea di Reggio Calabria, Via Graziella, Località Feo di Vito, 89060, Reggio Calabria, Italy
Pasquale De Meo, Giovanni Quattrone & Domenico Ursino
Dipartimento di Matematica, Università della Calabria, Via Pietro Bucci, 87036, Rende, CS, Italy
Giorgio Terracina

Editor information

Editors and Affiliations

Vrije Universiteit Brussel (VUB), STARLab, Bldg G/10, Pleinlaan 2, 1050, Brussels, Belgium
Robert Meersman
School of Computer Science and Information Technology, RMIT University, Bld 10.10, 376-392 Swanston Street, VIC 3001, Melbourne, Australia
Zahir Tari

About this paper

Cite this paper

De Meo, P., Quattrone, G., Terracina, G., Ursino, D. (2004). Deriving Sub-schema Similarities from Semantically Heterogeneous XML Sources. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2004: CoopIS, DOA, and ODBASE. OTM 2004. Lecture Notes in Computer Science, vol 3290. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30468-5_15

DOI: https://doi.org/10.1007/978-3-540-30468-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23663-4
Online ISBN: 978-3-540-30468-5
eBook Packages: Springer Book Archive
