Deriving “Sub-source” Similarities from Heterogeneous, Semi-structured Information Sources
In this paper we propose a semi-automatic technique for deriving the similarity degree between two portions of heterogeneous, semi-structured information sources (hereafter, sub-sources). The proposed technique consists of two phases: the first selects the most promising pairs of sub-sources, whereas the second computes the similarity degree of each promising pair. In addition, we show that the detection of sub-source similarities is a special case (and a very interesting one for semi-structured information sources) of the more general problem of Scheme Match. Finally, we discuss some possible applications that can benefit from derived sub-source similarities. A real example case is presented to clarify the proposed technique.
Keywords: Information Source, Bipartite Graph, Similarity Degree, Object Class, Target Node
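The two-phase structure described in the abstract (select promising pairs, then score each one) can be illustrated with a minimal sketch. It is not the paper's algorithm: the representation of a sub-source as a list of object-class names, the shared-name filter in `promising_pairs`, the string-based `name_similarity`, and the greedy matching in `similarity_degree` are all assumptions made here for illustration only.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Assumed stand-in for a lexical similarity between two class names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def promising_pairs(subs_a, subs_b, min_shared=1):
    """Phase 1 (assumed heuristic): keep only the sub-source pairs that
    share at least `min_shared` class names, ignoring case."""
    pairs = []
    for id_a, classes_a in subs_a.items():
        names_a = {c.lower() for c in classes_a}
        for id_b, classes_b in subs_b.items():
            names_b = {c.lower() for c in classes_b}
            if len(names_a & names_b) >= min_shared:
                pairs.append((id_a, id_b))
    return pairs


def similarity_degree(classes_a, classes_b):
    """Phase 2 (assumed measure): greedy one-to-one matching on the
    bipartite graph whose edges are weighted by name similarity,
    normalized to the range [0, 1]."""
    if not classes_a or not classes_b:
        return 0.0
    # All candidate edges, heaviest first.
    edges = sorted(
        ((name_similarity(a, b), a, b) for a in classes_a for b in classes_b),
        reverse=True,
    )
    used_a, used_b = set(), set()
    total = 0.0
    for weight, a, b in edges:
        if a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            total += weight
    return 2 * total / (len(classes_a) + len(classes_b))


# Hypothetical sub-sources from two information sources.
source_1 = {"S1": ["Author", "Paper"], "S2": ["Car"]}
source_2 = {"T1": ["author", "Journal"], "T2": ["Boat"]}

for id_a, id_b in promising_pairs(source_1, source_2):
    degree = similarity_degree(source_1[id_a], source_2[id_b])
    print(id_a, id_b, round(degree, 2))
```

The cheap filter in the first phase keeps the more expensive matching step from running on every possible pair, which is the motivation for splitting the technique into two phases. A greedy matching is used here only for brevity; a maximum-weight bipartite matching would be the natural replacement.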