Deriving “Sub-source” Similarities from Heterogeneous, Semi-structured Information Sources

  • Domenico Rosaci
  • Giorgio Terracina
  • Domenico Ursino
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2172)


In this paper we propose a semi-automatic technique for deriving the similarity degree between two portions of heterogeneous, semi-structured information sources (hereafter, sub-sources). The proposed technique consists of two phases: the first selects the most promising pairs of sub-sources, and the second computes the similarity degree for each promising pair. In addition, we show that detecting sub-source similarities is a special case (and, for semi-structured information sources, a particularly interesting one) of the more general problem of Scheme Match. Finally, we discuss some applications that can benefit from the derived sub-source similarities. A real example case is presented to better clarify the proposed technique.
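The abstract states only the outline of the two phases, so the following is a minimal sketch under stated assumptions and not the authors' algorithm. It assumes that each sub-source can be reduced to a set of element names, that the first phase filters pairs by a cheap Jaccard overlap, and that the second phase scores a pair through a greedy one-to-one matching of elements (an exact variant would use a maximum weight bipartite matching). The function names `name_similarity`, `jaccard`, `select_promising_pairs`, and `similarity_degree`, the `min_overlap` threshold, and the sample data are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def name_similarity(a: str, b: str) -> float:
    """Hypothetical element-level similarity in [0, 1], based on string overlap."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(x: set, y: set) -> float:
    """Cheap set overlap used as the phase-one filter (an assumption)."""
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def select_promising_pairs(source_a: dict, source_b: dict, min_overlap: float = 0.2) -> list:
    """Phase 1: keep only sub-source pairs whose element sets overlap enough."""
    return [
        (key_a, key_b)
        for (key_a, elems_a), (key_b, elems_b) in product(source_a.items(), source_b.items())
        if jaccard(elems_a, elems_b) >= min_overlap
    ]


def similarity_degree(elems_a: set, elems_b: set) -> float:
    """Phase 2: greedy one-to-one matching of elements, normalised by the larger set."""
    if not elems_a or not elems_b:
        return 0.0
    # Rank every cross pair of elements by similarity, best first.
    scored = sorted(
        ((name_similarity(a, b), a, b) for a in elems_a for b in elems_b),
        reverse=True,
    )
    used_a, used_b, total = set(), set(), 0.0
    for score, a, b in scored:
        # Accept a pair only if neither element is already matched.
        if a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            total += score
    return total / max(len(elems_a), len(elems_b))


# Hypothetical sample data: sub-source name -> set of element names.
source_a = {
    "customer": {"name", "address", "phone"},
    "order": {"date", "total", "item"},
}
source_b = {
    "client": {"name", "address", "email"},
    "invoice": {"date", "amount"},
}

pairs = select_promising_pairs(source_a, source_b)
degrees = {
    (key_a, key_b): similarity_degree(source_a[key_a], source_b[key_b])
    for key_a, key_b in pairs
}
```

The filter in the first phase keeps the quadratic element comparison of the second phase away from pairs that share almost nothing. The greedy matching is a simplification chosen to keep the sketch dependency-free; it does not guarantee the maximum total weight.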


Information Source, Bipartite Graph, Similarity Degree, Object Class, Target Node
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Domenico Rosaci (1)
  • Giorgio Terracina (1)
  • Domenico Ursino (1)
  1. Dipartimento di Informatica, Elettronica, Matematica e Trasporti, Università degli Studi “Mediterranea” di Reggio Calabria, Reggio Calabria, Italy
