Discovering Semantics from Data-Centric XML

  • Luochen Li
  • Thuy Ngoc Le
  • Huayu Wu
  • Tok Wang Ling
  • Stéphane Bressan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8055)


In database applications, the availability of a conceptual schema and semantics constitutes invaluable leverage for improving the effectiveness, and sometimes the efficiency, of many tasks, including query processing, keyword search, and schema/data integration. The Object-Relationship-Attribute model for Semi-Structured data (ORA-SS) is a conceptual model intended to capture the semantics of object classes, object identifiers, relationship types, etc., underlying XML schemas and data. We refer to the set of these semantic concepts as the ORA-semantics. In this work, we present a novel approach to automatically discover the ORA-semantics from data-centric XML. We also empirically and comparatively evaluate the effectiveness of the approach.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Luochen Li (1)
  • Thuy Ngoc Le (1)
  • Huayu Wu (2)
  • Tok Wang Ling (1)
  • Stéphane Bressan (1)
  1. School of Computing, National University of Singapore, Singapore
  2. Institute for Infocomm Research, Singapore
