A structure-based approach to querying semi-structured data

  • Query Languages for New Applications
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1369)

Abstract

Several researchers have considered integrating multiple unstructured, semi-structured, and structured data sources by modeling all sources as edge-labeled graphs. Data in this model is self-describing and dynamically typed, and captures both schema and data information. The labels are arbitrary atomic values, such as strings, integers, and reals, and the integrated data graph is stored in a single data repository as a relation of edges. The relation is dynamically typed, i.e., each edge label is tagged with its type.
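To make the single-repository model concrete, here is a minimal Python sketch, assuming a toy graph with integer node identifiers. The `Label` class, the tag names `"string"`, `"int"`, and `"real"`, the helper `out_edges`, and the sample edges are all hypothetical illustrations rather than the paper's representation. What it shows is that every label carries a runtime type tag and that all edges live in one relation.

```python
from dataclasses import dataclass
from typing import Union

# An atomic label value; in the single-repository model it carries a
# runtime type tag because the edge relation is dynamically typed.
Atom = Union[str, int, float]

@dataclass(frozen=True)
class Label:
    tag: str      # hypothetical tag names: "string", "int", "real"
    value: Atom

def label(value: Atom) -> Label:
    """Tag an atomic value with its type (booleans are not modelled here)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not modelled in this sketch")
    if isinstance(value, int):
        return Label("int", value)
    if isinstance(value, float):
        return Label("real", value)
    if isinstance(value, str):
        return Label("string", value)
    raise TypeError(f"unsupported atom: {value!r}")

# The whole data graph is one relation of (source node, label, target node).
Edge = tuple[int, Label, int]

edges: list[Edge] = [
    (0, label("Entry"), 1),
    (1, label("Title"), 2),
    (2, label("Semi-structured data"), 3),
    (1, label("Year"), 4),
    (4, label(1997), 5),
]

def out_edges(node: int) -> list[Edge]:
    # Every lookup scans the single relation; which labels are strings and
    # which are integers is only discovered by inspecting each tag at run time.
    return [e for e in edges if e[0] == node]
```

Because nothing about label types is known statically, a query such as "all integer-valued years" must inspect each label's tag as it runs, which is the kind of overhead the abstract attributes to the single-repository approach.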

Although the single labeled-graph repository is flexible, it loses all static type information and incurs severe efficiency penalties compared to querying structured databases, such as relational or object-oriented databases. In this paper we propose an alternative method of storing and querying semi-structured data using storage schemas, which are closely related to the recently introduced graph schemas [BDFS97]. A storage schema splits the graph's edges into several relations, some of which may have labels of known types (such as strings or integers), while others may still be dynamically typed. We show that all positive queries in UnQL, a query language for semi-structured data, can be translated into conjunctive queries against the relations in the storage schema. This result may be surprising, because UnQL is a powerful language featuring regular path expressions, restructuring queries, joins, and unions.

We also use this technique to translate queries on the integrated semi-structured data into queries on the external sources. In this setting the integrated semi-structured data is not materialized but virtual, and the problem is to translate a query against the integrated view, possibly involving regular path expressions and restructuring, into queries that can be answered by the external sources. Here we again use the storage schema, this time to split the graph into relations according to their sources. Any positive UnQL query is decomposed based on these relations and translated into queries on the external sources.
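The following Python sketch illustrates the idea of a storage schema under simplifying assumptions. The partition rule is invented for illustration: a hypothetical table `FIELD_RELATIONS` gives a few field names their own binary relations, string and integer labels go to typed relations `str_val` and `int_val`, and everything else stays in a dynamically typed remainder `other`. The paper's storage schemas are related to graph schemas and may partition edges differently. The query is a fixed path `Entry.Title` evaluated as a conjunctive query (a join over the split relations); it does not reproduce the paper's translation of regular path expressions or restructuring.

```python
from collections import defaultdict

# Hypothetical storage schema: edges labeled with one of these field names
# get their own binary relation, so the label itself need not be stored.
FIELD_RELATIONS = {"Entry": "entry", "Title": "title", "Year": "year"}

def split(edges):
    """Partition (src, label, dst) edges into the relations of the schema."""
    rels = defaultdict(set)
    for src, lab, dst in edges:
        if isinstance(lab, str) and lab in FIELD_RELATIONS:
            rels[FIELD_RELATIONS[lab]].add((src, dst))   # label implied by relation
        elif isinstance(lab, str):
            rels["str_val"].add((src, lab, dst))         # label statically a string
        elif isinstance(lab, int) and not isinstance(lab, bool):
            rels["int_val"].add((src, lab, dst))         # label statically an integer
        else:
            # Remainder stays dynamically typed: keep a (type name, value) tag.
            rels["other"].add((src, (type(lab).__name__, lab), dst))
    return rels

def titles(rels, root=0):
    """Conjunctive query Q(s) :- entry(root, e), title(e, t), str_val(t, s, _)."""
    return {
        s
        for (r, e) in rels["entry"] if r == root
        for (e2, t) in rels["title"] if e2 == e
        for (t2, s, _dst) in rels["str_val"] if t2 == t
    }

edges = [
    (0, "Entry", 1), (1, "Title", 2), (2, "Semi-structured data", 3),
    (1, "Year", 4), (4, 1997, 5),
    (0, "Entry", 6), (6, "Title", 7), (7, "Querying the web", 8),
    (6, "Rating", 9), (9, 4.5, 10),
]

rels = split(edges)
print(sorted(titles(rels)))  # ['Querying the web', 'Semi-structured data']
```

Note that the join never inspects a runtime type tag: the relation names already fix which labels are field names and which are strings. Under the same assumptions, `split` could instead be keyed by data source rather than by label type, which is the kind of source-based decomposition the abstract describes for the virtual, integrated setting.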

References

  1. Serge Abiteboul. Querying semi-structured data. In ICDT, 1997.

  2. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.

  3. Serge Abiteboul and Victor Vianu. Queries and computation on the web. In ICDT, pages 262–275, Delphi, Greece, 1997. Springer-Verlag.

  4. Peter Buneman, Susan Davidson, Mary Fernandez, and Dan Suciu. Adding structure to unstructured data. In ICDT, pages 336–350, Delphi, Greece, 1997. Springer-Verlag.

  5. Peter Buneman, Susan Davidson, Gerd Hillebrand, and Dan Suciu. A query language and optimization techniques for unstructured data. In SIGMOD, 1996.

  6. Peter Buneman, Susan Davidson, Gerd Hillebrand, and Dan Suciu. A query language and optimization techniques for unstructured data. Technical Report 96-09, University of Pennsylvania, Computer and Information Science Department, February 1996.

  7. P. Buneman, L. Libkin, D. Suciu, V. Tannen, and L. Wong. Comprehension syntax. SIGMOD Record, 23(1):87–96, March 1994.

  8. M. Fernandez, D. Florescu, J. Kang, A. Levy, and D. Suciu. STRUDEL: a web-site management system. In SIGMOD, Tucson, Arizona, May 1997.

  9. Y. Papakonstantinou, S. Abiteboul, and H. Garcia-Molina. Object fusion in mediator systems. In Proceedings of VLDB, September 1996.

Editor information

Sophie Cluet, Rick Hull

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fernandez, M., Popa, L., Suciu, D. (1998). A structure-based approach to querying semi-structured data. In: Cluet, S., Hull, R. (eds) Database Programming Languages. DBPL 1997. Lecture Notes in Computer Science, vol 1369. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-64823-2_9

  • DOI: https://doi.org/10.1007/3-540-64823-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64823-9

  • Online ISBN: 978-3-540-68534-0

  • eBook Packages: Springer Book Archive