Abstract
We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edge-labeled graphs. We develop notions of conformance between a graph database and a graph schema and show that there is a natural and efficiently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query applications. Finally, we discuss how they may be used in query decomposition and optimization.
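To make the idea of conformance between a graph database and a graph schema concrete, the following is a minimal sketch, not the paper's definition. It assumes that conformance can be read as a simulation from the data graph to the schema graph, and that schema edges carry plain labels matched by equality. The names `Graph`, `simulation`, `conforms`, and the example graphs are all hypothetical, introduced only for illustration.

```python
from typing import Dict, List, Set, Tuple

# An edge-labeled graph: node -> list of (edge label, target node).
# Every node, including leaves, must appear as a key.
Graph = Dict[str, List[Tuple[str, str]]]


def simulation(data: Graph, schema: Graph) -> Set[Tuple[str, str]]:
    """Return the largest relation R such that (d, s) in R implies every
    edge d --label--> d2 is matched by some s --label--> s2 with (d2, s2) in R.
    Naive fixpoint refinement; assumed semantics, not taken from the paper."""
    rel = {(d, s) for d in data for s in schema}
    changed = True
    while changed:
        changed = False
        for d, s in list(rel):
            for label, d2 in data[d]:
                matched = any(
                    l == label and (d2, s2) in rel for l, s2 in schema[s]
                )
                if not matched:
                    # s cannot mimic this edge of d, so drop the pair.
                    rel.discard((d, s))
                    changed = True
                    break
    return rel


def conforms(data: Graph, data_root: str, schema: Graph, schema_root: str) -> bool:
    """The data conforms if its root is simulated by the schema root."""
    return (data_root, schema_root) in simulation(data, schema)


# Hypothetical example graphs.
schema: Graph = {
    "S": [("paper", "P")],
    "P": [("title", "L"), ("author", "L")],
    "L": [],
}
good: Graph = {
    "root": [("paper", "p1")],
    "p1": [("title", "t1"), ("author", "a1")],
    "t1": [],
    "a1": [],
}
bad: Graph = {
    "root": [("paper", "p1")],
    "p1": [("year", "y1")],  # the schema allows no "year" edge under a paper
    "y1": [],
}

print(conforms(good, "root", schema, "S"))
print(conforms(bad, "root", schema, "S"))
```

Under the same assumption, running the check with a schema in the role of the data graph gives one candidate for an ordering on schemas. The naive fixpoint here is chosen for readability; faster simulation algorithms exist.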
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Buneman, P., Davidson, S., Fernandez, M., Suciu, D. (1996). Adding structure to unstructured data. In: Afrati, F., Kolaitis, P. (eds) Database Theory — ICDT '97. ICDT 1997. Lecture Notes in Computer Science, vol 1186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62222-5_55
DOI: https://doi.org/10.1007/3-540-62222-5_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62222-2
Online ISBN: 978-3-540-49682-3
eBook Packages: Springer Book Archive