Designing Good Semi-structured Databases
Semi-structured data has become prevalent with the growth of the Internet and other online information repositories. Many organizational databases are presented on the web as semi-structured data. Designing a “good” semi-structured database is increasingly crucial to prevent data redundancy, inconsistency, and update anomalies. In this paper, we define a semi-structured schema graph and identify the anomalies that may occur in it. We propose a normal form for semi-structured schema graphs, S3-NF, and present two approaches to designing S3-NF databases: restructuring by decomposition and the ER approach. The first approach consists of a set of rules for decomposing a semi-structured schema graph into S3-NF. The second approach uses the ER model to remove anomalies at the semantic level.