Designing Good Semi-structured Databases

  • Sin Yeung Lee
  • Mong Li Lee
  • Tok Wang Ling
  • Leonid A. Kalinichenko
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1728)


Semi-structured data has become prevalent with the growth of the Internet and other on-line information repositories. Many organizational databases are presented on the web as semi-structured data. Designing a “good” semi-structured database is increasingly crucial to prevent data redundancy, inconsistency and update anomalies. In this paper, we define a semi-structured schema graph and identify the various anomalies that may occur in the graph. We propose S3-NF, a normal form for semi-structured schema graphs. We present two approaches to designing an S3-NF database: restructuring by decomposition and the ER approach. The first approach consists of a set of rules for decomposing a semi-structured schema graph into S3-NF. The second approach uses the ER model to remove anomalies at the semantic level.





Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Sin Yeung Lee (1)
  • Mong Li Lee (1)
  • Tok Wang Ling (1)
  • Leonid A. Kalinichenko (2)

  1. School of Computing, National University of Singapore, Singapore
  2. Russian Academy of Sciences, Institute for Problems of Informatics, Russia
