Normalization Theory for XML

  • Conference paper
Database and XML Technologies (XSym 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4704)

Abstract

Specifications of XML documents typically consist of typing information (e.g., a DTD), and integrity constraints. Just like relational schema specifications, not all are good – some are prone to redundancies and update anomalies. In the relational world we have a well-developed theory of data design (also known as normalization). A few definitions of XML normal forms have been proposed, but the main question is why a particular design is good. In the XML world, we still lack universally accepted query languages such as relational algebra, or update languages that let us reason about storage redundancies, lossless decompositions, and update anomalies. A better approach, therefore, is to come up with notions of good design based on the intrinsic properties of the model itself. We present such an approach, based on Shannon’s information theory, and show how it applies to relational normal forms as well as to XML design, for both native and relational storage.
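To make the information-theoretic idea concrete, here is a minimal sketch in Python. It is not the measure defined in the paper. It is a deliberately simplified illustration that assumes a small relation, a finite domain per column, and a uniform distribution over the values that could fill one position once every other position is known. The relation, the domains, and the helper names (`satisfies`, `consistent_values`, `position_entropy`) are all hypothetical. A position whose value is forced by a functional dependency carries 0 bits, which signals redundancy; an unconstrained position carries the maximum log2(|domain|) bits.

```python
import math

def satisfies(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True

def consistent_values(rows, fds, pos, domain):
    """Domain values that can fill position pos = (row index, column)
    without violating any dependency, all other positions being fixed."""
    r, col = pos
    ok = []
    for v in domain:
        candidate = [dict(row) for row in rows]
        candidate[r][col] = v
        if all(satisfies(candidate, lhs, rhs) for lhs, rhs in fds):
            ok.append(v)
    return ok

def position_entropy(rows, fds, pos, domain):
    """Entropy in bits of a uniform choice among the consistent values
    (a simplification assumed for this sketch)."""
    n = len(consistent_values(rows, fds, pos, domain))
    if n == 0:
        raise ValueError("no domain value is consistent with the dependencies")
    return math.log2(n)

# Hypothetical instance: course -> teacher holds, but course is not a key,
# so the teacher of a course is repeated once per enrolled student.
rows = [
    {"course": "db", "teacher": "lee", "student": "ann"},
    {"course": "db", "teacher": "lee", "student": "bob"},
]
fds = [(("course",), ("teacher",))]

teacher_bits = position_entropy(rows, fds, (0, "teacher"),
                                ["lee", "kim", "roy", "sam"])
student_bits = position_entropy(rows, fds, (0, "student"),
                                ["ann", "bob", "cat", "dan"])
print(teacher_bits, student_bits)
```

In this toy instance the teacher value in the first row can be inferred from the second row, so it carries no information, while the student value is unconstrained. Under the assumptions of this sketch, a design in which every position keeps its maximum information content would count as free of this kind of redundancy.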

Editor information

Denilson Barbosa, Angela Bonifati, Zohra Bellahsène, Ela Hunt, Rainer Unland

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Libkin, L. (2007). Normalization Theory for XML. In: Barbosa, D., Bonifati, A., Bellahsène, Z., Hunt, E., Unland, R. (eds) Database and XML Technologies. XSym 2007. Lecture Notes in Computer Science, vol 4704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75288-2_1

  • DOI: https://doi.org/10.1007/978-3-540-75288-2_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75287-5

  • Online ISBN: 978-3-540-75288-2

  • eBook Packages: Computer Science, Computer Science (R0)
