Abstract
Specifications of XML documents typically consist of typing information (e.g., a DTD) and integrity constraints. Just like relational schema specifications, not all of them are good: some are prone to redundancies and update anomalies. In the relational world we have a well-developed theory of data design (also known as normalization). A few definitions of XML normal forms have been proposed, but the central question remains why a particular design is good. In the XML world, we still lack universally accepted query languages such as relational algebra, or update languages that would let us reason about storage redundancies, lossless decompositions, and update anomalies. A better approach, therefore, is to develop notions of good design based on the intrinsic properties of the model itself. We present such an approach, based on Shannon's information theory, and show how it applies to relational normal forms as well as to XML design, for both native and relational storage.
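To make "prone to redundancies" concrete in the relational setting, here is a minimal sketch of the kind of redundancy that BCNF is designed to rule out. It assumes a relation stored as a list of Python dicts and functional dependencies given as (lhs, rhs) pairs of attribute names; the helper `redundant_positions` and the `Emp`/`Dept` example are hypothetical. It uses the simple classical notion of a value that can be recovered from another tuple, not the entropy-based measure this chapter develops.

```python
from itertools import combinations

# Hypothetical helper: a relation is a list of dicts, and each FD is a
# (lhs, rhs) pair of attribute tuples. The instance is assumed to satisfy the FDs.
def redundant_positions(rows, fds):
    """Return (row index, attribute) pairs whose value can be recovered
    from some other row by applying a functional dependency."""
    redundant = set()
    for lhs, rhs in fds:
        for i, j in combinations(range(len(rows)), 2):
            # Two distinct rows that agree on the left-hand side must agree
            # on the right-hand side, so each copy determines the other.
            if all(rows[i][a] == rows[j][a] for a in lhs):
                for b in rhs:
                    if b not in lhs:  # skip trivial parts of the FD
                        redundant.add((i, b))
                        redundant.add((j, b))
    return redundant


# Example: Emp(name, dept, manager) with dept -> manager; dept is not a key.
emp = [
    {"name": "Ann", "dept": "R&D", "manager": "Kim"},
    {"name": "Bob", "dept": "R&D", "manager": "Kim"},
    {"name": "Cid", "dept": "Ops", "manager": "Lee"},
]
fds = [(("dept",), ("manager",))]
print(sorted(redundant_positions(emp, fds)))  # [(0, 'manager'), (1, 'manager')]

# After a BCNF-style decomposition, dept is a key of Dept(dept, manager),
# so no two rows agree on dept and no position is redundant.
dept = [{"dept": d, "manager": m}
        for d, m in sorted({(r["dept"], r["manager"]) for r in emp})]
print(redundant_positions(dept, fds))  # set()
```

Because `dept` is not a key of `Emp`, the manager "Kim" is stored twice and either copy can be reconstructed from the other; projecting onto a relation where `dept` is a key removes that repetition.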
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Libkin, L. (2007). Normalization Theory for XML. In: Barbosa, D., Bonifati, A., Bellahsène, Z., Hunt, E., Unland, R. (eds) Database and XML Technologies. XSym 2007. Lecture Notes in Computer Science, vol 4704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75288-2_1
DOI: https://doi.org/10.1007/978-3-540-75288-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75287-5
Online ISBN: 978-3-540-75288-2
eBook Packages: Computer Science (R0)