On the Difficulty of Finding Optimal Relational Decompositions for XML Workloads: A Complexity Theoretic Perspective
A key problem that arises in the context of storing XML documents in relational databases is that of finding an optimal relational decomposition for a given set of XML documents and a given set of XML queries over those documents. While there have been a number of ad hoc solutions proposed for this problem, to our knowledge this paper represents a first step toward formalizing the problem and studying its complexity. It turns out that to even define what one means by an optimal decomposition, one first needs to specify an algorithm to translate XML queries to relational queries, and a cost model to evaluate the quality of the resulting relational queries. By examining an interesting problem embedded in choosing a relational decomposition, we show that choices of different translation algorithms and cost models result in very different complexities for the resulting optimization problems. Our results suggest that, contrary to the trend in previous work, the eventual development of practical algorithms for finding relational decompositions for XML workloads will require judicious choices of cost models and translation algorithms, rather than an exclusive focus on the decomposition problem in isolation.
Keywords: Cost Model, Vertex Cover, Grouping Problem, Relational Query, Cost Metrics