Skip to main content

Advertisement

Log in

Reasoning about integrity constraints for tree-structured data

  • Published:
Theory of Computing Systems Aims and scope Submit manuscript

Abstract

We study a class of integrity constraints for tree-structured data modelled as data trees, whose nodes have a label from a finite alphabet and store a data value from an infinite data domain. The constraints require each tuple of nodes selected by a conjunctive query (using navigational axes and labels) to satisfy a positive combination of equalities and a positive combination of inequalities over the stored data values. Such constraints are instances of the general framework of XML-to-relational constraints proposed recently by Niewerth and Schwentick. They cover some common classes of constraints, including W3C XML Schema key and unique constraints, as well as domain restrictions and denial constraints, but cannot express inclusion constraints, such as reference keys. Our main result is that consistency of such integrity constraints with respect to a given schema (modelled as a tree automaton) is decidable. An easy extension gives decidability for the entailment problem. Equivalently, we show that validity and containment of unions of conjunctive queries using navigational axes, labels, data equalities and inequalities is decidable, as long as none of the conjunctive queries uses both equalities and inequalities; without this restriction, both problems are known to be undecidable. In the context of XML data exchange, our result can be used to establish decidability for a consistency problem for XML schema mappings. All the decision procedures are doubly exponential, with matching lower bounds. The complexity may be lowered to singly exponential, when conjunctive queries are replaced by tree patterns, and the number of data comparisons is bounded.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

Notes

  1. Provided by Michał Pilipczuk, during the Warsaw Automata Group’s research camp Autob ó z 2015.

References

  1. Arenas, M., Barceló, P., Libkin, L., Murlak, F.: Foundations of data exchange. Cambridge University Press (2014)

  2. Arenas, M., Fan, W., Libkin, L.: On the complexity of verifying consistency of XML specifications. SIAM J. Comput. 38(3), 841–880 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  3. Arenas, M., Libkin, L.: A normal form for XML documents. ACM Trans. Database Syst. 29, 195–232 (2004)

    Article  Google Scholar 

  4. Arenas, M., Libkin, L.: XML data exchange: Consistency and query answering. J. ACM, 55(2) (2008)

  5. Benedikt, M., Fan, W., Geerts, F.: XPath satisfiability in the presence of DTDs. J. ACM, 55(2) (2008)

  6. Björklund, H., Martens, W., Schwentick, T.: Conjunctive query containment over trees using schema information. Acta Informatica, 1–40 (2016)

  7. Bojańczyk, M., Murlak, F., Witkowski, A.: Containment of monadic datalog programs via bounded clique-width. In: Proceedings of the ICALP 2015, pp. 427–439 (2015)

  8. Bojańczyk, M., Muscholl, A., Schwentick, T., Segoufin, L.: Two-variable logic on data trees and XML reasoning. J. ACM 56(3), 13:1–13:48 (2009)

    MathSciNet  MATH  Google Scholar 

  9. Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational data bases. In: Proceedings of the STOC 1977, pp. 77–90 (1977)

  10. Courcelle, B., Makowsky, J.A., Rotics, U.: Linear time solvable optimization problems on graphs of bounded clique-width. Theory Comput. Syst. 33(2), 125–150 (2000)

    Article  MathSciNet  MATH  Google Scholar 

  11. Courcelle, B., Olariu, S.: Upper bounds to the clique width of graphs. Discret. Appl. Math. 101(1-3), 77–114 (2000)

    Article  MathSciNet  MATH  Google Scholar 

  12. David, C., Gheerbrant, A., Libkin, L., Martens, W.: Containment of pattern-based queries over data trees. In: Proceedings of the ICDT 2013, pp. 201–212 (2013)

  13. David, C., Hofman, P., Murlak, F., Pilipczuk, M.: Synthesizing transformations from XML schema mappings. In: Proceedings of the ICDT 2014, pp. 61–71 (2014)

  14. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  15. Fagin, R., Kolaitis, P.G., Popa, L., Tan, W.C.: Composing schema mappings: Second-order dependencies to the rescue. ACM Trans. Database Syst. 30(4), 994–1055 (2005)

    Article  Google Scholar 

  16. Fagin, R., Vardi, M.Y.: The theory of data dependencies - a survey. In: Mathematics of information processing, volume 34 of proceedings of symposia in applied mathematics, pp. 19–71. American Mathematical Society, Providence, Rhode Island (1986)

  17. Figueira, D.: Alternating register automata on finite words and trees. Logical Methods Comput. Sci. 8(1), 2012

  18. Gao, S., Sperberg-McQueen, C.M., Thompson, H.S., Mendelsohn, N., Beech, D., Maloney, M.: W3C XML Schema Definition Language (XSD) 1.1, Part 1: Structures. Technical report, World Wide Web Consortium, April (2009)

  19. Gogacz, T., Marcinkowski, J.: All-instances termination of chase is undecidable. In: Proceedings of the ICALP 2014, pp. 293–304 (2014)

  20. Gogacz, T., Marcinkowski, J.: Red spider meets a rainworm: query finite determinacy is undecidable. In: Proceedings of the PODS 2016, pp. 121–134 (2016)

  21. Hartmann, S., Link, S.: More functional dependencies for XML. In: Proceedings of the ADBIS 2003, pp. 355–369 (2003)

  22. Hartmann, S., Link, S., Trinh, T.: Solving the implication problem for XML, functional dependencies with properties. In: Proceedings of the WoLLIC 2010, pp. 161–175 (2010)

  23. Jurdzinski, M., Lazic, R.: Alternating automata on data trees and XPath satisfiability. ACM Trans. Comput. Logic 12(3), 19:1–19:21 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  24. Lenzerini, M.: Data integration: A theoretical perspective. In: Proceedings of the PODS 2002, pp. 233–246 (2002)

  25. Miklau, G., Suciu, D: Containment and equivalence for a fragment of XPath. J. ACM, 51(1), 2–45 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  26. Neven, F., Schwentick, T.: On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Log. Meth. Comput. Sci., 2(3) (2006)

  27. Niewerth, M., Schwentick, T.: Reasoning XML constraints based on XML-to-relational mappings. In: Proceedings of the ICDT 2014, pp. 72–83 (2014)

  28. Vardi, M.Y.: Fundamentals of dependency theory. In: Borger, E. (ed.) Trends in theoretical computer science, pp. 171–224. Computer Science Press (1987)

Download references

Acknowledgements

We thank the anonymous referees of ICDT 2016 and TOCS for their insightful questions.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Filip Murlak.

Additional information

This article is part of the Topical Collection on Special Issue on Database Theory

The First, third, and fourth author of this paper were supported by Poland’s National Science Centre grant 2013/11/D/ST6/03075.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Czerwiński, W., David, C., Murlak, F. et al. Reasoning about integrity constraints for tree-structured data. Theory Comput Syst 62, 941–976 (2018). https://doi.org/10.1007/s00224-017-9771-z

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00224-017-9771-z

Keywords

Navigation