Abstract
Despite a large body of work on XPath query processing in relational environments, the systematic study of queries containing not-predicates has received little attention in the literature. In particular, the XML support of several industrial-strength commercial RDBMSs fails to evaluate such queries efficiently. In this paper, we present an efficient and novel strategy to evaluate not-twig queries in a tree-unaware relational environment. A not-twig query is an XPath query with ancestor–descendant and parent–child axes that contains one or more not-predicates. We propose a novel Dewey-based encoding scheme called Andes (ANcestor Dewey-based Encoding Scheme), which enables us to efficiently filter out elements satisfying a not-predicate by comparing their ancestor group identifiers. In this approach, a set of elements under the same common ancestor at a specific level in the XML tree is assigned the same ancestor group identifier. Based on this scheme, we propose a novel SQL translation algorithm for not-twig query evaluation. Our experiments confirm that the proposed approach, built on top of an off-the-shelf commercial RDBMS, significantly outperforms state-of-the-art relational and native approaches. We also explore the query plans selected by a commercial relational optimizer to evaluate our translated queries under different input cardinalities. This exploration further validates the performance benefits of Andes.
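The idea of filtering by ancestor group identifiers can be illustrated with a minimal in-memory sketch. It is not the paper's encoding or its SQL translation: it assumes plain Dewey labels stored as integer tuples, takes the ancestor group identifier at a level to be the label prefix of that length, and evaluates a query shaped like `//a[not(.//b)]`. The names `ancestor_group_id`, `not_descendant_filter`, `a_nodes`, and `b_nodes` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a Dewey label is a tuple of child positions from the root,
# e.g. (1, 3, 1) is the first child of the third child of the root.
Dewey = Tuple[int, ...]


def ancestor_group_id(dewey: Dewey, level: int) -> Dewey:
    """Label prefix identifying the ancestor at `level` (root is level 1)."""
    return dewey[:level]


def not_descendant_filter(candidates: List[Dewey], excluded: List[Dewey]) -> List[Dewey]:
    """Keep each candidate that has no proper descendant in `excluded`.

    Mimics //a[not(.//b)]: every b node is mapped to its ancestor group
    identifier at the level of the a nodes, and an a node is dropped when
    its own label appears among those identifiers.
    """
    levels: Set[int] = {len(c) for c in candidates}
    # For each candidate level, collect the group ids of deeper excluded nodes.
    groups: Dict[int, Set[Dewey]] = {
        lvl: {ancestor_group_id(b, lvl) for b in excluded if len(b) > lvl}
        for lvl in levels
    }
    return [c for c in candidates if c not in groups[len(c)]]


# Toy tree: three a elements under the root; the first and third contain a b.
a_nodes: List[Dewey] = [(1, 1), (1, 2), (1, 3)]
b_nodes: List[Dewey] = [(1, 1, 1), (1, 3, 1, 1)]
result = not_descendant_filter(a_nodes, b_nodes)
```

In this toy tree only the `a` element labeled `(1, 2)` survives, because the two `b` elements fall into the groups `(1, 1)` and `(1, 3)`. In a relational setting the same comparison would presumably be expressed as an anti-join on a stored group-identifier column, which is the kind of set-oriented operation a tree-unaware optimizer already handles.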
Cite this article
Soh, K.H., Truong, B.Q. & Bhowmick, S.S. ANDES: efficient evaluation of NOT-twig queries in relational databases. The VLDB Journal 21, 889–914 (2012). https://doi.org/10.1007/s00778-012-0275-9
DOI: https://doi.org/10.1007/s00778-012-0275-9