Clustered Chain Path Index for XML Document: Efficiently Processing Branch Queries

Wang, Hongqiang; Li, Jianzhong; Wang, Hongzhi

doi:10.1007/11912873_49

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4255)

Abstract

Branch query processing is a core operation of XML query processing. In recent years, a number of stack-based twig join algorithms have been proposed to process twig queries using the tag stream index. However, the tag stream index labels each element separately and ignores the similarity of identically structured elements; moreover, algorithms based on the tag stream index perform poorly on large documents. In this paper, we propose a novel index, the Clustered Chain Path Index (CCPI), based on a new labeling scheme, Clustered Chain Path labeling. The index has properties that support efficient processing of branch queries, and for tree-structured XML documents it has the same cardinality as the 1-index. Based on CCPI, we design two efficient algorithms: KMP-Match-Path for queries without branches and Related-Path-Segment-Join for queries with branches. Experimental results show that the proposed CCPI-based query processing algorithms outperform other algorithms and scale well.
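The abstract names KMP-Match-Path but does not describe it. As a hypothetical illustration only (not the authors' algorithm and not the CCPI labeling itself), the Python sketch below shows two ideas the abstract relies on. First, on tree-shaped data the 1-index partition groups elements by their root-to-node tag path, so counting those groups gives its cardinality. Second, Knuth-Morris-Pratt (KMP) string matching over tag sequences can evaluate a branch-free query of the form `//s1/s2/.../sk`. The names `Node`, `build_index`, `kmp_match_ends` and `match_child_path`, the preorder numbering, and the query form are all assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

LabelPath = Tuple[str, ...]


class Node:
    """Hypothetical minimal tree model: a tag name plus ordered children."""

    def __init__(self, tag: str, children: Optional[List["Node"]] = None):
        self.tag = tag
        self.children = children or []


def build_index(root: Node) -> Tuple[Dict[LabelPath, List[int]], Set[LabelPath]]:
    """Number nodes in preorder and cluster those sharing a root-to-node tag path.

    On a tree these clusters coincide with the 1-index partition.
    Also returns the set of distinct root-to-leaf tag paths."""
    classes: Dict[LabelPath, List[int]] = defaultdict(list)
    leaf_paths: Set[LabelPath] = set()
    next_id = 0
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        classes[path].append(next_id)
        next_id += 1
        if not node.children:
            leaf_paths.add(path)
        # Push children in reverse so the first child is visited next (preorder).
        for child in reversed(node.children):
            stack.append((child, path + (child.tag,)))
    return dict(classes), leaf_paths


def failure_table(pattern: List[str]) -> List[int]:
    """Standard KMP prefix function: longest proper border of pattern[:i+1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_match_ends(text: LabelPath, pattern: List[str]) -> List[int]:
    """Return every index i such that `pattern` occurs in `text` ending at i."""
    if not pattern:
        raise ValueError("pattern must contain at least one tag")
    fail = failure_table(pattern)
    ends: List[int] = []
    k = 0
    for i, tag in enumerate(text):
        while k > 0 and tag != pattern[k]:
            k = fail[k - 1]
        if tag == pattern[k]:
            k += 1
        if k == len(pattern):
            ends.append(i)
            k = fail[k - 1]  # continue so overlapping matches are found
    return ends


def match_child_path(classes: Dict[LabelPath, List[int]],
                     leaf_paths: Set[LabelPath],
                     steps: List[str]) -> List[int]:
    """Evaluate //s1/.../sk; return preorder ids of nodes matched by the last step."""
    hit: Set[LabelPath] = set()
    for path in leaf_paths:
        for end in kmp_match_ends(path, steps):
            # A match ending at position `end` names the cluster path[:end + 1].
            hit.add(path[:end + 1])
    return sorted(i for p in hit for i in classes[p])


if __name__ == "__main__":
    doc = Node("site", [
        Node("people", [Node("person", [Node("name")]),
                        Node("person", [Node("name"), Node("email")])]),
        Node("regions", [Node("item", [Node("name")])]),
    ])
    classes, leaves = build_index(doc)
    print(len(classes))                                            # 8 clusters for 10 nodes
    print(match_child_path(classes, leaves, ["person", "name"]))   # [3, 5]
```

Scanning only root-to-leaf tag paths is enough here because every node's path is a prefix of some leaf's path, and each KMP match identifies one whole cluster rather than individual elements. Queries with branches would additionally need to join the results of several such path segments, which the paper handles with Related-Path-Segment-Join; that step is not sketched.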

This work is partially supported by the Natural Science Foundation of Heilongjiang Province (Grant No. zjg03-05), the National Natural Science Foundation of China (Grant No. 60473075), and the Key Program of the National Natural Science Foundation of China (Grant No. 60533110).

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001
Hongqiang Wang, Jianzhong Li & Hongzhi Wang

Editor information

Editors and Affiliations

School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland
Karl Aberer
State Key Lab of Software Engineering, Wuhan University, 430072, Wuhan, China
Zhiyong Peng
Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA
Elke A. Rundensteiner
Victoria University, Australia
Yanchun Zhang
Wuhan University, China
Xuhui Li

About this paper

Cite this paper

Wang, H., Li, J., Wang, H. (2006). Clustered Chain Path Index for XML Document: Efficiently Processing Branch Queries. In: Aberer, K., Peng, Z., Rundensteiner, E.A., Zhang, Y., Li, X. (eds) Web Information Systems – WISE 2006. WISE 2006. Lecture Notes in Computer Science, vol 4255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11912873_49

DOI: https://doi.org/10.1007/11912873_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-48105-8
Online ISBN: 978-3-540-48107-2
eBook Packages: Computer Science, Computer Science (R0)
