VERT: A Semantic Approach for Content Search and Content Extraction in XML Query Processing

  • Conference paper
Conceptual Modeling - ER 2007 (ER 2007)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4801))

Abstract

Processing a twig pattern query over an XML document involves both structural search and content search. Most existing algorithms focus only on structural search: they treat content nodes the same as element nodes and process them with structural joins. Because content values are highly varied, mixing content search with structural search makes contents hard to manage and degrades performance. A further disadvantage is that, to find the actual values requested by a query, these algorithms must go back to the original document. In this paper, we propose a novel algorithm, Value Extraction with Relational Table (VERT), to overcome these limitations. The main technique of VERT is to introduce relational tables that store document contents, instead of treating contents as nodes and labeling them. The tables in our algorithm are created based on semantic information about the documents. As more semantics are captured, we can further optimize the tables and queries to significantly enhance efficiency. Finally, we show by experiments that, besides solving these content problems, VERT also outperforms existing algorithms in twig pattern query processing.
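To make the central idea concrete, the following is a minimal sketch and not the VERT algorithm itself, whose details the abstract does not give. It assumes a simple region labeling of element nodes and a single `content` table keyed by the label of the enclosing element; the sample document, the table layout, and the helper names `label_and_store` and `titles_with_price` are all hypothetical. Text values are never labeled as nodes: a value predicate is answered by a table lookup, the structural part by containment of region labels, and the requested values are read back from the table rather than from the original document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """
<bib>
  <book><title>XML Basics</title><price>30</price></book>
  <book><title>Twig Joins</title><price>45</price></book>
</bib>
"""


def label_and_store(xml_text):
    """Label element nodes with (start, end) regions; keep text values in a table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (tag TEXT, start INTEGER, end_ INTEGER)")
    # Assumed layout: one row per leaf value, keyed by the enclosing element's label.
    conn.execute("CREATE TABLE content (tag TEXT, start INTEGER, value TEXT)")
    counter = 0

    def visit(elem):
        nonlocal counter
        counter += 1
        start = counter
        children = list(elem)
        for child in children:
            visit(child)
        counter += 1
        conn.execute("INSERT INTO node VALUES (?, ?, ?)", (elem.tag, start, counter))
        text = (elem.text or "").strip()
        if not children and text:
            # The value itself gets no label; it is stored relationally.
            conn.execute("INSERT INTO content VALUES (?, ?, ?)", (elem.tag, start, text))

    visit(ET.fromstring(xml_text))
    return conn


def titles_with_price(conn, price):
    """Roughly //book[.//price = $price]//title, using region containment."""
    rows = conn.execute(
        """
        SELECT tv.value
        FROM content AS pv
        JOIN node AS b  ON b.tag = 'book'
                       AND b.start < pv.start AND pv.start < b.end_
        JOIN node AS t  ON t.tag = 'title'
                       AND b.start < t.start AND t.end_ < b.end_
        JOIN content AS tv ON tv.tag = 'title' AND tv.start = t.start
        WHERE pv.tag = 'price' AND pv.value = ?
        ORDER BY t.start
        """,
        (price,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(titles_with_price(label_and_store(DOC), "45"))
```

In this sketch the content search (`pv.value = ?`) and the content extraction (`tv.value`) both go through the table, while only element nodes take part in the structural containment checks. A real implementation would use a dedicated twig join algorithm instead of SQL joins over labels, and the containment test here expresses ancestor-descendant rather than strict parent-child relationships.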



Editor information

Christine Parent, Klaus-Dieter Schewe, Veda C. Storey, Bernhard Thalheim

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, H., Ling, T.W., Chen, B. (2007). VERT: A Semantic Approach for Content Search and Content Extraction in XML Query Processing. In: Parent, C., Schewe, KD., Storey, V.C., Thalheim, B. (eds) Conceptual Modeling - ER 2007. ER 2007. Lecture Notes in Computer Science, vol 4801. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75563-0_36

  • DOI: https://doi.org/10.1007/978-3-540-75563-0_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75562-3

  • Online ISBN: 978-3-540-75563-0

  • eBook Packages: Computer Science, Computer Science (R0)
