HOPI: An Efficient Connection Index for Complex XML Document Collections

  • Conference paper
Advances in Database Technology - EDBT 2004 (EDBT 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2992)

Abstract

In this paper we present HOPI, a new connection index for XML documents based on the concept of the 2-hop cover of a directed graph introduced by Cohen et al. In contrast to most of the prior work on XML indexing, we consider not only paths with child or parent relationships between the nodes, but also provide space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in our XXL search engine. We improve the theoretical concept of a 2-hop cover by developing scalable methods for index creation on very large XML data collections with long paths and extensive cross-linkage. Our experiments show that the HOPI index substantially improves query performance over previously proposed index structures while keeping space requirements low.
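The reachability test on a 2-hop cover can be illustrated with a small Python sketch. In a 2-hop labeling, every node u carries a set L_out(u) of nodes it can reach and a set L_in(u) of nodes that can reach it, and u reaches v exactly when L_out(u) and L_in(v) share a node. The construction below is a deliberately naive one chosen for brevity, not the scalable index creation method of the paper; the function names and the toy graph (two documents joined by one link edge) are assumptions made for illustration only.

```python
from collections import defaultdict


def reachable_from(graph, start):
    """Return every node reachable from start (including start) via DFS."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def build_two_hop_labels(graph):
    """Naive 2-hop labeling: visit each node as a center and add it to the
    labels of every still-uncovered (ancestor, descendant) pair through it."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    reverse = defaultdict(list)
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)

    # Each node starts as its own hop, so (u, u) is covered from the start.
    l_out = {n: {n} for n in nodes}
    l_in = {n: {n} for n in nodes}

    for center in sorted(nodes):
        ancestors = reachable_from(reverse, center)
        descendants = reachable_from(graph, center)
        for a in ancestors:
            for d in descendants:
                if l_out[a].isdisjoint(l_in[d]):  # pair not yet covered
                    l_out[a].add(center)
                    l_in[d].add(center)
    return l_in, l_out


def is_reachable(l_in, l_out, u, v):
    """u reaches v iff the two label sets share at least one hop node."""
    return not l_out[u].isdisjoint(l_in[v])


# Hypothetical toy collection: tree edges inside two documents plus one
# link edge ("cite" -> "doc2") that crosses the document boundary.
EXAMPLE_GRAPH = {
    "doc1": ["sec1", "sec2"],
    "sec1": ["cite"],
    "cite": ["doc2"],
    "doc2": ["title"],
}

L_IN, L_OUT = build_two_hop_labels(EXAMPLE_GRAPH)
print(is_reachable(L_IN, L_OUT, "doc1", "title"))  # True: path crosses the link
print(is_reachable(L_IN, L_OUT, "sec2", "doc2"))   # False: no path exists
```

The point of the labeling is that a query only intersects two small sets instead of traversing the graph or storing the full transitive closure. This naive construction makes no attempt to keep the labels small, which is the part that needs a careful algorithm on large, heavily linked collections.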

References

  1. Abiteboul, S., et al.: Compact labeling schemes for ancestor queries. In: SODA 2001, pp. 547–556 (2001)

  2. Alstrup, S., Rauhe, T.: Improved labeling scheme for ancestor queries. In: SODA 2002, pp. 947–953 (2002)

  3. Bancilhon, F., Ramakrishnan, R.: An amateur’s introduction to recursive query processing strategies. In: SIGMOD 1986, pp. 16–52 (1986)

  4. Blanken, H., Grabs, T., Schek, H.-J., Schenkel, R., Weikum, G. (eds.): Intelligent Search on XML Data. LNCS, vol. 2818. Springer, Heidelberg (2003)

  5. Böhme, T., Rahm, E.: Multi-user evaluation of XML data management systems with XMach-1. In: EEXTT 2002, pp. 148–158 (2003)

  6. Chung, C.-W., Min, J.-K., Shim, K.: APEX: An adaptive path index for XML data. In: SIGMOD 2002, pp. 121–132 (2002)

  7. Ciarlet Jr, P., Lamour, F.: On the validity of a front oriented approach to partitioning large sparse graphs with a connectivity constraint. Numerical Algorithms 12(1,2), 193–214 (1996)

  8. Cohen, E., et al.: Labeling dynamic XML trees. In: PODS 2002, pp. 271–281 (2002)

  9. Cohen, E., et al.: Reachability and distance queries via 2-hop labels. In: SODA 2002, pp. 937–946 (2002)

  10. Cooper, B., et al.: A fast index for semistructured data. In: VLDB 2001, pp. 341–350 (2001)

  11. Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms, 1st edn. MIT Press, Cambridge (1990)

  12. DeRose, S., et al.: XML linking language (XLink), version 1.0. W3C recommendation (2001)

  13. Farhat, C.: A simple and efficient automatic FEM domain decomposer. Computers and Structures 28(5), 579–602 (1988)

  14. Goldman, R., Widom, J.: DataGuides: Enabling query formulation and optimization in semistructured databases. In: VLDB 1997, pp. 436–445 (1997)

  15. Grust, T.: Accelerating XPath location steps. In: SIGMOD 2002, pp. 109–120 (2002)

  16. Grust, T., van Keulen, M.: Tree awareness for relational DBMS kernels: Staircase join. In: Blanken et al. [4]

  17. Kaplan, H., et al.: A comparison of labeling schemes for ancestor queries. In: SODA 2002, pp. 954–963 (2002)

  18. Kaplan, H., Milo, T.: Short and simple labels for small distances and other functions. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 246–257. Springer, Heidelberg (2001)

  19. Kaushik, R., et al.: Covering indexes for branching path queries. In: SIGMOD 2002, pp. 133–144 (2002)

  20. Ley, M.: DBLP XML Records. Downloaded September 1 (2003)

  21. Milo, T., Suciu, D.: Index structures for path expressions. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 277–295. Springer, Heidelberg (1998)

  22. Qun, C., et al.: D(k)-index: An adaptive structural summary for graph-structured data. In: SIGMOD 2003, pp. 134–144 (2003)

  23. Schenkel, R., Theobald, A., Weikum, G.: Ontology-enabled XML search. In: Blanken et al. [4]

  24. Theobald, A., Weikum, G.: The index-based XXL search engine for querying XML data with relevance ranking. In: Jensen, C.S., Jeffery, K., Pokorný, J., Šaltenis, S., Bertino, E., Böhm, K., Jarke, M. (eds.) EDBT 2002. LNCS, vol. 2287, pp. 477–495. Springer, Heidelberg (2002)

  25. Theobald, A., Weikum, G.: The XXL search engine: Ranked retrieval of XML data using indexes and ontologies. In: SIGMOD 2002 (2002)

  26. Zezula, P., Amato, G., Rabitti, F.: Processing XML queries with tree signatures. In: Blanken et al. [4]

  27. Zezula, P., et al.: Tree signatures for XML querying and navigation. In: 1st Int. XML Database Symposium, pp. 149–163 (2003)

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schenkel, R., Theobald, A., Weikum, G. (2004). HOPI: An Efficient Connection Index for Complex XML Document Collections. In: Bertino, E., et al. Advances in Database Technology - EDBT 2004. EDBT 2004. Lecture Notes in Computer Science, vol 2992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24741-8_15

  • DOI: https://doi.org/10.1007/978-3-540-24741-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21200-3

  • Online ISBN: 978-3-540-24741-8

  • eBook Packages: Springer Book Archive
