HOPI: An Efficient Connection Index for Complex XML Document Collections

Schenkel, Ralf; Theobald, Anja; Weikum, Gerhard

doi:10.1007/978-3-540-24741-8_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2992)

Included in the following conference series:

International Conference on Extending Database Technology

Abstract

In this paper we present HOPI, a new connection index for XML documents based on the concept of the 2-hop cover of a directed graph introduced by Cohen et al. In contrast to most prior work on XML indexing, we consider not only paths with child or parent relationships between the nodes, but also provide space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in our XXL search engine. We improve the theoretical concept of a 2-hop cover by developing scalable methods for index creation on very large XML data collections with long paths and extensive cross-linkage. Our experiments show that the HOPI index substantially improves query performance over previously proposed index structures while keeping space requirements low.
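
To make the 2-hop cover idea concrete, here is a minimal sketch of a reachability test over 2-hop labels. Each node v carries two label sets, Lin(v) and Lout(v), and u reaches v exactly when Lout(u) and Lin(v) share a node. The labels below are built with a simple pruned breadth-first sweep, which is an assumption made for illustration: it is neither the approximation algorithm of Cohen et al. nor the scalable construction that HOPI proposes. The function names, the node ordering heuristic, and the toy graph (tree edges plus one link edge) are all hypothetical.

```python
from collections import defaultdict, deque


def build_two_hop_labels(edges):
    """Build 2-hop labels (l_in, l_out) for a directed graph given as (u, v) pairs."""
    succ, pred = defaultdict(set), defaultdict(set)
    nodes = set()
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
        nodes.update((u, v))
    l_in = {n: set() for n in nodes}    # hubs that reach n
    l_out = {n: set() for n in nodes}   # hubs reachable from n

    def covered(u, v):
        # True if the labels built so far already certify that u reaches v.
        return u == v or not l_out[u].isdisjoint(l_in[v])

    def sweep(hub, neighbours, labels, is_covered):
        # Breadth-first sweep from hub; stop expanding at nodes already covered.
        seen, queue = {hub}, deque([hub])
        while queue:
            w = queue.popleft()
            if w != hub and is_covered(w):
                continue
            labels[w].add(hub)
            for x in neighbours[w]:
                if x not in seen:
                    seen.add(x)
                    queue.append(x)

    # Assumed heuristic: process high-degree nodes first, ties broken by name.
    order = sorted(nodes, key=lambda n: (-(len(succ[n]) + len(pred[n])), n))
    for hub in order:
        sweep(hub, succ, l_in, lambda w: covered(hub, w))   # descendants of hub
        sweep(hub, pred, l_out, lambda w: covered(w, hub))  # ancestors of hub
    return l_in, l_out


def reachable(l_in, l_out, u, v):
    """Reachability test: u reaches v iff the two label sets intersect."""
    return u == v or not l_out[u].isdisjoint(l_in[v])


# Toy collection: parent-child edges plus one link edge ("f" -> "b").
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("d", "e"),
         ("c", "e"), ("e", "f"), ("f", "b")]
l_in, l_out = build_two_hop_labels(edges)
print(reachable(l_in, l_out, "a", "f"))
print(reachable(l_in, l_out, "f", "a"))
```

The point of the sketch is the query side: a reachability test only intersects two small label sets, so the transitive closure never has to be materialized. Keeping those label sets small on large, heavily linked collections is the hard part, and that is the problem the paper addresses.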

References

Abiteboul, S., et al.: Compact labeling schemes for ancestor queries. In: SODA 2001, pp. 547–556 (2001)
Alstrup, S., Rauhe, T.: Improved labeling scheme for ancestor queries. In: SODA 2002, pp. 947–953 (2002)
Bancilhon, F., Ramakrishnan, R.: An amateur’s introduction to recursive query processing strategies. In: SIGMOD 1986, pp. 16–52 (1986)
Blanken, H., Grabs, T., Schek, H.-J., Schenkel, R., Weikum, G. (eds.): Intelligent Search on XML Data. LNCS, vol. 2818. Springer, Heidelberg (2003)
Böhme, T., Rahm, E.: Multi-user evaluation of XML data management systems with XMach-1. In: EEXTT 2002, pp. 148–158 (2003)
Chung, C.-W., Min, J.-K., Shim, K.: APEX: An adaptive path index for XML data. In: SIGMOD 2002, pp. 121–132 (2002)
Ciarlet Jr, P., Lamour, F.: On the validity of a front oriented approach to partitioning large sparse graphs with a connectivity constraint. Numerical Algorithms 12(1,2), 193–214 (1996)
Cohen, E., et al.: Labeling dynamic XML trees. In: PODS 2002, pp. 271–281 (2002)
Cohen, E., et al.: Reachability and distance queries via 2-hop labels. In: SODA 2002, pp. 937–946 (2002)
Cooper, B., et al.: A fast index for semistructured data. In: VLDB 2001, pp. 341–350 (2001)
Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms, 1st edn. MIT Press, Cambridge (1990)
DeRose, S., et al.: XML linking language (XLink), version 1.0. W3C recommendation (2001)
Farhat, C.: A simple and efficient automatic FEM domain decomposer. Computers and Structures 28(5), 579–602 (1988)
Goldman, R., Widom, J.: DataGuides: Enabling query formulation and optimization in semistructured databases. In: VLDB 1997, pp. 436–445 (1997)
Grust, T.: Accelerating XPath location steps. In: SIGMOD 2002, pp. 109–120 (2002)
Grust, T., van Keulen, M.: Tree awareness for relational DBMS kernels: Staircase join. In: Blanken et al. [4]
Kaplan, H., et al.: A comparison of labeling schemes for ancestor queries. In: SODA 2002, pp. 954–963 (2002)
Kaplan, H., Milo, T.: Short and simple labels for small distances and other functions. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 246–257. Springer, Heidelberg (2001)
Kaushik, R., et al.: Covering indexes for branching path queries. In: SIGMOD 2002, pp. 133–144 (2002)
Ley, M.: DBLP XML Records. Downloaded September 1 (2003)
Milo, T., Suciu, D.: Index structures for path expressions. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 277–295. Springer, Heidelberg (1998)
Qun, C., et al.: D(k)-index: An adaptive structural summary for graph-structured data. In: SIGMOD 2003, pp. 134–144 (2003)
Schenkel, R., Theobald, A., Weikum, G.: Ontology-enabled XML search. In: Blanken et al. [4]
Theobald, A., Weikum, G.: The index-based XXL search engine for querying XML data with relevance ranking. In: Jensen, C.S., Jeffery, K., Pokorný, J., Šaltenis, S., Bertino, E., Böhm, K., Jarke, M. (eds.) EDBT 2002. LNCS, vol. 2287, pp. 477–495. Springer, Heidelberg (2002)
Theobald, A., Weikum, G.: The XXL search engine: Ranked retrieval of XML data using indexes and ontologies. In: SIGMOD 2002 (2002)
Zezula, P., Amato, G., Rabitti, F.: Processing XML queries with tree signatures. In: Blanken et al. [4]
Zezula, P., et al.: Tree signatures for XML querying and navigation. In: 1st Int. XML Database Symposium, pp. 149–163 (2003)

Author information

Authors and Affiliations

Max Planck Institut für Informatik, Saarbrücken, Germany
Ralf Schenkel, Anja Theobald & Gerhard Weikum

Editor information

Editors and Affiliations

Purdue University
Elisa Bertino
Laboratory of Distributed Multimedia Information Systems and Applications, Technical University of Crete (MUSIC/TUC) Chania, 73100, Crete, Greece
Stavros Christodoulakis
Institute of Computer Science, FO.R.T.H., Vassilika Vouton, P.O. Box 1385, GR 71110, Heraklion, Greece
Dimitris Plexousakis
Department of Computer Science, University of Crete, P.O.Box 2208, GR 71409, Heraklion, Greece
Vassilis Christophides
National and Kapodistrian University of Athens, Greece
Manolis Koubarakis
IPD, Universität Karlsruhe, Am Fasanengarten 5, 76131, Karlsruhe
Klemens Böhm
Department of Computer Science and Communication, University of Insubria, 22100, Varese, Italy
Elena Ferrari

About this paper

Cite this paper

Schenkel, R., Theobald, A., Weikum, G. (2004). HOPI: An Efficient Connection Index for Complex XML Document Collections. In: Bertino, E., et al. Advances in Database Technology - EDBT 2004. EDBT 2004. Lecture Notes in Computer Science, vol 2992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24741-8_15

DOI: https://doi.org/10.1007/978-3-540-24741-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21200-3
Online ISBN: 978-3-540-24741-8
eBook Packages: Springer Book Archive
