On the Efficient Processing Regular Path Expressions of an Enormous Volume of XML Data

  • Michal Krátký
  • Radim Bača
  • Václav Snášel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4653)

Abstract

XML (Extensible Markup Language) has recently been embraced as a new approach to data modeling. More and more information is now formatted as semi-structured data, e.g. articles in a digital library, documents on the web, and so on. Implementing an efficient system for storing and querying XML documents requires the development of new techniques. Indexing an XML document is what makes efficient evaluation of a user query possible. XML query languages, such as XPath and XQuery, use a form of path expressions for composing more general queries. Current approaches to indexing XML data do not evaluate regular path expressions efficiently enough: most of them index single elements and process a query by joining the individual expressions. In this article we introduce an approach that makes it possible to efficiently process a query defined by regular path expressions. The approach indexes all root-to-leaf paths and stores them in multi-dimensional data structures, allowing the indexing and efficient querying of an enormous volume of XML data.
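The idea of indexing root-to-leaf paths as points in a multi-dimensional space can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the integer encoding of tags, the zero padding, and the tiny path-expression subset (`/` for child, `//` for descendant, matched against whole root-to-leaf paths) are all assumptions made for illustration, and a plain list scan stands in for a real multi-dimensional structure such as an R-tree or UB-tree.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for illustration.
DOC = """
<library>
  <book><title/><author><name/></author></book>
  <article><title/></article>
</library>
"""

def root_to_leaf_paths(xml_text):
    """Return every root-to-leaf path as a tuple of element tag names."""
    root = ET.fromstring(xml_text)
    paths = []
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        children = list(node)
        if not children:
            paths.append(path)
        # Push children in reverse so paths come out in document order.
        for child in reversed(children):
            stack.append((child, path + (child.tag,)))
    return paths

def encode_paths(paths):
    """Map tags to integer ids and pad each path to a fixed-length point.

    Assumption: id 0 is reserved for padding, so every path becomes a
    point with the same number of dimensions (the maximum path length).
    """
    ids = {}
    for path in paths:
        for tag in path:
            ids.setdefault(tag, len(ids) + 1)
    dim = max(len(p) for p in paths)
    points = [tuple(ids[t] for t in p) + (0,) * (dim - len(p)) for p in paths]
    return ids, points

def match_paths(paths, expression):
    """Return the paths that a simple path expression matches in full.

    Supported subset (an assumption of this sketch): '/name' for a child
    step and '//name' for a descendant step.
    """
    regex = ""
    for sep, name in re.findall(r"(//|/)([^/]+)", expression):
        if sep == "//":
            regex += r"(?:/[^/]+)*"  # skip any number of intermediate steps
        regex += "/" + re.escape(name)
    compiled = re.compile(regex)
    return [p for p in paths if compiled.fullmatch("/" + "/".join(p))]
```

For the sample document, `root_to_leaf_paths(DOC)` yields three paths, `encode_paths` turns them into three 4-dimensional points, and `match_paths(paths, "/library//title")` selects the two paths that end in `title`. In a real system the points would be stored in a multi-dimensional index and the expression translated into range queries rather than a regular-expression scan.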

Keywords

indexing XML data, regular path expression, multi-dimensional data structures

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Michal Krátký (1)
  • Radim Bača (1)
  • Václav Snášel (1)
  1. Department of Computer Science, VŠB – Technical University of Ostrava, 17. listopadu 15, 708 33 Ostrava–Poruba, Czech Republic
