Key Concepts for Native XML Processing

  • Theo Härder
  • Christian Mathis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6462)

Abstract

Over the past five years, we have designed, implemented, and optimized our prototype system XTC, a native XML database management system (XDBMS) that provides multi-user read/write transactions and supports multi-lingual query interfaces (XQuery, XPath, DOM, SAX). We have compared competing concepts in various system layers and iteratively identified salient solutions that drastically improved overall XDBMS performance. XML query processing is critically affected by the smooth interplay of concepts and methods. Here, we focus on the physical level of XML processing: node labeling and mapping options for storage structures; the design of suitable index mechanisms; and enriched functionality of path processing operators, in particular for holistic twig joins. In this survey, we outline the experience gained during the evolution of XTC. We develop “key concepts” to enable fine-grained, effective, and efficient XML processing.
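
To make the role of node labeling concrete, the following is a minimal sketch of a prefix-based (Dewey-style) labeling scheme. It is an illustration only and not the scheme used in XTC: the label representation (tuples of integers) and the function names is_ancestor, is_parent, and document_order are assumptions introduced here. The sketch shows why such labels are attractive at the physical level: structural relationships and document order can be decided from the labels alone, without accessing the stored document.

```python
from typing import List, Tuple

# Hypothetical label type: each node is identified by the path of
# sibling positions from the root, e.g. (1, 3, 5).
Label = Tuple[int, ...]


def is_ancestor(a: Label, d: Label) -> bool:
    """True if the node labeled a is a proper ancestor of the node labeled d."""
    return len(a) < len(d) and d[:len(a)] == a


def is_parent(p: Label, c: Label) -> bool:
    """True if the node labeled p is the parent of the node labeled c."""
    return len(p) + 1 == len(c) and c[:-1] == p


def document_order(labels: List[Label]) -> List[Label]:
    """Sort labels into document order (ancestors precede descendants)."""
    # Python compares tuples lexicographically, and a proper prefix sorts
    # before any of its extensions, which matches preorder traversal.
    return sorted(labels)


if __name__ == "__main__":
    book, chapter, title = (1,), (1, 3), (1, 3, 5)
    print(is_ancestor(book, title))
    print(is_parent(chapter, title))
    print(document_order([title, book, chapter]))
```

In this sketch, an ancestor/descendant test of the kind needed by structural and twig joins reduces to a prefix comparison, which is the property that makes prefix-based labels useful for path processing operators.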

Keywords

Node Label, Path Query, Path Expression, Path Class, Twig Pattern
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Theo Härder (1)
  • Christian Mathis (2)
  1. University of Kaiserslautern, Germany
  2. SAP, Germany
