Enabling XPath Optional Axes Cardinality Estimation Using Path Synopses

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5207)

Abstract

Effective support for XML query languages is becoming increasingly important as new applications emerge that access large volumes of XML data. Efficient query execution, especially in the distributed case, requires estimating the cardinalities of path expressions. In this paper, we propose two novel techniques for estimating the cardinality of simple path expressions with optional axes (following/preceding): document order grouping (DG) and neighborhood grouping (NG). Both techniques summarize the structure of the source XML data in compact graph structures (path synopses) and use these summaries for cardinality estimation. We experimentally evaluated the accuracy of the techniques and the size of the summaries, and studied the performance of the prototypes. A wide range of source data was used to study the behavior of the structures and the range of applicability of the techniques.
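
To make the estimation target concrete, the following Python sketch computes two things on a toy document: a simple path synopsis (a count of elements per distinct root-to-node label path) and the exact cardinality of a following-axis step. It is an illustration only and is not the DG or NG technique, whose details the abstract does not give. The function names `path_synopsis` and `following_cardinality` and the sample document are assumptions introduced here.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_synopsis(root):
    """Count elements per distinct root-to-node label path.

    Hypothetical helper: a minimal path summary, not the paper's structure.
    """
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def following_cardinality(root, context_tag, target_tag):
    """Exact result size of //context_tag/following::target_tag.

    A node follows a context node if it comes later in document order
    and is not one of the context node's descendants.
    """
    order = list(root.iter())  # elements in document (pre-)order
    size = {}

    def subtree_size(node):
        size[node] = 1 + sum(subtree_size(child) for child in node)
        return size[node]

    subtree_size(root)

    # For a context node at position i, its following nodes start at
    # position i + size (just past its own subtree).
    starts = [i + size[n] for i, n in enumerate(order) if n.tag == context_tag]
    if not starts:
        return 0
    # The result is a node set, so the earliest start covers all the others.
    first = min(starts)
    return sum(1 for n in order[first:] if n.tag == target_tag)


# Assumed toy document, used only for illustration.
doc = ET.fromstring("<r><a><b/></a><b/><c><b/></c><a/></r>")
synopsis = path_synopsis(doc)
exact = following_cardinality(doc, "a", "b")
```

On this document, `exact` is 2: the `b` nested inside the first `a` is a descendant and is therefore excluded, while the two later `b` elements are counted. The exact count requires a pass over the whole document, which is the cost that a synopsis-based estimate is meant to avoid.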

Research supported by Hewlett-Packard Labs.

Editor information

Paolo Atzeni, Albertas Caplinskas, Hannu Jaakkola

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Soldak, Y., Lukichev, M. (2008). Enabling XPath Optional Axes Cardinality Estimation Using Path Synopses. In: Atzeni, P., Caplinskas, A., Jaakkola, H. (eds) Advances in Databases and Information Systems. ADBIS 2008. Lecture Notes in Computer Science, vol 5207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85713-6_20

  • DOI: https://doi.org/10.1007/978-3-540-85713-6_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85712-9

  • Online ISBN: 978-3-540-85713-6

  • eBook Packages: Computer Science, Computer Science (R0)
