Fractional XSketch Synopses for XML Databases

  • Conference paper
Database and XML Technologies (XSym 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3186)

Abstract

A key step in the optimization of declarative queries over XML data is estimating the selectivity of path expressions, i.e., the number of elements reached by a specific navigation pattern through the XML data graph. Recent studies have introduced XSketch structural graph synopses as an effective, space-efficient tool for the compile-time estimation of complex path-expression selectivities over graph-structured, schema-less XML data. Briefly, XSketches exploit localized graph stability and well-founded statistical assumptions to accurately approximate the path and branching distribution in the underlying XML data graph. Empirical results have demonstrated the effectiveness of XSketch summaries over real-life and synthetic data sets, and for a variety of path-expression workloads.
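To make the notion of path-expression selectivity concrete, the following minimal sketch counts the elements reached by two path expressions over a tiny document. The document, the tag names, and the helper `selectivity` are hypothetical and chosen only for illustration; the sketch computes exact counts by evaluating the path, which is the quantity a synopsis such as an XSketch would approximate at compile time without touching the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document (not taken from the paper).
DOC = """
<bib>
  <author><name>A</name><paper><title>t1</title></paper><paper><title>t2</title></paper></author>
  <author><name>B</name><paper><title>t3</title></paper></author>
  <author><name>C</name></author>
</bib>
"""

def selectivity(root, path):
    """Exact selectivity: number of elements reached by `path` from `root`."""
    return len(root.findall(path))

root = ET.fromstring(DOC)

# Simple path: all paper elements below an author.
sel_papers = selectivity(root, "./author/paper")

# Path with a branching predicate: names of authors that have at least one paper.
sel_branch = selectivity(root, "./author[paper]/name")

print(sel_papers, sel_branch)
```

Here the simple path reaches three paper elements, while the branching path reaches only two name elements, because one author has no paper. Branching predicates of this kind are the case the abstract singles out as hardest to estimate.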

In this paper, we introduce fractional XSketches (fXSketches), a simple yet intuitive and very effective generalization of the basic XSketch summarization mechanism. In a nutshell, our fXSketch synopsis extends the conventional notion of binary stability (employed in XSketches) with that of fractional stability, essentially recording more detailed path/branching distribution information on individual synopsis edges. As we demonstrate, this natural extension results in several key benefits over conventional XSketches, including (a) a simplified estimation framework, (b) reduced run-time complexity for the synopsis-construction algorithm, and (c) lifting the need for critical uniformity assumptions during estimation (thus resulting in more accurate estimates). Results from an extensive experimental study show that our fXSketch synopses yield significantly better selectivity estimates than conventional XSketches, especially in the context of complex path expressions with branching predicates.
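The contrast between binary and fractional stability can be sketched as follows. The abstract does not give the formal definitions, so this example rests on assumptions: synopsis nodes are taken to be groups of elements with the same tag, an edge is treated as backward-stable when every element of the child node has a parent in the parent node, and as forward-stable when every element of the parent node has a child in the child node. The fractional variant simply records the corresponding fractions instead of a yes/no flag. The data and the function `edge_fractions` are hypothetical and are not the paper's construction algorithm.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical data graph: element id -> tag, plus parent/child edges.
label = {
    "b": "bib",
    "a1": "author", "a2": "author", "a3": "author",
    "n1": "name", "n2": "name", "n3": "name",
    "p1": "paper", "p2": "paper", "p3": "paper",
}
edges = [
    ("b", "a1"), ("b", "a2"), ("b", "a3"),
    ("a1", "n1"), ("a2", "n2"), ("a3", "n3"),
    ("a1", "p1"), ("a1", "p2"), ("a2", "p3"),
]

def edge_fractions(label, edges):
    """For each synopsis edge (u, v), return (forward, backward) fractions.

    forward  = fraction of elements in u with at least one child in v
    backward = fraction of elements in v with at least one parent in u
    Assumption: synopsis nodes are the sets of elements sharing a tag.
    """
    extent = defaultdict(set)           # tag -> elements with that tag
    for elem, tag in label.items():
        extent[tag].add(elem)

    with_child = defaultdict(set)       # (u, v) -> elements of u having a child in v
    with_parent = defaultdict(set)      # (u, v) -> elements of v having a parent in u
    for parent, child in edges:
        key = (label[parent], label[child])
        with_child[key].add(parent)
        with_parent[key].add(child)

    return {
        (u, v): (
            Fraction(len(with_child[(u, v)]), len(extent[u])),
            Fraction(len(with_parent[(u, v)]), len(extent[v])),
        )
        for (u, v) in with_child
    }

stats = edge_fractions(label, edges)
for (u, v), (fwd, bwd) in sorted(stats.items()):
    # Binary stability only says whether each fraction equals 1.
    print(f"{u}->{v}: forward={fwd} (stable={fwd == 1}), backward={bwd} (stable={bwd == 1})")
```

Under these assumptions the author->paper edge is backward-stable but not forward-stable; a binary flag records only that fact, whereas the fractional value 2/3 states how many authors actually have a paper, which is the kind of extra edge-level detail the abstract describes.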

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Drukh, N., Polyzotis, N., Garofalakis, M., Matias, Y. (2004). Fractional XSketch Synopses for XML Databases. In: Bellahsène, Z., Milo, T., Rys, M., Suciu, D., Unland, R. (eds) Database and XML Technologies. XSym 2004. Lecture Notes in Computer Science, vol 3186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30081-6_14

  • DOI: https://doi.org/10.1007/978-3-540-30081-6_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22969-8

  • Online ISBN: 978-3-540-30081-6

  • eBook Packages: Springer Book Archive
