
SF-Tree: An Efficient and Flexible Structure for Estimating Selectivity of Simple Path Expressions with Statistical Accuracy Guarantee

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2973))

Abstract

Estimating the selectivity of a simple path expression (SPE) is essential for selecting the most efficient evaluation plans for XML queries. To estimate selectivity, we need an efficient and flexible structure to store a summary of the path expressions that are present in an XML document collection. In this paper we propose a new structure called SF-Tree to address the selectivity estimation problem. SF-Tree gives users a flexible way to trade off accuracy, space requirements, and selectivity retrieval speed. It uses signature files to store the SPEs in a tree form, which increases both the selectivity retrieval speed and the accuracy of the retrieved selectivity. Our analysis shows that the probability that a selectivity estimation error occurs decreases exponentially with respect to the error size.
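
To make the idea of "signature files stored in a tree form" concrete, here is a minimal Python sketch under stated assumptions. It is a hypothetical illustration, not the authors' implementation. It assumes that SPEs are grouped by selectivity, that each leaf holds a superimposed-coding signature (an m-bit word with k hash-chosen bits set per SPE) for one group, and that each internal node's signature is the bitwise OR of its children's signatures. The signature width `M_BITS`, the hash count `K_HASHES`, the SHA-256 bit selection, and the names `spe_signature`, `build_tree`, and `estimate` are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

M_BITS = 256   # assumed signature width in bits
K_HASHES = 3   # assumed number of bits set per SPE


def spe_signature(spe: str, m: int = M_BITS, k: int = K_HASHES) -> int:
    """Superimposed coding: set k hash-chosen bits of an m-bit word."""
    sig = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{spe}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return sig


def matches(node_sig: int, query_sig: int) -> bool:
    """A node may contain the query only if all of the query's bits are set."""
    return node_sig & query_sig == query_sig


@dataclass
class Node:
    signature: int
    selectivity: Optional[int] = None           # set only on leaves
    children: list = field(default_factory=list)


def build_tree(groups: dict) -> Node:
    """Hypothetical layout: one leaf per distinct selectivity, binary tree above."""
    level = []
    for sel in sorted(groups):
        sig = 0
        for spe in groups[sel]:
            sig |= spe_signature(spe)            # superimpose member signatures
        level.append(Node(signature=sig, selectivity=sel))
    while len(level) > 1:                        # build parents bottom-up
        parents = []
        for i in range(0, len(level), 2):
            kids = level[i:i + 2]
            sig = 0
            for kid in kids:
                sig |= kid.signature             # parent = OR of children
            parents.append(Node(signature=sig, children=kids))
        level = parents
    return level[0]


def estimate(root: Node, spe: str) -> list:
    """Return the selectivities of all matching leaves (more than one = false drop)."""
    query = spe_signature(spe)
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if not matches(node.signature, query):
            continue                             # prune this whole subtree
        if node.selectivity is not None:
            found.append(node.selectivity)
        else:
            stack.extend(node.children)
    return sorted(found)
```

A lookup never misses a stored SPE, because every bit of its signature is set on each node along the path to its leaf. A false drop can, however, cause extra leaves to match, which is why this sketch's `estimate` returns every candidate selectivity rather than a single value. Widening the signature lowers the false-drop rate at the cost of space, one instance of the accuracy/space trade-off described in the abstract.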

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ho, WS., Kao, B., Cheung, D.W., Lap, Y.C., Lo, E. (2004). SF-Tree: An Efficient and Flexible Structure for Estimating Selectivity of Simple Path Expressions with Statistical Accuracy Guarantee. In: Lee, Y., Li, J., Whang, KY., Lee, D. (eds) Database Systems for Advanced Applications. DASFAA 2004. Lecture Notes in Computer Science, vol 2973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24571-1_4

  • DOI: https://doi.org/10.1007/978-3-540-24571-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21047-4

  • Online ISBN: 978-3-540-24571-1

  • eBook Packages: Springer Book Archive
