Abstract
Estimating the selectivity of a simple path expression (SPE) is essential for selecting the most efficient evaluation plans for XML queries. To estimate selectivity, we need an efficient and flexible structure that stores a summary of the path expressions present in an XML document collection. In this paper we propose a new structure called SF-Tree to address the selectivity estimation problem. SF-Tree lets users trade off accuracy, space requirements, and selectivity retrieval speed. It uses signature files to store the SPEs in tree form, which increases both the selectivity retrieval speed and the accuracy of the retrieved selectivity. Our analysis shows that the probability of a selectivity estimation error decreases exponentially with the size of the error.
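As a rough illustration of how signature files can summarise SPEs by selectivity, the following Python sketch groups paths that share the same selectivity and keeps one bit signature per group. It is not the SF-Tree itself: it scans the groups linearly instead of arranging them in a tree, and the names `Signature`, `build_groups`, and `estimate_selectivity`, the signature width, the hash choice, and the sample paths are all assumptions made for this example.

```python
import hashlib


class Signature:
    """Fixed-width bit signature built by superimposed coding (assumed design)."""

    def __init__(self, num_bits=256, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the signature, held as a Python integer bitmask

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # True for every added key; may also be True for other keys
        # (a false positive), which is the source of estimation errors.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_groups(path_counts, num_bits=256, num_hashes=4):
    """Map each distinct selectivity to a signature of the paths that have it."""
    groups = {}
    for path, count in path_counts.items():
        if count not in groups:
            groups[count] = Signature(num_bits, num_hashes)
        groups[count].add(path)
    return groups


def estimate_selectivity(groups, path):
    """Return the selectivity of the first group whose signature matches.

    Returns 0 when no signature matches, i.e. the path is taken to be absent.
    """
    for count in sorted(groups):
        if groups[count].might_contain(path):
            return count
    return 0


# Hypothetical path counts, for illustration only.
counts = {
    "/site/people/person": 3,
    "/site/people/person/name": 3,
    "/site/regions/asia/item": 7,
}
groups = build_groups(counts)
print(estimate_selectivity(groups, "/site/regions/asia/item"))
```

A stored path always matches the signature of its own group, so the lookup never reports it as absent. It can still return a wrong value when the path also matches another group's signature by chance, and a wider signature makes such collisions less likely at the cost of space.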
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ho, WS., Kao, B., Cheung, D.W., Lap, Y.C., Lo, E. (2004). SF-Tree: An Efficient and Flexible Structure for Estimating Selectivity of Simple Path Expressions with Statistical Accuracy Guarantee. In: Lee, Y., Li, J., Whang, KY., Lee, D. (eds) Database Systems for Advanced Applications. DASFAA 2004. Lecture Notes in Computer Science, vol 2973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24571-1_4
DOI: https://doi.org/10.1007/978-3-540-24571-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21047-4
Online ISBN: 978-3-540-24571-1
eBook Packages: Springer Book Archive