Skip to main content

Estimating the Selectivity of XML Path Expression with Predicates by Histograms

  • Conference paper
Book cover Advances in Web-Age Information Management (WAIM 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3129))

Included in the following conference series:

Abstract

Selectivity estimation of path expressions in querying XML data plays an important role in query optimization. A path expression may contain multiple branches with predicates, each of which having its impact on the selectivity of the entire query. In this paper, we propose a novel method based on 2-dimensional value histograms to estimate the selectivity of path expressions embedded with predicates. The value histograms capture the correlation between the structures and the values in the XML data. We define a set of operations on the value histograms as well as on the traditional histograms that capture nodes positional distribution. We then construct a cost tree based on such operations. The selectivity of any node (or branch) in a path expression can be estimated by executing the cost tree. Compared with previous methods (which ignore value distribution) our method offers much better estimation accuracy.

This research was partially supported by the grants from 863 High Technology Foundation of China under grant number 2002AA116030, the Natural Science Foundation of China(NSFC) under grant number 60073014, 60273018, the Key Project of Chinese Ministry of Education (No.03044) and the Excellent Young Teachers Program of M0EPRC (EYTP)

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aboulnaga, Alameldeen, A.R., Naughton, J.: Estimating the Selectivity of XML Path Expressions for Internet Scale Applications. In: Proc. of the 27th VLDB Conf., Italy, pp. 591–600 (2001)

    Google Scholar 

  2. Chan, Felber, P., Garofalakis, M., Rastogi, R.: Efficient Filtering of XML Documents with Xpath Expressions. In: Proc. of ICDE Conf., USA, pp. 235–244 (2002)

    Google Scholar 

  3. Chen, Z., Jagadish, H.V., Korn, F., Koudas, N., Muthukrishnan, S., Ng, R., Srivastava, D.: Counting Twig Matches in a Tree. In: Proc. of ICDE Conf., pp. 595–604 (2001)

    Google Scholar 

  4. Choi, Fernandez, M., Simen, J.: The XQuery Formal Semantics: A Foundation for Implementation and Optimization. Technical Report, http://www.cis.upenn.edu/kkchoi/galas.pdf (May 31, 2002)

  5. Cooper, B., Sample, N., Franklin, M.J., Hjaltason, G.R., Shadmon, M.: A Fast Index for Semistructured Data. In: Proc. of 27th VLDB Conference, Roma, pp. 341–350 (2001)

    Google Scholar 

  6. Freire, J., Jayant, R., Ramanath, M., Roy, P., Simeon, J.: StatiX: Making XML Count. In: Proc. of ACM SIGMOD 2002, USA, pp. 181–191 (2002)

    Google Scholar 

  7. Jagadish, V.H., Al-Khalifa, S., Lakshmanan, L., Nierman, A., Paparizos, S., Patel, J., Srivastava, D., Wu, Y.: Timber: A native XML Database. Technical report, University of Michigan (2002)

    Google Scholar 

  8. Li, Q., Moon, B.: Indexing and Querying XML Data for Regular Path Expressions. In: Proc. of 27th VLDB, Roma, Italy, pp. 341–350 (2001)

    Google Scholar 

  9. McHugh, Widom, J.: Query Optimization for XML. In: Proc. of 25th VLDB Conference, UK, pp. 315–326 (1999)

    Google Scholar 

  10. Milo, T., Suciu, D.: Index structures for path expression. In: Proc. of 7th International Conference on Database Theory, pp. 277–295 (1999)

    Google Scholar 

  11. Polyzotis, N., Garofalakis, M.: Statistical Synopses for Graph-Structured XML Databases. In: Proc. of ACM SIGMOD Conf., pp. 358–369 (2002)

    Google Scholar 

  12. Wang, W., Jiang, H., Lu, H., Xu, J.F.: Containment Join Size Estimation: Models and Methods. In: Proc. of ACM SIGMOD, USA, pp. 145–156 (2003)

    Google Scholar 

  13. Wu, Y., Patel, J., Jagadish, H.: Estimating Answer Sizes for XML Queries. In: Jensen, C.S., Jeffery, K., Pokorný, J., Šaltenis, S., Bertino, E., Böhm, K., Jarke, M. (eds.) EDBT 2002. LNCS, vol. 2287, pp. 590–608. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  14. Zhang, Naughton, J.F., DeWitt, D.J., Luo, Q., Lohman, G.M.: On Supporting Containment Queries in Relational Database Management Systems. In: Proc. of ACM SIGMOD Conf., pp. 425–436 (2001)

    Google Scholar 

  15. Wang, Y., Meng, X., Wang, S.: Estimating the selectivity of PWP by histograms. Technical Report, ftp://202.112.116.99/pub/paper

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, Y., Wang, H., Meng, X., Wang, S. (2004). Estimating the Selectivity of XML Path Expression with Predicates by Histograms. In: Li, Q., Wang, G., Feng, L. (eds) Advances in Web-Age Information Management. WAIM 2004. Lecture Notes in Computer Science, vol 3129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27772-9_41

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-27772-9_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22418-1

  • Online ISBN: 978-3-540-27772-9

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics