XML Selectivity Estimation

Encyclopedia of Database Systems

Synonyms

XML cardinality estimation

Definition

Selectivity estimation in database systems refers to the task of estimating the number of results that will be output for a given query. Selectivity estimates are crucial in query optimization, since they enable optimizers to select efficient query plans. They are also employed in interactive data exploration as timely feedback about the expected outcome of user queries, and can even serve as approximate answers for count queries.

Selectivity estimators apply an estimation procedure to a synopsis of the data. Because query optimization operates under stringent time and space constraints, and selectivity estimation is only one of its steps, estimators face two often conflicting requirements: they must estimate query cardinalities accurately and efficiently while keeping the synopsis as small as possible.
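
To make the synopsis-plus-procedure split concrete, the following is a minimal sketch and not a method taken from this entry. It assumes a toy synopsis made of tag counts and parent/child tag-pair counts, and an estimation procedure that chains those counts under a first-order Markov (independence) assumption. The names `build_synopsis`, `estimate_path`, and `SAMPLE` are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document: <b> occurs under both <a> and <d>,
# but only the <b> under <a> has <c> children.
SAMPLE = "<r><a><b><c/><c/></b></a><d><b/></d></r>"


def build_synopsis(xml_text):
    """Summarize a document as tag counts and parent/child tag-pair counts."""
    root = ET.fromstring(xml_text)
    tag_counts = Counter()
    pair_counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        tag_counts[node.tag] += 1
        for child in node:
            pair_counts[(node.tag, child.tag)] += 1
            stack.append(child)
    return tag_counts, pair_counts


def estimate_path(synopsis, path):
    """Estimate how many elements match a child-axis path such as 'a/b/c'.

    Assumption: the children of an element depend only on its own tag,
    not on its ancestors (first-order Markov assumption).
    """
    tag_counts, pair_counts = synopsis
    steps = path.strip("/").split("/")
    if len(steps) == 1:
        return float(tag_counts[steps[0]])
    # Start from the exact count of the first parent/child pair.
    estimate = float(pair_counts[(steps[0], steps[1])])
    # Extend one step at a time using the average fan-out per parent tag.
    for prev, cur in zip(steps[1:], steps[2:]):
        if tag_counts[prev] == 0:
            return 0.0
        estimate *= pair_counts[(prev, cur)] / tag_counts[prev]
    return estimate


synopsis = build_synopsis(SAMPLE)
for query in ("c", "a/b", "a/b/c", "d/b/c"):
    print(query, estimate_path(synopsis, query))
```

On the sample document, single tags and two-step paths are counted exactly, but both `a/b/c` and `d/b/c` are estimated at 1.0 although their true counts are 2 and 0. The synopsis stays small because it discards ancestor context, and that is exactly where the estimation error comes from: a small instance of the accuracy versus synopsis-size tension described above.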

While there is a large body of literature on selectivity estimation in the context of relational databases, the...

Corresponding author

Correspondence to Maya Ramanath.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Ramanath, M., Freire, J., Polyzotis, N. (2018). XML Selectivity Estimation. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_801
