XML Selectivity Estimation

Ramanath, Maya; Freire, Juliana; Polyzotis, Neoklis

doi:10.1007/978-1-4614-8265-9_801

Synonyms

XML cardinality estimation

Definition

Selectivity estimation in database systems refers to the task of estimating the number of results that will be output for a given query. Selectivity estimates are crucial in query optimization, since they enable optimizers to select efficient query plans. They are also employed in interactive data exploration as timely feedback about the expected outcome of user queries, and can even serve as approximate answers for count queries.

Selectivity estimators apply an estimation procedure to a synopsis of the data. Because query optimization, of which selectivity estimation is only one step, operates under stringent time and space constraints, selectivity estimators face two often-conflicting requirements: they must estimate query cardinalities accurately and efficiently while keeping the synopsis as small as possible.
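As a rough illustration of the idea of a synopsis paired with an estimation procedure, the following Python sketch summarizes a tiny XML document by label counts and parent/child label-pair counts, then estimates the cardinality of a simple child-axis path by assuming that each step is independent of the steps before it. The sample document, the function names (`build_synopsis`, `estimate`, `exact`), and the independence assumption are hypothetical choices made for this example; they are not taken from this entry or from any particular technique in the reading list.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document, chosen so that <section> behaves
# differently under <paper> than under <note>.
DOC = """
<db>
  <paper><section><figure/><figure/><figure/></section></paper>
  <note><section/></note>
</db>
"""

def build_synopsis(root):
    """Summarize the tree as label counts and parent/child label-pair counts."""
    labels = Counter()
    pairs = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        labels[node.tag] += 1
        for child in node:
            pairs[(node.tag, child.tag)] += 1
            stack.append(child)
    return labels, pairs

def estimate(path, labels, pairs):
    """Estimate how many elements match a path such as 'a/b/c', where 'a'
    may occur anywhere in the document. Each step is assumed to be
    independent of its ancestors (an assumption of this sketch)."""
    steps = path.split("/")
    est = float(labels[steps[0]])
    for parent, child in zip(steps, steps[1:]):
        if labels[parent] == 0:
            return 0.0
        # Average number of `child` elements per `parent` element.
        est *= pairs[(parent, child)] / labels[parent]
    return est

def exact(path, root):
    """Count the true matches by walking the whole document."""
    steps = path.split("/")
    count = 0
    stack = [(root, [root.tag])]
    while stack:
        node, trail = stack.pop()
        # A node matches if the labels leading to it end with the path steps.
        if trail[-len(steps):] == steps:
            count += 1
        for child in node:
            stack.append((child, trail + [child.tag]))
    return count

root = ET.fromstring(DOC)
labels, pairs = build_synopsis(root)
for query in ("section/figure", "paper/section/figure", "note/section/figure"):
    print(query, estimate(query, labels, pairs), exact(query, root))
```

On this document the synopsis answers `section/figure` exactly (3), but it estimates 1.5 for both `paper/section/figure` (true count 3) and `note/section/figure` (true count 0), because the pair counts cannot distinguish the two contexts in which `section` occurs. Storing longer paths would remove the error at the cost of a larger synopsis, which is the accuracy-versus-size tension described above.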

While there is a large body of literature on selectivity estimation in the context of relational databases, the...

Recommended Reading

Aboulnaga A, Alameldeen AR, Naughton J. Estimating the selectivity of XML path expressions for internet scale applications. In: Proceedings of the 27th International Conference on Very Large Data Bases; 2001. p. 591–600.
Chen Z, Jagadish HV, Korn F, Koudas N, Muthukrishnan S, Ng RT, Srivastava D. Counting twig matches in a tree. In: Proceedings of the 17th International Conference on Data Engineering; 2001. p. 453–62.
Freire J, Haritsa J, Ramanath M, Roy P, Siméon J. StatiX: making XML count. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2002. p. 181–91.
Goldman R, Widom J. DataGuides: enabling query formulation and optimization in semistructured databases. In: Proceedings of the 23rd International Conference on Very Large Data Bases; 1997. p. 436–45.
Lim L, Wang M, Padmanabhan S, Vitter J, Parr R. XPathLearner: an on-line self-tuning Markov histogram for XML path selectivity estimation. In: Proceedings of the 28th International Conference on Very Large Data Bases; 2002. p. 442–53.
Lim L, Wang M, Vitter J. CXHist: an on-line classification-based histogram for XML string selectivity estimation. In: Proceedings of the 31st International Conference on Very Large Data Bases; 2005. p. 1187–98.
McHugh J, Abiteboul S, Goldman R, Quass D, Widom J. A database management system for semistructured data. ACM SIGMOD Rec. 1997;26(3):54–66.
Milo T, Suciu D. Index structures for path expressions. In: Proceedings of the 7th International Conference on Database Theory; 1999. p. 277–95.
Nestorov S, Ullman J, Wiener J, Chawathe S. Representative objects: concise representations of semistructured, hierarchical data. In: Proceedings of the 13th International Conference on Data Engineering; 1997. p. 79–90.
Polyzotis N, Garofalakis M. XCluster synopses for structured XML content. In: Proceedings of the 22nd International Conference on Data Engineering; 2006. p. 63.
Polyzotis N, Garofalakis M. XSketch synopses for XML data graphs. ACM Trans Database Syst. 2006;31(3):1014–63.
Polyzotis N, Garofalakis M, Ioannidis Y. Approximate XML query answers. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2004. p. 263–74.
Ramanath M, Zhang L, Freire J, Haritsa J. IMAX: incremental maintenance of schema-based XML statistics. In: Proceedings of the 21st International Conference on Data Engineering; 2005. p. 273–84.
Rao P, Moon B. SketchTree: approximate tree pattern counts over streaming labeled trees. In: Proceedings of the 22nd International Conference on Data Engineering; 2006. p. 80.
Sartiani C. A framework for estimating XML query cardinality. In: Proceedings of the 6th International Workshop on the World Wide Web and Databases; 2003. p. 43–48.
Wang W, Jiang H, Lu H, Yu JX. Containment join size estimation: models and methods. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2003. p. 145–56.
Wang W, Jiang H, Lu H, Yu JX. Bloom histogram: path selectivity estimation for XML data with updates. In: Proceedings of the 30th International Conference on Very Large Data Bases; 2004. p. 240–51.
Wu Y, Patel JM, Jagadish HV. Estimating answer sizes for XML queries. In: Advances in database technology, Proceedings of the 8th International Conference on Extending Database Technology; 2002. p. 590–608.
Zhang N, Özsu MT, Aboulnaga A, Ilyas IF. XSEED: accurate and fast cardinality estimation for XPath queries. In: Proceedings of the 22nd International Conference on Data Engineering; 2006. p. 61.

Author information

Authors and Affiliations

Max-Planck Institute for Informatics, Saarbrücken, Germany
Maya Ramanath
NYU Tandon School of Engineering, Brooklyn, NY, USA
Juliana Freire
NYU Center for Data Science, New York, NY, USA
Juliana Freire
New York University, New York, NY, USA
Juliana Freire
University of California Santa Cruz, Santa Cruz, CA, USA
Neoklis Polyzotis

Corresponding author

Correspondence to Maya Ramanath.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Section Editor information

Laboratoire d'Informatique de Grenoble, CNRS and LIG, Grenoble, France
Sihem Amer-Yahia

About this entry

Cite this entry

Ramanath, M., Freire, J., Polyzotis, N. (2018). XML Selectivity Estimation. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_801

DOI: https://doi.org/10.1007/978-1-4614-8265-9_801
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering