A General Framework for Estimating XML Query Cardinality

  • Carlo Sartiani
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2921)


In XML data management systems, estimating query cardinality is becoming increasingly important: the output of a query result-size estimator can serve as input to the query optimizer, as early feedback on user queries, and as input for choosing an optimal storage schema; it may also help in embedded query execution.

Existing estimation models for XML queries focus on particular aspects of XML querying, such as estimating the cardinality of path and twig expressions, and do not address the problem of predicting the cardinality of general XQuery queries. This paper presents a framework for estimating XML query cardinality. The framework provides facilities for estimating the result size of FLWR queries, allowing the model designer to concentrate on developing adequate, accurate, yet concise statistical summaries of XML data. The framework can also be used to extend existing models to a wider class of XML queries.
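The division of labor described above (a generic FLWR estimator on one side, a pluggable statistical summary on the other) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a summary answers two questions, the cardinality of an absolute path and the selectivity of a predicate, and that the estimator combines them under a naive independence assumption: the for-clauses form a Cartesian product, each where-predicate filters it, and the return clause emits one item per surviving binding. The names `StatisticSummary`, `FLWRQuery`, `estimate_flwr_cardinality`, and `DictSummary`, as well as the paths and counts, are hypothetical; this is not the estimation model the paper proposes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class StatisticSummary(Protocol):
    """Hypothetical interface a model designer would implement."""

    def path_cardinality(self, path: str) -> float: ...

    def predicate_selectivity(self, predicate: str) -> float: ...


@dataclass
class FLWRQuery:
    # One absolute path per for-clause; clauses are assumed independent.
    for_paths: List[str]
    where_predicates: List[str] = field(default_factory=list)


def estimate_flwr_cardinality(query: FLWRQuery, summary: StatisticSummary) -> float:
    # Independent for-clauses over absolute paths yield a Cartesian product
    # of bindings, so their cardinalities multiply.
    tuples = 1.0
    for path in query.for_paths:
        tuples *= summary.path_cardinality(path)
    # Each where-predicate keeps a fraction of the bindings (its selectivity),
    # assuming predicates are independent of each other.
    for predicate in query.where_predicates:
        tuples *= summary.predicate_selectivity(predicate)
    # Assumption: the return clause produces exactly one item per binding.
    return tuples


class DictSummary:
    """Toy summary backed by precomputed numbers, for illustration only."""

    def __init__(self, paths: Dict[str, float], selectivities: Dict[str, float]):
        self.paths = paths
        self.selectivities = selectivities

    def path_cardinality(self, path: str) -> float:
        # Unknown paths are assumed to match nothing.
        return self.paths.get(path, 0.0)

    def predicate_selectivity(self, predicate: str) -> float:
        # Unknown predicates are assumed not to filter anything.
        return self.selectivities.get(predicate, 1.0)


# Invented numbers, not measurements.
summary = DictSummary(
    paths={"/site/people/person": 100, "/site/open_auctions/open_auction": 50},
    selectivities={"$p/@id = $a/seller/@person": 0.01},
)
query = FLWRQuery(
    for_paths=["/site/people/person", "/site/open_auctions/open_auction"],
    where_predicates=["$p/@id = $a/seller/@person"],
)
print(estimate_flwr_cardinality(query, summary))  # 50.0
```

Replacing `DictSummary` with a richer summary changes the estimates without touching `estimate_flwr_cardinality`, which mirrors the separation of concerns the abstract describes; a realistic estimator would also need to handle correlated, relative paths (e.g. `$p/bid`), which this product-based sketch deliberately ignores.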





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Carlo Sartiani
    • Dipartimento di Informatica, Università di Pisa, Pisa, Italy
