Abstract
Database query optimizers rely on data statistics when selecting query execution plans, and rule-based systems can benefit greatly from such optimizations as well. To this end, one first needs to collect data statistics for base predicates and propagate them to derived predicates. However, there are two difficulties: dependencies among arguments and recursion. Earlier, we developed an algorithm, called SDP, for estimating Datalog query sizes efficiently by estimating statistical dependency for both base and derived predicates [16]. Base predicate statistics were summarized as dependency matrices, while the statistics for derived predicates were estimated by abstract evaluation of rules over the dependency matrices. This previous work had several limitations. First, it considered only Datalog predicates. Second, only predicates of arity at most 2 were allowed, a very serious limitation of the approach. The present paper extends SDP to general rules and n-ary predicates. It also handles negation and mutual recursion, as well as other operations. We also report on our experiments with SDP.
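To illustrate why dependencies among arguments are a difficulty, the following minimal sketch compares two size estimates for a selection on a binary base predicate: one that assumes the two arguments are independent, and one that reads the answer from a joint frequency table. This is not the SDP algorithm or its dependency-matrix representation; the function names and the toy facts are hypothetical and chosen only for illustration.

```python
from collections import Counter


def estimate_independent(facts, x, y):
    """Estimate how many facts match p(x, y), assuming the two
    arguments are statistically independent (illustrative only)."""
    n = len(facts)
    if n == 0:
        return 0.0
    x_count = sum(1 for fx, _ in facts if fx == x)
    y_count = sum(1 for _, fy in facts if fy == y)
    # n * P(first argument = x) * P(second argument = y)
    return n * (x_count / n) * (y_count / n)


def estimate_joint(facts, x, y):
    """Read the count from a joint frequency table over both arguments."""
    joint = Counter(facts)  # maps (x, y) to its number of occurrences
    return joint[(x, y)]


# Toy base predicate p/2 in which the second argument is fully
# determined by the first: "a" always pairs with 1, "b" with 2.
facts = [("a", 1)] * 4 + [("b", 2)] * 4

print(estimate_independent(facts, "a", 2))  # independence-based estimate
print(estimate_joint(facts, "a", 2))        # count from the joint table
```

On this toy data the independence assumption predicts two matching facts for p("a", 2) although none exist, which is the kind of error that summarizing the dependency between arguments is meant to avoid.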
References
Acharya, S., Poosala, V., Ramaswamy, S.: Selectivity estimation in spatial databases. In: SIGMOD 1999: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pp. 13–24. ACM, New York (1999)
Baddeley, A., Turner, R.: Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software 12(6), 1–42 (2005), http://www.jstatsoft.org
Bowman, I.T., Paulley, G.N.: Join enumeration in a memory-constrained environment. In: Proceedings of the 16th International Conference on Data Engineering, pp. 645–654. IEEE Computer Society, Washington, DC (2000)
Bruno, N., Chaudhuri, S.: Exploiting statistics on query expressions for optimization. In: SIGMOD 2002: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp. 263–274. ACM, New York (2002)
Christodoulakis, S.: Implications of certain assumptions in database performance evaluation. ACM Trans. Database Syst. 9(2), 163–186 (1984)
DeHaan, D., Tompa, F.W.: Optimal top-down join enumeration. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD 2007, pp. 785–796. ACM, New York (2007)
Deshpande, A., Garofalakis, M., Rastogi, R.: Independence is good: dependency-based histogram synopses for high-dimensional data. SIGMOD Rec. 30(2), 199–210 (2001)
Furtado, P., Madeira, H.: Summary grids: Building accurate multidimensional histograms (1999)
Gassner, P., Lohman, G.M., Schiefer, K.B., Wang, Y.: Query optimization in the IBM DB2 family. IEEE Data Eng. Bull. 16(4), 4–18 (1993)
Ioannidis, Y.: The history of histograms (abridged). In: Proc. of VLDB Conference. Morgan Kaufmann, Berlin (2003)
Ioannidis, Y.E.: Universality of serial histograms. In: VLDB 1993: Proceedings of the 19th International Conference on Very Large Data Bases, pp. 256–267. Morgan Kaufmann Publishers Inc., San Francisco (1993)
Ioannidis, Y.E., Christodoulakis, S.: On the propagation of errors in the size of join results. SIGMOD Rec. 20(2), 268–277 (1991)
Ioannidis, Y.E., Poosala, V.: Balancing histogram optimality and practicality for query result size estimation. In: SIGMOD 1995: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pp. 233–244. ACM, New York (1995)
Kifer, M., Bernstein, A., Lewis, P.M.: Database Systems: An Application Oriented Approach, Complete Version. Addison-Wesley, Boston (2006)
Liang, S.: Non-termination analysis and cost-based query optimization of logic programs. Ph.D. Dissertation (2012), http://www.cs.stonybrook.edu/~sliang
Liang, S., Kifer, M.: Deriving predicate statistics in Datalog. In: Kutsia, T., Schreiner, W., Fernández, M. (eds.) PPDP, pp. 45–56. ACM (2010)
Lipton, R.J., Naughton, J.F.: Estimating the size of generalized transitive closures. In: VLDB 1989: Proceedings of the 15th International Conference on Very Large Data Bases, pp. 165–171. Morgan Kaufmann Publishers Inc., San Francisco (1989)
Moerkotte, G., Neumann, T.: Dynamic programming strikes back. In: SIGMOD 2008: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 539–552. ACM, New York (2008)
Muralikrishna, M., DeWitt, D.J.: Equi-depth histograms for estimating selectivity factors for multi-dimensional queries. In: Boral, H., Larson, P.Å. (eds.) Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, June 1-3, pp. 28–36. ACM Press (1988)
Ono, K., Lohman, G.M.: Measuring the complexity of join enumeration in query optimization. In: Proceedings of the Sixteenth International Conference on Very Large Databases, pp. 314–325. Morgan Kaufmann Publishers Inc., San Francisco (1990), http://portal.acm.org/citation.cfm?id=94362.94436
Poosala, V., Haas, P.J., Ioannidis, Y.E., Shekita, E.J.: Improved histograms for selectivity estimation of range predicates. In: SIGMOD 1996: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pp. 294–305. ACM, New York (1996)
Poosala, V., Ioannidis, Y.E.: Selectivity estimation without the attribute value independence assumption. In: VLDB 1997: Proceedings of the 23rd International Conference on Very Large Data Bases, pp. 486–495. Morgan Kaufmann Publishers Inc., San Francisco (1997)
Ramakrishnan, R., Srivastava, D., Sudarshan, S., Seshadri, P.: The CORAL deductive system. VLDB J. 3(2), 161–210 (1994)
Sagonas, K.F., Swift, T., Warren, D.S.: An abstract machine for efficiently computing queries to well-founded models. J. Log. Program. 45(1-3), 1–41 (2000)
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD 1979: Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, pp. 23–34. ACM, New York (1979)
Sereni, D., Avgustinov, P., de Moor, O.: Adding magic to an optimising Datalog compiler. In: SIGMOD 2008: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 553–566. ACM, New York (2008)
Seshadri, S., Naughton, J.F.: On the expected size of recursive Datalog queries. In: PODS 1991: Proceedings of the tenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 268–279. ACM, New York (1991)
Spiegel, J., Polyzotis, N.: Graph-based synopses for relational selectivity estimation. In: SIGMOD 2006: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pp. 205–216. ACM, New York (2006)
Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: LEO - DB2’s learning optimizer. In: VLDB 2001: Proceedings of the 27th International Conference on Very Large Data Bases, pp. 19–28. Morgan Kaufmann Publishers Inc., San Francisco (2001)
Swift, T., Warren, D.S.: XSB: Extending Prolog with tabled logic programming. CoRR abs/1012.5123 (2010)
Thaper, N., Guha, S., Indyk, P., Koudas, N.: Dynamic multidimensional histograms. In: SIGMOD 2002: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp. 428–439. ACM, New York (2002)
Vance, B., Maier, D.: Rapid bushy join-order optimization with cartesian products. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, SIGMOD 1996, pp. 35–46. ACM, New York (1996)
The SILK project: Semantic Inferencing on Large Knowledge. The FLORA-2 Web Site, http://silk.semwebcentral.org/
Yang, G., Kifer, M., Zhao, C.: Flora-2: A Rule-Based Knowledge Representation and Inference Infrastructure for the Semantic Web. In: Meersman, R., Schmidt, D.C. (eds.) CoopIS 2003, DOA 2003, and ODBASE 2003. LNCS, vol. 2888, pp. 671–688. Springer, Heidelberg (2003)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liang, S., Kifer, M. (2012). Deriving Predicate Statistics for Logic Rules. In: Krötzsch, M., Straccia, U. (eds) Web Reasoning and Rule Systems. RR 2012. Lecture Notes in Computer Science, vol 7497. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33203-6_11
DOI: https://doi.org/10.1007/978-3-642-33203-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33202-9
Online ISBN: 978-3-642-33203-6
eBook Packages: Computer Science, Computer Science (R0)