Deriving Predicate Statistics for Logic Rules

  • Conference paper
Web Reasoning and Rule Systems (RR 2012)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7497))

Abstract

Database query optimizers rely on data statistics when selecting query execution plans, and rule-based systems can benefit greatly from such optimizations as well. To this end, one first needs to collect data statistics for base predicates and propagate them to derived predicates. However, there are two difficulties: dependencies among arguments and recursion. Earlier we developed an algorithm, called SDP, for estimating Datalog query sizes efficiently by estimating statistical dependency for both base and derived predicates [16]. Base predicate statistics were summarized as dependency matrices, while the statistics for derived predicates were estimated by abstract evaluation of rules over the dependency matrices. This previous work had several limitations. First, it considered only Datalog predicates. Second, only predicates of arity at most 2 were allowed, a very serious limitation of the approach. The present paper extends SDP to general rules and n-ary predicates. It also handles negation and mutual recursion as well as other operations. We also report on our experiments with SDP.
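
As a rough illustration of summarizing a base predicate as a matrix and estimating the size of a derived predicate from it, the sketch below buckets the argument values of two binary predicates and estimates the number of joining pairs for a rule r(X,Z) :- p(X,Y), q(Y,Z). It is not the SDP algorithm, which the abstract does not spell out. The function names, the bucketing scheme, and the uniform-within-bucket assumption are all hypothetical choices made for illustration.

```python
def bucket_matrix(facts, bucket_of, n_buckets):
    """Count binary facts (a, b) per cell (bucket_of(a), bucket_of(b)).

    Hypothetical helper: a plain joint-frequency matrix, used here only as
    a stand-in for the kind of per-predicate summary the abstract mentions.
    """
    matrix = [[0] * n_buckets for _ in range(n_buckets)]
    for a, b in facts:
        matrix[bucket_of(a)][bucket_of(b)] += 1
    return matrix


def estimate_join_size(p_matrix, q_matrix, values_per_bucket):
    """Estimate the number of joining pairs for p(X, Y), q(Y, Z).

    Assumption (not from the paper): values of Y are spread uniformly
    inside each bucket, so two facts that fall in the same Y-bucket agree
    on Y with probability 1 / values_per_bucket.
    """
    n = len(p_matrix)
    total = 0.0
    for b in range(n):
        # p facts whose second argument falls in bucket b (column sum)
        p_count = sum(p_matrix[i][b] for i in range(n))
        # q facts whose first argument falls in bucket b (row sum)
        q_count = sum(q_matrix[b][j] for j in range(n))
        total += p_count * q_count / values_per_bucket
    return total


# Toy base predicates over the domain {0, 1, 2, 3}
p_facts = [(0, 1), (1, 1), (2, 3)]
q_facts = [(1, 0), (1, 2), (3, 3)]

# One bucket per value: the estimate coincides with the true pair count
exact = estimate_join_size(
    bucket_matrix(p_facts, lambda v: v, 4),
    bucket_matrix(q_facts, lambda v: v, 4),
    values_per_bucket=1,
)

# Two values per bucket: a coarser summary gives an approximate estimate
coarse = estimate_join_size(
    bucket_matrix(p_facts, lambda v: v // 2, 2),
    bucket_matrix(q_facts, lambda v: v // 2, 2),
    values_per_bucket=2,
)
```

With one bucket per value the matrices lose no information about the join argument, so the estimate is exact. Coarser buckets trade accuracy for a smaller summary, which is the usual tension in histogram-based size estimation.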

References

  1. Acharya, S., Poosala, V., Ramaswamy, S.: Selectivity estimation in spatial databases. In: SIGMOD 1999: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pp. 13–24. ACM, New York (1999)

  2. Baddeley, A., Turner, R.: Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software 12(6), 1–42 (2005), http://www.jstatsoft.org

  3. Bowman, I.T., Paulley, G.N.: Join enumeration in a memory-constrained environment. In: Proceedings of the 16th International Conference on Data Engineering, pp. 645–654. IEEE Computer Society, Washington, DC (2000)

  4. Bruno, N., Chaudhuri, S.: Exploiting statistics on query expressions for optimization. In: SIGMOD 2002: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp. 263–274. ACM, New York (2002)

  5. Christodoulakis, S.: Implications of certain assumptions in database performance evaluation. ACM Trans. Database Syst. 9(2), 163–186 (1984)

  6. DeHaan, D., Tompa, F.W.: Optimal top-down join enumeration. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD 2007, pp. 785–796. ACM, New York (2007)

  7. Deshpande, A., Garofalakis, M., Rastogi, R.: Independence is good: dependency-based histogram synopses for high-dimensional data. SIGMOD Rec. 30(2), 199–210 (2001)

  8. Furtado, P., Madeira, H.: Summary grids: Building accurate multidimensional histograms (1999)

  9. Gassner, P., Lohman, G.M., Schiefer, K.B., Wang, Y.: Query optimization in the IBM DB2 family. IEEE Data Eng. Bull. 16(4), 4–18 (1993)

  10. Ioannidis, Y.: The history of histograms (abridged). In: Proc. of VLDB Conference. Morgan Kaufmann, Berlin (2003)

  11. Ioannidis, Y.E.: Universality of serial histograms. In: VLDB 1993: Proceedings of the 19th International Conference on Very Large Data Bases, pp. 256–267. Morgan Kaufmann Publishers Inc., San Francisco (1993)

  12. Ioannidis, Y.E., Christodoulakis, S.: On the propagation of errors in the size of join results. SIGMOD Rec. 20(2), 268–277 (1991)

  13. Ioannidis, Y.E., Poosala, V.: Balancing histogram optimality and practicality for query result size estimation. In: SIGMOD 1995: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pp. 233–244. ACM, New York (1995)

  14. Kifer, M., Bernstein, A., Lewis, P.M.: Database Systems: An Application Oriented Approach, Complete Version. Addison-Wesley, Boston (2006)

  15. Liang, S.: Non-termination analysis and cost-based query optimization of logic programs. Ph.D. Dissertation (2012), http://www.cs.stonybrook.edu/~sliang

  16. Liang, S., Kifer, M.: Deriving predicate statistics in datalog. In: Kutsia, T., Schreiner, W., Fernández, M. (eds.) PPDP, pp. 45–56. ACM (2010)

  17. Lipton, R.J., Naughton, J.F.: Estimating the size of generalized transitive closures. In: VLDB 1989: Proceedings of the 15th International Conference on Very Large Data Bases, pp. 165–171. Morgan Kaufmann Publishers Inc., San Francisco (1989)

  18. Moerkotte, G., Neumann, T.: Dynamic programming strikes back. In: SIGMOD 2008: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 539–552. ACM, New York (2008)

  19. Muralikrishna, M., DeWitt, D.J.: Equi-depth histograms for estimating selectivity factors for multi-dimensional queries. In: Boral, H., Larson, P.Å. (eds.) Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, June 1-3, pp. 28–36. ACM Press (1988)

  20. Ono, K., Lohman, G.M.: Measuring the complexity of join enumeration in query optimization. In: Proceedings of the Sixteenth International Conference on Very Large Databases, pp. 314–325. Morgan Kaufmann Publishers Inc., San Francisco (1990), http://portal.acm.org/citation.cfm?id=94362.94436

  21. Poosala, V., Haas, P.J., Ioannidis, Y.E., Shekita, E.J.: Improved histograms for selectivity estimation of range predicates. In: SIGMOD 1996: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pp. 294–305. ACM, New York (1996)

  22. Poosala, V., Ioannidis, Y.E.: Selectivity estimation without the attribute value independence assumption. In: VLDB 1997: Proceedings of the 23rd International Conference on Very Large Data Bases, pp. 486–495. Morgan Kaufmann Publishers Inc., San Francisco (1997)

  23. Ramakrishnan, R., Srivastava, D., Sudarshan, S., Seshadri, P.: The CORAL deductive system. VLDB J. 3(2), 161–210 (1994)

  24. Sagonas, K.F., Swift, T., Warren, D.S.: An abstract machine for efficiently computing queries to well-founded models. J. Log. Program. 45(1-3), 1–41 (2000)

  25. Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD 1979: Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, pp. 23–34. ACM, New York (1979)

  26. Sereni, D., Avgustinov, P., de Moor, O.: Adding magic to an optimising datalog compiler. In: SIGMOD 2008: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 553–566. ACM, New York (2008)

  27. Seshadri, S., Naughton, J.F.: On the expected size of recursive datalog queries. In: PODS 1991: Proceedings of the tenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 268–279. ACM, New York (1991)

  28. Spiegel, J., Polyzotis, N.: Graph-based synopses for relational selectivity estimation. In: SIGMOD 2006: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pp. 205–216. ACM, New York (2006)

  29. Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: LEO - DB2’s learning optimizer. In: VLDB 2001: Proceedings of the 27th International Conference on Very Large Data Bases, pp. 19–28. Morgan Kaufmann Publishers Inc., San Francisco (2001)

  30. Swift, T., Warren, D.S.: XSB: Extending Prolog with tabled logic programming. CoRR abs/1012.5123 (2010)

  31. Thaper, N., Guha, S., Indyk, P., Koudas, N.: Dynamic multidimensional histograms. In: SIGMOD 2002: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp. 428–439. ACM, New York (2002)

  32. Vance, B., Maier, D.: Rapid bushy join-order optimization with cartesian products. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, SIGMOD 1996, pp. 35–46. ACM, New York (1996)

  33. The SILK project: Semantic Inferencing on Large Knowledge. The FLORA-2 Web Site, http://silk.semwebcentral.org/

  34. Yang, G., Kifer, M., Zhao, C.: Flora-2: A Rule-Based Knowledge Representation and Inference Infrastructure for the Semantic Web. In: Meersman, R., Schmidt, D.C. (eds.) CoopIS 2003, DOA 2003, and ODBASE 2003. LNCS, vol. 2888, pp. 671–688. Springer, Heidelberg (2003)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liang, S., Kifer, M. (2012). Deriving Predicate Statistics for Logic Rules. In: Krötzsch, M., Straccia, U. (eds) Web Reasoning and Rule Systems. RR 2012. Lecture Notes in Computer Science, vol 7497. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33203-6_11

  • DOI: https://doi.org/10.1007/978-3-642-33203-6_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33202-9

  • Online ISBN: 978-3-642-33203-6

  • eBook Packages: Computer Science, Computer Science (R0)
