Machine Learning, Volume 73, Issue 1, pp 25–53

Learning to assign degrees of belief in relational domains



A recurrent problem in the development of reasoning agents is how to assign degrees of belief to uncertain events in a complex environment. The standard knowledge representation framework imposes a sharp separation between learning and reasoning: the agent starts by acquiring a “model” of its environment, represented in an expressive language, and then uses this model to quantify the likelihood of various queries. Yet, even for simple queries, the problem of evaluating probabilities from a general-purpose representation is computationally prohibitive. In contrast, this study adopts the learning to reason (L2R) framework, which aims at eliciting degrees of belief in an inductive manner. The agent is viewed as an anytime reasoner that iteratively improves its performance in light of the knowledge induced from its mistakes. By coupling exponentiated gradient strategies in learning with weighted model counting techniques in reasoning, the L2R framework is shown to provide efficient solutions to relational probabilistic reasoning problems that are provably intractable in the classical paradigm.
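To make the "exponentiated gradient" ingredient concrete, here is a minimal sketch of the standard multiplicative update for a linear predictor under squared loss, with weights kept on the probability simplex. It is an illustration only and not the algorithm of the paper: the function names, the learning rate `eta`, and the toy data are all assumptions, and the weighted model counting component is not modelled.

```python
import math


def eg_update(weights, x, y, eta=0.1):
    """One exponentiated gradient step for squared loss (illustrative only).

    weights: current non-negative weights summing to 1
    x:       feature vector of the same length
    y:       target value
    eta:     learning rate (hypothetical default)
    """
    # Linear prediction with the current weights.
    y_hat = sum(w * xi for w, xi in zip(weights, x))
    # Gradient of (y_hat - y)^2 with respect to each weight.
    grad = [2.0 * (y_hat - y) * xi for xi in x]
    # Multiplicative update, then renormalise back onto the simplex.
    unnormalised = [w * math.exp(-eta * g) for w, g in zip(weights, grad)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def run_online(examples, n_features, eta=0.1):
    """Process examples one at a time, starting from uniform weights."""
    weights = [1.0 / n_features] * n_features
    for x, y in examples:
        weights = eg_update(weights, x, y, eta)
    return weights


# Toy stream (assumed data): the target equals the first feature.
toy_examples = [([1, 0, 0], 1.0), ([0, 1, 0], 0.0), ([0, 0, 1], 0.0)] * 50
learned = run_online(toy_examples, n_features=3)
```

On this toy stream the weight of the first feature grows at the expense of the other two, which is the mistake-driven behaviour the abstract alludes to: each prediction error rescales the weights multiplicatively rather than additively.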


Keywords: Learning to reason · Online learning · Relational probabilistic reasoning · Exponentiated gradient learning · Markov networks · Weighted model counting



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. LIRMM, Université Montpellier II, Montpellier Cedex 5, France
