Machine Learning, Volume 62, Issue 1–2, pp 107–136

Markov logic networks

  • Matthew Richardson
  • Pedro Domingos
Article

Abstract

We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.
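As a rough illustration of how a weighted formula plus a set of constants yields a ground Markov network, the sketch below builds a tiny example and scores possible worlds with the standard log-linear form, P(x) proportional to exp(sum_i w_i n_i(x)), where n_i(x) is the number of true groundings of formula i in world x. The constants, predicates, formula, and weight are hypothetical and chosen only for illustration. The query is answered by brute-force enumeration of all worlds rather than by the MCMC inference described in the abstract, which is feasible only because the domain is so small.

```python
import itertools
import math

# Hypothetical toy domain (an assumption for illustration only).
CONSTANTS = ["Anna", "Bob"]

# Ground atoms: Smokes(c) and Cancer(c) for every constant c.
ATOMS = [(pred, c) for pred in ("Smokes", "Cancer") for c in CONSTANTS]

# A single weighted formula, Smokes(x) => Cancer(x); the weight is illustrative.
WEIGHT = 1.5


def formula_holds(world, x):
    """Truth value of the grounding Smokes(x) => Cancer(x) in a world."""
    return (not world[("Smokes", x)]) or world[("Cancer", x)]


def n_true_groundings(world):
    """Number of groundings of the formula that are true in the world."""
    return sum(formula_holds(world, x) for x in CONSTANTS)


def all_worlds():
    """Enumerate every truth assignment to the ground atoms."""
    for values in itertools.product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def unnormalized(world):
    """exp(w * n(x)): the unnormalized log-linear score of a world."""
    return math.exp(WEIGHT * n_true_groundings(world))


# Partition function, computed exactly because there are only 2**4 worlds.
Z = sum(unnormalized(w) for w in all_worlds())


def probability(world):
    """Normalized probability of a single world."""
    return unnormalized(world) / Z


def conditional(query_atom, evidence):
    """P(query_atom is true | evidence), by brute-force enumeration."""
    num = den = 0.0
    for w in all_worlds():
        if all(w[atom] == value for atom, value in evidence.items()):
            score = unnormalized(w)
            den += score
            if w[query_atom]:
                num += score
    return num / den
```

For instance, `conditional(("Cancer", "Anna"), {("Smokes", "Anna"): True})` gives the probability that Anna has cancer given that she smokes. A world that violates the formula is not ruled out; it is only made less probable, by an amount that grows with the weight.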

Keywords

Statistical relational learning, Markov networks, Markov random fields, Log-linear models, Graphical models, First-order logic, Satisfiability, Inductive logic programming, Knowledge-based model construction, Markov chain Monte Carlo, Pseudo-likelihood, Link prediction

Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  1. Department of Computer Science and Engineering, University of Washington, Seattle, USA
