Studia Logica, Volume 107, Issue 5, pp 991–1023

Causal Learning with Occam’s Razor

  • Oliver Schulte


Occam’s razor directs us to adopt the simplest hypothesis consistent with the evidence. Learning theory provides a precise definition of the inductive simplicity of a hypothesis for a given learning problem. This definition specifies a learning method that implements an inductive version of Occam’s razor. As a case study, we apply Occam’s inductive razor to causal learning. We consider two causal learning problems: learning a causal graph structure that presents global causal connections among a set of domain variables, and learning context-sensitive causal relationships that hold not globally, but only relative to a context. For causal graph learning, Occam’s inductive razor directs us to adopt the model that explains the observed correlations with a minimum number of direct causal connections. For expanding a causal graph structure to include context-sensitive relationships, Occam’s inductive razor directs us to adopt the expansion that explains the observed correlations with a minimum number of free parameters. This is equivalent to explaining the correlations with a minimum number of probabilistic logical rules. The paper provides a gentle introduction to the learning-theoretic definition of inductive simplicity and the application of Occam’s razor for causal learning.
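The graph-selection idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it uses a deliberately crude stand-in for d-connection (an observed dependency counts as "explained" only if the two variables are directly linked), and the variables and dependencies are hypothetical. It shows the inductive razor's selection rule: among all graphs consistent with the observed correlations, adopt one with the minimum number of direct connections.

```python
from itertools import combinations

# Hypothetical toy domain (not from the paper): three variables and the
# correlations observed so far, as unordered pairs.
variables = ["A", "B", "C"]
observed_dependencies = {frozenset({"A", "B"}), frozenset({"B", "C"})}

def explains(edges, dependencies):
    """Crude stand-in for d-connection: a graph explains the data if
    every observed dependency is covered by a direct edge."""
    return all(dep in edges for dep in dependencies)

# Enumerate every undirected skeleton over the variables and keep
# those consistent with the observed dependencies.
all_pairs = [frozenset(p) for p in combinations(variables, 2)]
candidates = []
for k in range(len(all_pairs) + 1):
    for subset in combinations(all_pairs, k):
        if explains(set(subset), observed_dependencies):
            candidates.append(set(subset))

# Occam's inductive razor: adopt a consistent graph with the
# minimum number of direct causal connections.
best = min(candidates, key=len)
print(sorted(tuple(sorted(e)) for e in best))  # [('A', 'B'), ('B', 'C')]
```

The razor here prefers the chain A–B–C over any graph that also adds the edge A–C, since the extra edge is not needed to explain the observed correlations. In the full account, consistency is judged by d-separation over directed graphs rather than by direct edge coverage.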


Keywords: Causal graph · Bayesian network · Formal learning theory · Mind change bounds · Probabilistic clauses





This research was supported by an NSERC discovery grant to the author. Preliminary results were presented at the Center for Formal Epistemology at Carnegie Mellon University. The author is grateful to the audience at the Center for helpful comments.



Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  1. School of Computing Science, Simon Fraser University, Burnaby, Canada
