Studia Logica, Volume 107, Issue 5, pp 991–1023

Causal Learning with Occam’s Razor

  • Oliver Schulte


Occam’s razor directs us to adopt the simplest hypothesis consistent with the evidence. Learning theory provides a precise definition of the inductive simplicity of a hypothesis for a given learning problem. This definition specifies a learning method that implements an inductive version of Occam’s razor. As a case study, we apply Occam’s inductive razor to causal learning. We consider two causal learning problems: learning a causal graph structure that presents global causal connections among a set of domain variables, and learning context-sensitive causal relationships that hold not globally, but only relative to a context. For causal graph learning, Occam’s inductive razor directs us to adopt the model that explains the observed correlations with a minimum number of direct causal connections. For expanding a causal graph structure to include context-sensitive relationships, Occam’s inductive razor directs us to adopt the expansion that explains the observed correlations with a minimum number of free parameters. This is equivalent to explaining the correlations with a minimum number of probabilistic logical rules. The paper provides a gentle introduction to the learning-theoretic definition of inductive simplicity and the application of Occam’s razor for causal learning.
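The graph-selection idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it uses a deliberately crude stand-in for d-connection (an observed dependency counts as "explained" only if the two variables are directly linked), and the variables and dependencies are hypothetical. It shows the inductive razor's selection rule: among all graphs consistent with the observed correlations, adopt one with the minimum number of direct connections.

```python
from itertools import combinations

# Hypothetical toy domain (not from the paper): three variables and the
# correlations observed so far, as unordered pairs.
variables = ["A", "B", "C"]
observed_dependencies = {frozenset({"A", "B"}), frozenset({"B", "C"})}

def explains(edges, dependencies):
    """Crude stand-in for d-connection: a graph explains the data if
    every observed dependency is covered by a direct edge."""
    return all(dep in edges for dep in dependencies)

# Enumerate every undirected skeleton over the variables and keep
# those consistent with the observed dependencies.
all_pairs = [frozenset(p) for p in combinations(variables, 2)]
candidates = []
for k in range(len(all_pairs) + 1):
    for subset in combinations(all_pairs, k):
        if explains(set(subset), observed_dependencies):
            candidates.append(set(subset))

# Occam's inductive razor: adopt a consistent graph with the
# minimum number of direct causal connections.
best = min(candidates, key=len)
print(sorted(tuple(sorted(e)) for e in best))  # [('A', 'B'), ('B', 'C')]
```

The razor here prefers the chain A–B–C over any graph that also adds the edge A–C, since the extra edge is not needed to explain the observed correlations. In the full account, consistency is judged by d-separation over directed graphs rather than by direct edge coverage.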


Keywords: Causal graph · Bayesian network · Formal learning theory · Mind change bounds · Probabilistic clauses





This research was supported by an NSERC discovery grant to the author. Preliminary results were presented at the Center for Formal Epistemology at Carnegie Mellon University. The author is grateful to the audience at the Center for helpful comments.



Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  1. School of Computing Science, Simon Fraser University, Burnaby, Canada
