Feature Importance in Causal Inference for Numerical and Categorical Variables

  • Bram Minnaert
Part of the Springer Series on Challenges in Machine Learning book series (SSCML)


Predicting whether A causes B (written A → B) or B causes A from samples (X, Y) is a challenging task. Several methods have been proposed for the case where both A and B are numerical. However, few studies have addressed the case where A and/or B are categorical.

This paper aims to learn the causal direction between two variables by fitting the regressions of X on Y and of Y on X with a machine learning algorithm and giving preference to the direction that yields the better fit.
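
The idea of comparing the fit in both directions can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes a simple k-nearest-neighbour regressor as a stand-in for the machine learning algorithm, uses the leave-one-out R² as the measure of fit, and runs on hypothetical toy data. The names `knn_loo_r2` and `guess_direction` are invented for this illustration.

```python
import random
import statistics


def knn_loo_r2(inputs, targets, k=5):
    """Leave-one-out R^2 of a k-nearest-neighbour regression of targets on inputs."""
    n = len(inputs)
    mean_t = statistics.fmean(targets)
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    ss_res = 0.0
    for i in range(n):
        # Indices of the k points closest to point i in the input variable,
        # excluding point i itself (leave-one-out).
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: abs(inputs[j] - inputs[i]),
        )[:k]
        prediction = statistics.fmean([targets[j] for j in neighbours])
        ss_res += (targets[i] - prediction) ** 2
    return 1.0 - ss_res / ss_tot


def guess_direction(x, y, k=5):
    """Prefer the direction whose regression fits better (assumed heuristic)."""
    fit_y_on_x = knn_loo_r2(x, y, k)  # how well X predicts Y
    fit_x_on_y = knn_loo_r2(y, x, k)  # how well Y predicts X
    direction = "X -> Y" if fit_y_on_x > fit_x_on_y else "Y -> X"
    return direction, fit_y_on_x, fit_x_on_y


# Hypothetical toy data: Y is a non-invertible function of X plus small noise.
rng = random.Random(0)
x = [rng.uniform(-2.0, 2.0) for _ in range(200)]
y = [xi ** 2 + rng.gauss(0.0, 0.1) for xi in x]

direction, r2_forward, r2_backward = guess_direction(x, y)
print(direction, round(r2_forward, 3), round(r2_backward, 3))
```

R² is used here because it is scale-free, so the two fits can be compared even when X and Y have different units. On real data a single fit comparison is unlikely to be reliable on its own; in a setting like the one described here, such scores would more plausibly serve as input features to a classifier.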

This paper investigates which features are most important when A and B are each numerical or categorical. Using an ensemble method, it finds that the important features depend heavily on the particular combination of numerical and categorical variables.


Causal inference; Deterministic causal relations; Random forest regression; Graphical models; Feature selection



I would like to thank Kaggle and ChaLearn for stirring my interest in this topic [7], and I thank Isabelle Guyon and Mehreen Saeed for their assistance in making my source code portable.


  1. Leo Breiman. Random forests. Mach. Learn., 45(1):5–32, October 2001. ISSN 0885-6125.
  2. Rich Caruana and Alexandru Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, pages 161–168, New York, NY, USA, 2006. ACM. ISBN 1-59593-383-2.
  3. Povilas Daniušis, Dominik Janzing, Joris M. Mooij, Jakob Zscheischler, Bastian Steudel, Kun Zhang, and Bernhard Schölkopf. Inferring deterministic causal relations. In Proceedings of the 26th Annual Conference on Uncertainty in Artificial Intelligence (UAI-10), 2010.
  4. Isabelle Guyon et al. Results and analysis of the 2013 ChaLearn cause-effect pair challenge. 2014.
  5. Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29:1189–1232, 2000.
  6. Jerome H. Friedman. Stochastic gradient boosting. Comput. Stat. Data Anal., 38(4):367–378, February 2002. ISSN 0167-9473.
  7. Isabelle Guyon. Cause-effect pairs challenge, 2013. Isabelle Guyon (ChaLearn), Ben Hamner (Kaggle), Alexander Statnikov (NYU), Mikael Henaff (NYU), Vincent Lemaire (Orange), and Bernhard Schölkopf (MPI).
  8. Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21 (NIPS*2008), pages 689–696, 2009.
  9. J. D. Hunter. Matplotlib: A 2D graphics environment. Computing In Science & Engineering, 9(3):90–95, 2007.
  10. Dominik Janzing, Joris Mooij, Kun Zhang, Jan Lemeire, Jakob Zscheischler, Povilas Daniušis, Bastian Steudel, and Bernhard Schölkopf. Information-geometric approach to inferring causal directions. Artif. Intell., 182–183:1–31, May 2012. ISSN 0004-3702.
  11. Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–.
  12. Joris M. Mooij, Oliver Stegle, Dominik Janzing, Kun Zhang, and Bernhard Schölkopf. Probabilistic latent variable models for distinguishing between cause and effect. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23 (NIPS*2010), pages 1687–1695, 2010.
  13. Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, and Antti Kerminen. A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res., 7:2003–2030, December 2006. ISSN 1532-4435.
  14. Xiaohai Sun, Dominik Janzing, and Bernhard Schölkopf. Causal inference by choosing graphs with most plausible Markov kernels. In ISAIM, 2006.
  15. K. Zhang and A. Hyvärinen. Distinguishing causes from effects using nonlinear acyclic causal models. In I. Guyon, D. Janzing, and B. Schölkopf, editors, JMLR Workshop and Conference Proceedings, Volume 6, pages 157–164, Cambridge, MA, USA, 2010. MIT Press.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Bram Minnaert (1)
  1. ArcelorMittal, Ghent, Belgium
