Feature Importance in Causal Inference for Numerical and Categorical Variables

  • Bram Minnaert
Part of The Springer Series on Challenges in Machine Learning book series (SSCML)


Predicting whether A causes B (written A → B) or B causes A from samples (X, Y) is a challenging task. Several methods have been proposed for the case where both A and B are numerical. However, few studies have addressed the case where A and/or B are categorical.

This paper aims to learn the causal direction between two variables by fitting the regressions of X on Y and of Y on X with a machine learning algorithm and giving preference to the direction that yields the better fit.
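
The idea of comparing the fit in both directions can be illustrated with a minimal sketch. It is not the chapter's implementation: a simple binned-mean regressor stands in for the machine learning algorithm, only numerical variables are handled, and the names `binned_fit_r2` and `prefer_direction` are hypothetical. The sketch also assumes that a better fit when predicting Y from X is read as evidence for X → Y.

```python
import random
from statistics import mean


def binned_fit_r2(inputs, targets, n_bins=10):
    """In-sample R^2 of a piecewise-constant (binned-mean) regression.

    Stand-in for a real learner: the inputs are cut into equal-width
    bins and each target is predicted by the mean of its bin.
    """
    lo, hi = min(inputs), max(inputs)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant inputs
    bins = {}
    for u, v in zip(inputs, targets):
        idx = min(int((u - lo) / width), n_bins - 1)
        bins.setdefault(idx, []).append(v)
    grand_mean = mean(targets)
    ss_total = sum((v - grand_mean) ** 2 for v in targets)
    ss_resid = sum((v - mean(vals)) ** 2 for vals in bins.values() for v in vals)
    return 1.0 - ss_resid / ss_total if ss_total > 0 else 0.0


def prefer_direction(xs, ys, n_bins=10):
    """Return the direction whose regression fits better, plus both R^2 scores.

    Assumption of this sketch: predicting Y from X well favours "X -> Y".
    """
    r2_y_from_x = binned_fit_r2(xs, ys, n_bins)
    r2_x_from_y = binned_fit_r2(ys, xs, n_bins)
    direction = "X -> Y" if r2_y_from_x > r2_x_from_y else "Y -> X"
    return direction, r2_y_from_x, r2_x_from_y


# Synthetic example (made-up data): Y is a noisy, non-invertible function of X.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(500)]
ys = [x * x + rng.gauss(0.0, 0.05) for x in xs]
direction, r2_forward, r2_backward = prefer_direction(xs, ys)
print(direction, round(r2_forward, 3), round(r2_backward, 3))
```

R^2 is used as the fit measure here because it is scale-free, so the two directions can be compared even when X and Y have different units. Other fit measures, and a treatment of categorical variables, would need their own handling.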

This paper investigates which features are the most important when A and B are each either numerical or categorical. Using an ensemble method, it finds that the important features depend heavily on the particular combination of numerical and categorical variables.


Causal inference; Deterministic causal relations; Random forest regression; Graphical models; Feature selection



I would like to thank Kaggle and ChaLearn for stirring my interest in this topic [7], and Isabelle Guyon and Mehreen Saeed for their assistance in making my source code portable.


Authors and Affiliations

  • Bram Minnaert
    • 1
  1. ArcelorMittal, Ghent, Belgium

