Feature Importance in Causal Inference for Numerical and Categorical Variables

Minnaert, Bram

doi:10.1007/978-3-030-21810-2_13


Chapter
First Online: 23 October 2019


Part of the book series: The Springer Series on Challenges in Machine Learning (SSCML)

Abstract

Predicting whether A causes B (written A → B) or B causes A from samples (X, Y) is a challenging task. Several methods have been proposed for the case where both A and B are numerical. However, few studies have addressed the case where A and/or B is categorical.

This paper aims to learn the causal direction between two variables by fitting the regression of X on Y and of Y on X with a machine learning algorithm, and preferring the direction that yields the better fit.
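The fit-comparison idea can be sketched for two numerical variables as follows. This is a minimal illustration under stated assumptions and is not the chapter's implementation. The chapter fits the regressions with machine learning algorithms. This sketch instead uses a leave-one-out k-nearest-neighbour regressor. It scores each direction by the fraction of variance left unexplained (1 − R²), which puts both directions on a scale-free footing. The function names and the choice k = 10 are hypothetical.

```python
import random


def loo_knn_unexplained(x, y, k=10):
    """Fraction of the variance of y left unexplained (1 - R^2) by a
    leave-one-out k-nearest-neighbour regression of y on x (hypothetical helper)."""
    n = len(x)
    mean_y = sum(y) / n
    total = sum((v - mean_y) ** 2 for v in y)
    sse = 0.0
    for i in range(n):
        # k nearest neighbours of x[i] on the x axis, excluding point i itself
        nearest = sorted((abs(x[j] - x[i]), j) for j in range(n) if j != i)[:k]
        prediction = sum(y[j] for _, j in nearest) / k
        sse += (y[i] - prediction) ** 2
    return sse / total


def infer_direction(x, y, k=10):
    """Prefer the direction whose regression leaves less variance unexplained."""
    error_y_on_x = loo_knn_unexplained(x, y, k)  # fit quality for A -> B
    error_x_on_y = loo_knn_unexplained(y, x, k)  # fit quality for B -> A
    return "A -> B" if error_y_on_x < error_x_on_y else "B -> A"


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data with A -> B: B is a non-invertible function of A plus small noise
    a = [rng.uniform(-1.0, 1.0) for _ in range(300)]
    b = [v ** 2 + rng.gauss(0.0, 0.05) for v in a]
    print(infer_direction(a, b))
```

In this toy pair, B depends on A through a squared term, which cannot be inverted. Regressing A on B should therefore leave most of the variance of A unexplained, and the sketch should prefer A → B. The heuristic does not work in every setting. For example, with a linear relation and Gaussian noise, 1 − R² is identical in both directions.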

This paper investigates which features are most important for each combination of numerical and categorical A and B. Using an ensemble method, it finds that the important features depend heavily on which combination of numerical and categorical variables is present.
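This excerpt does not say which ensemble method or importance measure the chapter uses. The sketch below is a hedged illustration of measuring importance separately for each type combination. It uses permutation importance, a model-agnostic technique that is substituted here and is not necessarily the author's measure. Each group holds its own prediction function, feature rows and direction labels. The function names, group keys and toy rules are all hypothetical.

```python
import random


def permutation_importance(predict, rows, labels, seed=0):
    """Drop in accuracy when each feature column is shuffled on its own.

    predict: callable mapping one feature row (a list) to a predicted label.
    """
    rng = random.Random(seed)

    def accuracy(data):
        return sum(predict(r) == t for r, t in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    scores = []
    for f in range(len(rows[0])):
        column = [r[f] for r in rows]
        rng.shuffle(column)  # break the link between feature f and the label
        shuffled = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, column)]
        scores.append(baseline - accuracy(shuffled))
    return scores


def importance_by_type_combination(groups, seed=0):
    """groups maps a type combination, e.g. ("numerical", "categorical"),
    to (predict, rows, labels) for the pairs of that combination."""
    return {
        combo: permutation_importance(predict, rows, labels, seed)
        for combo, (predict, rows, labels) in groups.items()
    }


if __name__ == "__main__":
    rng = random.Random(1)
    rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]

    # Hypothetical toy models: one relies only on feature 0, the other on feature 1
    def rule_0(r):
        return 1 if r[0] > 0 else -1

    def rule_1(r):
        return 1 if r[1] > 0 else -1

    groups = {
        ("numerical", "numerical"): (rule_0, rows, [rule_0(r) for r in rows]),
        ("categorical", "categorical"): (rule_1, rows, [rule_1(r) for r in rows]),
    }
    for combo, scores in importance_by_type_combination(groups).items():
        print(combo, [round(s, 3) for s in scores])
```

A feature that a group's model ignores gets an importance of exactly zero. This is how different type combinations can end up ranking different features as most important.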


References

[1] Leo Breiman. Random forests. Mach. Learn., 45(1):5–32, October 2001. ISSN 0885-6125. URL http://dx.doi.org/10.1023/A:1010933404324.
[2] Rich Caruana and Alexandru Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, pages 161–168, New York, NY, USA, 2006. ACM. ISBN 1-59593-383-2. URL http://doi.acm.org/10.1145/1143844.1143865.
[3] Povilas Daniušis, Dominik Janzing, Joris M. Mooij, Jakob Zscheischler, Bastian Steudel, Kun Zhang, and Bernhard Schölkopf. Inferring deterministic causal relations. In Proceedings of the 26th Annual Conference on Uncertainty in Artificial Intelligence (UAI-10), 2010. URL http://event.cwi.nl/uai2010/papers/UAI2010_0121.pdf.
[4] Isabelle Guyon et al. Results and analysis of the 2013 ChaLearn cause-effect pair challenge. 2014.
[5] Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.
[6] Jerome H. Friedman. Stochastic gradient boosting. Comput. Stat. Data Anal., 38(4):367–378, February 2002. ISSN 0167-9473. URL http://dx.doi.org/10.1016/S0167-9473(01)00065-2.
[7] Isabelle Guyon. Cause-effect pairs challenge, 2013. Isabelle Guyon (ChaLearn), Ben Hamner (Kaggle), Alexander Statnikov (NYU), Mikael Henaff (NYU), Vincent Lemaire (Orange), and Bernhard Schölkopf (MPI).
[8] Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21 (NIPS*2008), pages 689–696, 2009.
[9] J. D. Hunter. Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.
[10] Dominik Janzing, Joris Mooij, Kun Zhang, Jan Lemeire, Jakob Zscheischler, Povilas Daniušis, Bastian Steudel, and Bernhard Schölkopf. Information-geometric approach to inferring causal directions. Artif. Intell., 182–183:1–31, May 2012. ISSN 0004-3702. URL http://dx.doi.org/10.1016/j.artint.2012.01.002.
[11] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–. URL http://www.scipy.org/.
[12] Joris M. Mooij, Oliver Stegle, Dominik Janzing, Kun Zhang, and Bernhard Schölkopf. Probabilistic latent variable models for distinguishing between cause and effect. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23 (NIPS*2010), pages 1687–1695, 2010. URL http://books.nips.cc/papers/files/nips23/NIPS2010_1270.pdf.
[13] Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, and Antti Kerminen. A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res., 7:2003–2030, December 2006. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=1248547.1248619.
[14] Xiaohai Sun, Dominik Janzing, and Bernhard Schölkopf. Causal inference by choosing graphs with most plausible Markov kernels. In ISAIM, 2006. URL http://dblp.uni-trier.de/db/conf/isaim/isaim2006.html#SunJS06.
[15] K. Zhang and A. Hyvärinen. Distinguishing causes from effects using nonlinear acyclic causal models. In I. Guyon, D. Janzing, and B. Schölkopf, editors, JMLR Workshop and Conference Proceedings, Volume 6, pages 157–164, Cambridge, MA, USA, 2010. MIT Press.


Acknowledgements

I would like to thank Kaggle and ChaLearn for stirring my interest in this topic [7], and Isabelle Guyon and Mehreen Saeed for their assistance in making my source code portable.

Author information

Authors and Affiliations

ArcelorMittal, Ghent, Belgium
Bram Minnaert


Editor information

Editors and Affiliations

Team TAU - CNRS, INRIA, Université Paris Sud, Université Paris Saclay, Orsay, France; ChaLearn, Berkeley, CA, USA
Isabelle Guyon
SoFi, San Francisco, CA, USA
Alexander Statnikov
University of Paris-Sud, Paris-Saclay, Paris, France
Berna Bakir Batu


About this chapter

Cite this chapter

Minnaert, B. (2019). Feature Importance in Causal Inference for Numerical and Categorical Variables. In: Guyon, I., Statnikov, A., Batu, B. (eds) Cause Effect Pairs in Machine Learning. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-21810-2_13


DOI: https://doi.org/10.1007/978-3-030-21810-2_13
Published: 23 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21809-6
Online ISBN: 978-3-030-21810-2
eBook Packages: Computer Science (R0)
