
Evaluation Methods of Cause-Effect Pairs


Part of the book series: The Springer Series on Challenges in Machine Learning ((SSCML))

Abstract

This chapter addresses the problem of benchmarking causal models or validating particular putative causal relationships, in the limited setting of cause-effect pairs, when empirical “observational” data are available. We do not address experimental validation, e.g. via randomized controlled trials. Our goal is to compare methods that provide a score C(X, Y ), called a causation coefficient, rating a pair of variables (X, Y ) for being in a potential causal relationship X → Y . Causation coefficients may be used for various purposes, including to prioritize experiments, which may be costly or risky, or to guide decision makers in domains in which experiments are infeasible or unethical. We provide a methodology to evaluate their reliability. We take three points of view: (1) that of algorithm developers, who must justify the soundness of their method, particularly with respect to identifiability and consistency; (2) that of practitioners, who seek to understand on what basis algorithms make their decisions and to evaluate their statistical significance; and (3) that of benchmark organizers, who desire to make fair evaluations comparing methods. We adopt the framework of pattern recognition, in which pairs of variables (X, Y ) and their ground truth causal graph are drawn i.i.d. from a “mother distribution”. This leads us to define new notions of probabilistic identifiability, Bayes optimal causation coefficients, and multi-part statistical tests. These new notions are evaluated on the data of the first cause-effect pair challenge. We also compile a list of resources, including datasets of real or synthetic pairs, and data generative models.


Notes

  1. In case of time series, {(x_1, y_1), (x_2, y_2), ⋯, (x_n, y_n)} are time ordered and not drawn i.i.d.

  2. The difference with the general case is outlined in blue: the noise is additive. In the ANM, it is usually assumed that the noise and the input are independent (noted X ⊥ N).

  3. However, we remind the reader that several data generative processes may generate the same data distribution.

  4. In our examples, we perform a piece-wise constant fit using m = 20 equally spaced points. If we believe that the fit should be better in the “causal direction”, then \(R^2 = R^2_y - R^2_x\) should be positive if X → Y, since \(R^2_y\) is the residual of the fit \(\hat y = f(x)\).

  5. Using an ANM model (Eq. (2.5)), we expect the input and the residual of regression (our estimate of the noise) to be independent, i.e. if X → Y, then R_y ⊥ X. We use a kernel independence test statistic [25] to calculate I(X, Y); larger values of I(X, Y) mean greater confidence that X and Y are independent. For example, I(., .) can be the HSIC statistic.

  6. According to the Information Geometry Causal Inference (IGCI) principle, if X → Y and the “causal mechanism” is independent of P(X), then, under some conditions (e.g. no noise and an invertible mechanism), H_x ≥ H_y [36].

  7. According to the IGCI principle again, if X → Y, then, under the same conditions as for the entropy criterion, S_y ≥ S_x [36].

  8. CDS measures variations in the conditional distribution. The idea is that, for X → Y, if X is independent of the noise, then, after normalizing the support, P(Y |X) should not vary much. For additive noise models, C_CDS should be similar to C_IR (but not always, because of support normalization). In the case of multiplicative noise, C_CDS should capture the independence of noise and input, where C_IR usually fails.

  9. If Π takes continuous values, \(P_{\mathscr {M}} \big ( P_\varPi (X, Y) \big )\) and \(P_{\mathscr {M}} \big ( P_\varPi (X, Y)~|~G \big )\) should be understood as densities rather than distributions.

  10. Except possibly for a finite subset or a subset of measure 0.

  11. https://competitions.codalab.org/competitions/1381.

  12. This could easily be refined in a number of ways, including restricting the mother distribution \(P_{\mathscr {M}}\) to scatter plots with the exact same number of samples as the pairs to be tested. In the dataset of the cause-effect pair challenge that we use as \(P_{\mathscr {M}}\), the number of samples varies between 500 and 5000.

  13. http://www.cs.huji.ac.il/~galel/Repository/.

  14. http://dreamchallenges.org.

  15. Causality workbench: http://www.causality.inf.ethz.ch/. Founding members: Constantin F. Aliferis (Vanderbilt University, Tennessee), Gregory F. Cooper (University of Pittsburgh, Pennsylvania), André Elisseeff (IBM Research, Switzerland), Jean-Philippe Pellet (IBM Research, Switzerland), Alexander Statnikov (Vanderbilt University, Tennessee), Peter Spirtes (Carnegie Mellon University, Pennsylvania).

  16. https://ei.is.tuebingen.mpg.de/.

  17. Causality Workbench repository: http://www.causality.inf.ethz.ch/repository.php.

  18. Connectomics challenge: http://connectomics.chalearn.org/.

  19. Bio-model database: http://www.ebi.ac.uk/biomodels-main/.

  20. Chemical plant simulator: http://depts.washington.edu/control/LARRY/TE/download.html.

  21. CMU case studies: https://www.cmu.edu/dietrich/philosophy/events/workshops-conferences/causal-discovery/index.html.

  22. http://causeme.uv.es/index.html.
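Several of the filters described in these notes are easy to prototype. Below is a rough, self-contained sketch (our own illustration, not the authors' reference implementation) of a residual-independence causation coefficient in the spirit of notes 4 and 5: a piece-wise constant fit in each direction, followed by a biased HSIC statistic between input and residual. The m = 20 binning choice follows note 4; function names, kernel bandwidth, and sign convention are our assumptions.

```python
import numpy as np

def hsic(a, b, sigma=1.0):
    """Biased HSIC statistic with Gaussian kernels (larger => more dependence)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    K = np.exp(-((a[:, None] - a[None, :]) ** 2) / (2 * sigma ** 2))
    L = np.exp(-((b[:, None] - b[None, :]) ** 2) / (2 * sigma ** 2))
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def piecewise_residual(x, y, m=20):
    """Residual of a piece-wise constant fit of y as a function of x (m bins)."""
    edges = np.linspace(x.min(), x.max(), m + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, m - 1)
    means = np.array([y[idx == k].mean() if np.any(idx == k) else y.mean()
                      for k in range(m)])
    return y - means[idx]

def c_ir(x, y):
    """Residual-independence coefficient: positive values favor X -> Y.
    HSIC measures dependence, so the causal direction should yield the
    SMALLER statistic between input and residual."""
    r_y = piecewise_residual(x, y)   # residual of the fit y_hat = f(x)
    r_x = piecewise_residual(y, x)   # residual of the fit x_hat = g(y)
    return hsic(y, r_x) - hsic(x, r_y)
```

On an identifiable ANM pair such as y = x³ + uniform noise, c_ir is typically positive; on a non-identifiable linear Gaussian pair it hovers around zero.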

References

  1. Constantin F. Aliferis, Ioannis Tsamardinos, Alexander R. Statnikov, and Laura E. Brown. Causal explorer: A causal probabilistic network learning toolkit for biomedical discovery. In Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS ’03, June 23–26, 2003, Las Vegas, Nevada, USA, pages 371–376, 2003.

  2. Demian Battaglia, Isabelle Guyon, Vincent Lemaire, Javier Orlandi, Bisakha Ray, and Jordi Soriano. Neural Connectomics Challenge. Springer Publishing Company, Incorporated, 1st edition, 2017. ISBN 9783319530697.

  3. J. A. Blackard and D. J. Dean. Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Computers and Electronics in Agriculture, 24:131–151, 1999.

  4. Patrick Bloebaum, Dominik Janzing, Takashi Washio, Shohei Shimizu, and Bernhard Schoelkopf. Cause-effect inference by comparing regression errors. In International Conference on Artificial Intelligence and Statistics, pages 900–909, 2018.

  5. C. Bonferroni. Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8:3–62, 1936.

  6. Gianluca Bontempi. From dependency to causality: a machine learning approach. In Proc. NIPS 2013 workshop on causality, http://clopinet.com/isabelle/Projects/NIPS2013/, December 2013.

  7. P. B. Burns, R. J. Rohrich, and K. C. Chung. The levels of evidence and their role in evidence-based medicine. Plastic and Reconstructive Surgery, 128(1):305–310, 2011.

  8. Krzysztof Chalupka, Frederick Eberhardt, and Pietro Perona. Estimating causal direction and confounding of two discrete variables. arXiv preprint arXiv:1611.01504, 2016.

  9. Povilas Daniusis, Dominik Janzing, Joris Mooij, Jakob Zscheischler, Bastian Steudel, Kun Zhang, and Bernhard Schölkopf. Inferring deterministic causal relations. arXiv preprint arXiv:1203.3475, 2012.

  10. Diogo Moitinho de Almeida. Automated feature engineering applied to causality. In Proc. NIPS 2013 workshop on causality, http://clopinet.com/isabelle/Projects/NIPS2013/, December 2013.

  11. Suzana de Siqueira Santos, Daniel Yasumasa Takahashi, Asuka Nakata, and André Fujita. A comparative study of statistical methods used to identify dependencies between gene expression signals. Briefings in Bioinformatics, 15(6):906–918, 2014. doi: 10.1093/bib/bbt051.

  12. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, USA, 2nd edition, 2001.

  13. B. Efron. Estimating the error rate of a prediction rule: Improvement on cross-validation. Journal of the American Statistical Association, 78:316–331, 1983.

  14. Bradley Efron and Robert J. Tibshirani. An Introduction to the Bootstrap. CRC Press, 1994.

  15. Guy Fagherazzi, Alice Vilier, Daniela Saes Sartorelli, Martin Lajous, Beverley Balkau, and Françoise Clavel-Chapelon. Consumption of artificially and sugar-sweetened beverages and incident type 2 diabetes in the Etude Epidémiologique auprès des femmes de la Mutuelle Générale de l’Education Nationale–European Prospective Investigation into Cancer and Nutrition cohort. The American Journal of Clinical Nutrition, 97(3):517–523, March 2013.

  16. Ronald A. Fisher. The Design of Experiments. 1935.

  17. José A. R. Fonollosa. Conditional distribution variability measure for causality detection. In Proc. NIPS 2013 workshop on causality, http://clopinet.com/isabelle/Projects/NIPS2013/, December 2013.

  18. José A. R. Fonollosa. Conditional distribution variability measures for causality detection. arXiv preprint arXiv:1601.06680, 2016.

  19. U.S. Preventive Services Task Force. Guide to Clinical Preventive Services: Report of the U.S. Preventive Services Task Force. 1989.

  20. Stuart Geman, Elie Bienenstock, and René Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1–58, 1992. doi: 10.1162/neco.1992.4.1.1.

  21. Olivier Goudet. Causality pairwise inference datasets. Replication data for: “Learning functional causal models with generative neural networks”, 2017. URL http://dx.doi.org/10.7910/DVN/3757KX.

  22. Olivier Goudet, Diviyan Kalainathan, Philippe Caillou, Isabelle Guyon, David Lopez-Paz, and Michèle Sebag. Causal generative neural networks. arXiv preprint arXiv:1711.08936, 2017.

  23. Clive W. J. Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, pages 424–438, 1969.

  24. A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. J. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, pages 585–592, Red Hook, NY, USA, 2008. Curran.

  25. Arthur Gretton, Ralf Herbrich, Alexander Smola, Olivier Bousquet, and Bernhard Schölkopf. Kernel methods for measuring independence. Journal of Machine Learning Research, 6:2075–2129, 2005.

  26. I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, and A. Statnikov. Design and analysis of the causality pot-luck challenge. In JMLR W&CP, volume 5: NIPS 2008 causality workshop, Whistler, Canada, December 12, 2008.

  27. I. Guyon, D. Battaglia, A. Guyon, V. Lemaire, J. G. Orlandi, B. Ray, M. Saeed, J. Soriano, A. Statnikov, and O. Stetter. Design of the first neuronal connectomics challenge: From imaging to connectivity. In 2014 International Joint Conference on Neural Networks (IJCNN), pages 2600–2607, July 2014. doi: 10.1109/IJCNN.2014.6889913.

  28. Isabelle Guyon. A practical guide to model selection, 2010.

  29. Isabelle Guyon. ChaLearn cause-effect pairs challenge, 2013. URL http://www.causality.inf.ethz.ch/cause-effect.php.

  30. Isabelle Guyon. ChaLearn fast causation coefficient challenge, 2014.

  31. Isabelle Guyon et al. Results and analysis of the 2013 ChaLearn cause-effect pair challenge. In Proc. NIPS 2013 workshop on causality, December 2013. Workshop: http://clopinet.com/isabelle/Projects/NIPS2013/; challenge: http://www.causality.inf.ethz.ch/cause-effect.php.

  32. Isabelle Guyon, Steve Gunn, Masoud Nikravesh, and Lotfi A. Zadeh. Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing). Springer-Verlag, Berlin, Heidelberg, 2006. ISBN 3540354875.

  33. Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In Neural Information Processing Systems (NIPS), pages 689–696, 2009.

  34. P. Illari, F. Russo, and J. Williamson. Causality in the Sciences. Oxford University Press, 2011.

  35. Dominik Janzing and Bernhard Schölkopf. Causal inference using the algorithmic Markov condition. IEEE Transactions on Information Theory, 56(10):5168–5194, 2010.

  36. Dominik Janzing, Joris Mooij, Kun Zhang, Jan Lemeire, Jakob Zscheischler, Povilas Daniušis, Bastian Steudel, and Bernhard Schölkopf. Information-geometric approach to inferring causal directions. Artificial Intelligence, 182–183:1–31, May 2012. doi: 10.1016/j.artint.2012.01.002.

  37. Karen Sachs, Solomon Itani, Mingyu Chung, Gabriela K. Fragiadakis, Jonathan Fitzgerald, Birgit Schoeberl, Garry P. Nolan, and Claire Tomlin. Experiment design in static models of dynamic biological systems. In NIPS 2013 workshop on causality, 2013.

  38. Shachar Kaufman, Saharon Rosset, Claudia Perlich, and Ori Stitelman. Leakage in data mining: Formulation, detection, and avoidance. ACM Transactions on Knowledge Discovery from Data (TKDD), 6(4):15, 2012.

  39. Ron Kohavi. Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 202–207. AAAI Press, 1996.

  40. S. Kpotufe, E. Sgouritsa, D. Janzing, and B. Schölkopf. Consistency of causal inference under the additive noise model. In Proceedings of the 31st International Conference on Machine Learning, W&CP 32(1), pages 478–495. JMLR, 2014.

  41. John Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273–306, December 2005.

  42. David Lopez-Paz, Krikamol Muandet, and Benjamin Recht. The randomized causation coefficient. Journal of Machine Learning Research, 16(1):2901–2907, January 2015a.

  43. David Lopez-Paz, Krikamol Muandet, Bernhard Schölkopf, and Ilya O. Tolstikhin. Towards a learning theory of cause-effect inference. In ICML, pages 1452–1461, 2015b.

  44. David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, and Léon Bottou. Discovering causal signals in images. arXiv preprint arXiv:1605.08179, 2016.

  45. Bram Minnaert. Feature importance in causal inference for numerical and categorical variables. In Proc. NIPS 2013 workshop on causality, http://clopinet.com/isabelle/Projects/NIPS2013/, December 2013.

  46. Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf. Distinguishing cause from effect using observational data: methods and benchmarks. Journal of Machine Learning Research, 17(32):1–102, 2016.

  47. J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.

  48. J. Peters, D. Janzing, and B. Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, Cambridge, MA, USA, 2017a.

  49. Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Causal inference on discrete data using additive noise models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2436–2450, 2011.

  50. Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, 2017b.

  51. Karl Popper. Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge, 2014.

  52. Hans Reichenbach. The Direction of Time. Dover Publications, 1956.

  53. Paul R. Rosenbaum and Donald B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.

  54. Donald Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688–701, 1974.

  55. Spyridon Samothrakis, Diego Perez, and Simon Lucas. Training gradient boosting machines using curve fitting and information-theoretic features for causal direction detection. In Proc. NIPS 2013 workshop on causality, http://clopinet.com/isabelle/Projects/NIPS2013/, December 2013.

  56. K. F. Schulz, D. G. Altman, D. Moher, and the CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. Annals of Internal Medicine, 2010.

  57. P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. The MIT Press, Cambridge, Massachusetts, London, England, 2000.

  58. Alexander Statnikov, Mikael Henaff, Nikita I. Lytkin, and Constantin F. Aliferis. New methods for separating causes from effects in genomics data. BMC Genomics, 13, 2012a.

  59. Alexander Statnikov, Mikael Henaff, Nikita I. Lytkin, and Constantin F. Aliferis. New methods for separating causes from effects in genomics data. BMC Genomics, 13(8):S22, 2012b.

  60. Alexander Statnikov, Sisi Ma, Mikael Henaff, Nikita Lytkin, Efstratios Efstathiadis, Eric R. Peskin, and Constantin F. Aliferis. Ultra-scalable and efficient methods for hybrid observational and experimental local causal pathway discovery. Journal of Machine Learning Research, 16(1):3219–3267, January 2015.

  61. Natasa Tagasovska, Thibault Vatter, and Valérie Chavez-Demoulin. Nonparametric quantile-based causal discovery. arXiv preprint arXiv:1801.10579, 2018.

  62. Wikipedia. Evidence-based medicine. https://en.wikipedia.org/wiki/Evidence-based_medicine.

  63. Yi Wang, Yi Li, Hongbao Cao, Momiao Xiong, Yin Yao Shugart, and Li Jin. Efficient test for nonlinear dependence of two continuous variables. BMC Bioinformatics, 16(260), 2015.

  64. K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional shift. In Proceedings of the 30th International Conference on Machine Learning, W&CP 28(3), pages 819–827. JMLR, 2013.

  65. Kun Zhang. Causal learning and machine learning. In Antti Hyttinen, Joe Suzuki, and Brandon Malone, editors, Proceedings of the 3rd International Workshop on Advanced Methodologies for Bayesian Networks, volume 73 of Proceedings of Machine Learning Research, pages 4–4. PMLR, 2017. URL http://proceedings.mlr.press/v73/zhang17a.html.

  66. Kun Zhang and Aapo Hyvärinen. On the identifiability of the post-nonlinear causal model. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 647–655. AUAI Press, 2009.

  67. Kun Zhang, Zhikun Wang, Jiji Zhang, and Bernhard Schölkopf. On estimation of functional causal models: general results and application to the post-nonlinear causal model. ACM Transactions on Intelligent Systems and Technology (TIST), 7(2):13, 2016.


Acknowledgements

The authors would like to thank Dominik Janzing and Berna Batu for their careful review of this chapter.

Author information

Corresponding author: Isabelle Guyon.

Appendices

Appendix 1: Derivation of Bayes Optimal Causation Coefficients

We derive the Bayes optimal causation coefficients introduced in Sect. 2.3.2.

The Bayes optimal decision rule prescribes the following:

$$\displaystyle \begin{aligned} \begin{array}{rll} \text{Classify pair }(X_\pi, Y_\pi)\text{ as }[X \rightarrow Y] & iff~& \\ P_{\mathscr{M}}\Big( G = [X \rightarrow Y]~|~(X_\pi, Y_\pi)\Big) & > & P_{\mathscr{M}}\Big( G \neq [X \rightarrow Y]~|~(X_\pi, Y_\pi)\Big). \end{array} \end{aligned} $$
(2.28)

Given a classification problem of patterns z (in our case z = (X_π, Y_π) pairs), with ground truth labels g_1 and g_0 (in our case g_1 = [X → Y ] and g_0 = ¬[X → Y ]), we define a “discriminant function” as a function g(z) taking values in \(\mathbb {R}\) such that we predict class g_1 if g(z) > θ and class g_0 otherwise. Important: with our definition, θ is a real number, not necessarily equal to 0 (as is commonly used).

In this context,

$$\displaystyle \begin{aligned} P_{\mathscr{M}}\Big(G = [X \rightarrow Y]~|~(X_\pi, Y_\pi) \Big)- P_{\mathscr{M}}\Big(G \neq [X \rightarrow Y]~|~(X_\pi, Y_\pi)\Big) \end{aligned}$$

is a Bayes optimal discriminant function and because:

$$\displaystyle \begin{aligned} P_{\mathscr{M}}\Big(G = [X \rightarrow Y]~|~(X_\pi, Y_\pi)\Big) = 1- P_{\mathscr{M}}\Big(G \neq [X \rightarrow Y]~|~(X_\pi, Y_\pi)\Big), \end{aligned}$$

the following is also a Bayes optimal discriminant function:

$$\displaystyle \begin{aligned} 2~P_{\mathscr{M}}\Big(G = [X \rightarrow Y]~|~(X_\pi, Y_\pi)\Big)-1 \end{aligned}$$

and therefore, so is:

$$\displaystyle \begin{aligned} P_{\mathscr{M}} \Big(G = [X \rightarrow Y]~|~(X_\pi, Y_\pi) \Big) . \end{aligned}$$

Also note that we could have considered the symmetric problem of classifying Y → X vs. all other cases. Assume the mother distribution is perfectly symmetrical, i.e. for each pair (X, Y ) labeled X → Y it contains the symmetric pair (Y, X) labeled Y → X; for each pair (X, Y ) labeled X ↔ Y , the same pair labeled Y ↔ X; and for each pair (X, Y ) labeled X ⊥ Y , the same pair labeled Y ⊥ X. Then the ranking of all pairs (X_π, Y_π) obtained by sorting according to \(P_{\mathscr {M}} \Big (G = [X \rightarrow Y]~|~(X_\pi , Y_\pi ) \Big )\) is the reverse of the ranking obtained with \(P_{\mathscr {M}} \Big (G = [X \leftarrow Y]~|~(X_\pi , Y_\pi ) \Big )\). Consequently we obtain the same ranking with:

$$\displaystyle \begin{aligned} C_{\mathscr{B} 1}(X_\pi, Y_\pi) = \varPhi\Big( P_{\mathscr{M}}\big(G = [X \rightarrow Y]~|~(X_\pi, Y_\pi)\big) - P_{\mathscr{M}}\big(G = [X \leftarrow Y]~|~(X_\pi, Y_\pi)\big) \Big) \end{aligned}$$
(2.29)

where Φ(.) is any strictly monotonically increasing function.

It is important to note that, because we consider four possible truth values for G, G ∈{X → Y, X ← Y, X ↔ Y, X ⊥ Y }, \(P_{\mathscr {M}}\Big (G = [X \rightarrow Y]~|~(X_\pi , Y_\pi ) \Big )\) is NOT equal to \(1- P_{\mathscr {M}}\Big (G = [X \leftarrow Y]~|~(X_\pi , Y_\pi ) \Big )\).

It may also be convenient to define a Bayes optimal causation coefficient in terms of the data generative model. To that end, notice that if a − b is a discriminant function, so is a∕b − 1 (for b > 0). Therefore,

$$\displaystyle \begin{aligned} P_{\mathscr{M}}\Big(G=[X \rightarrow Y]~|~(X_\pi, Y_\pi) \Big) / P_{\mathscr{M}}\Big(G=[X \leftarrow Y]~|~(X_\pi, Y_\pi) \Big) - 1 \end{aligned}$$

is also a Bayes optimal discriminant function. Using Bayes’ rule again,

$$\displaystyle \begin{aligned} \begin{array}{ll} &P_{\mathscr{M}}\Big(G=[X \rightarrow Y]~|~(X_\pi, Y_\pi) \Big) = \\ &~~~~~~~~P_{\mathscr{M}}\Big((X_\pi, Y_\pi)~|~G=[X \rightarrow Y] \Big) P_{\mathscr{M}}\Big(G=[X \rightarrow Y]\Big)/P_{\mathscr{M}}\Big((X_\pi, Y_\pi) \Big) \end{array} \end{aligned}$$

and further assuming that the mother distribution is not biased towards a particular causal direction:

$$\displaystyle \begin{aligned} P_{\mathscr{M}}\Big(G=[X \rightarrow Y]\Big) = P_{\mathscr{M}}\Big(G=[Y \rightarrow X] \Big) \end{aligned}$$

the following is also a Bayes optimal discriminant function:

$$\displaystyle \begin{aligned} P_{\mathscr{M}}\Big((X_\pi, Y_\pi)~|~G=[X \rightarrow Y]\Big) /P_{\mathscr{M}}\Big((X_\pi, Y_\pi)~|~G=[Y \rightarrow X]\Big) - 1 \end{aligned}$$

and so is, for any strictly monotonically increasing function Φ(.):

$$\displaystyle \begin{aligned} C_{\mathscr{B} 2}(X_\pi, Y_\pi) = \varPhi\Big( P_{\mathscr{M}}\big((X_\pi, Y_\pi)~|~G=[X \rightarrow Y]\big) \big/ P_{\mathscr{M}}\big((X_\pi, Y_\pi)~|~G=[Y \rightarrow X]\big) - 1 \Big) \end{aligned}$$
(2.30)
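The likelihood-based coefficient can be made concrete with a toy sketch (our own illustration, with an assumed binary generative model that is not from the chapter): under equal priors on the two directions, the difference of log-likelihoods under the two candidate models is a monotone transform of the discriminant functions derived above, so its sign gives the Bayes optimal decision.

```python
import numpy as np

def loglik(x, y, cause_p=0.7, flip_p=0.1, direction="x->y"):
    """Exact log-likelihood of binary pairs under a toy generative model
    (our assumption): cause ~ Bernoulli(cause_p), effect = cause XOR
    Bernoulli(flip_p)."""
    if direction == "x->y":
        cause, effect = x, y
    else:
        cause, effect = y, x
    p_cause = np.where(cause == 1, cause_p, 1 - cause_p)
    p_effect = np.where(effect == cause, 1 - flip_p, flip_p)
    return np.log(p_cause).sum() + np.log(p_effect).sum()

def c_b2(x, y):
    """Log-likelihood-ratio causation coefficient, i.e. Phi = log applied to
    the likelihood ratio: positive values favor X -> Y."""
    return loglik(x, y, direction="x->y") - loglik(x, y, direction="y->x")
```

With cause_p ≠ 0.5 the two directions induce different joint distributions, so the coefficient concentrates on the correct sign as the sample grows; with cause_p = 0.5 the model is symmetric and the direction is not identifiable.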

Appendix 2: Proof of Theorem 2.1: \(\mathscr {B}\)-Identifiability Implies (α, β)-Identifiability

Given the hypotheses of the theorem (symmetrical mother distribution), we can choose \(C_{\mathscr {B} 2}\) as causation coefficient (Eq. (2.8)) and apply it to \(\mathcal {B}_\varPi (X, Y)\):

$$\displaystyle \begin{aligned} C_{\mathscr{B} 2}(X_\pi, Y_\pi) =& P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[X \rightarrow Y]\Big)\\ &- P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[Y \rightarrow X]\Big). \end{aligned} $$

Imposing α = β = θ = 0, as per the definition of (α, β)-identifiability, the causal direction is (α, β)-identifiable for \(P_{\mathscr {M}} \Big ( \mathcal {B}_\varPi (X, Y), G \Big )\) iff:

$$\displaystyle \begin{aligned} \begin{array}{rcl} Pr(C_{\mathscr{B} 2}(X_\pi, Y_\pi)>0~|~G = [X \leftarrow Y])= 0. &\displaystyle &\displaystyle \text{ (Type I errors) } \\ Pr(C_{\mathscr{B} 2}(X_\pi, Y_\pi)<0~|~G = [X \rightarrow Y])= 0. &\displaystyle &\displaystyle \text{ (Type II errors) } \end{array} \end{aligned} $$

In other words, (α, β)-identifiability with α = β = θ = 0 for \(\mathcal {B}_\varPi (X, Y)\) is equivalent to:

$$\displaystyle \begin{aligned} G &= [X \rightarrow Y]\\ &\quad \Rightarrow P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[X \rightarrow Y]\Big) > P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[Y \rightarrow X]\Big) . {} \end{aligned} $$
(2.31)

and vice versa if we invert the roles of X and Y . If \(\mathscr {B}\)-identifiability holds, then:

$$\displaystyle \begin{aligned} G = [X \rightarrow Y] \Rightarrow \left \{ \begin{array}{lll} \exists \left(f\in\mathcal{F} \wedge P(N) \in \mathcal{N}\right) & \text{s.t.} & Y:=f(X,N) \\ \nexists \left(f\in\mathcal{F} \wedge P(N) \in \mathcal{N} \right) & \text{s.t.} & X:=f(Y,N). \end{array} \right. {} \end{aligned} $$
(2.32)

which can be equivalently re-written, for the given pair (X π, Y π), as:

$$\displaystyle \begin{aligned} G = [X \rightarrow Y] \Rightarrow \left \{ \begin{array}{lll} P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[X \rightarrow Y]\Big) > 0 \\ P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[Y \rightarrow X]\Big) = 0. \end{array} \right. {} \end{aligned} $$
(2.33)

It can easily be seen that if Eq. (2.33) is satisfied, then Eq. (2.31) holds. Thus we have proved that \(\mathscr {B}\)-identifiability implies (α, β)-identifiability with α = β = θ = 0 for \(\mathcal {B}_\varPi (X, Y)\). Let us now prove the converse.

Starting from Eq. (2.31), if we swap the role of X and Y , we obtain: \(G = [Y \rightarrow X] \Rightarrow P_{\mathscr {M}}\Big (\mathcal {B}_\varPi (X, Y)~|~G=[Y \rightarrow X]\Big ) > P_{\mathscr {M}}\Big (\mathcal {B}_\varPi (X, Y)~|~G=[X \rightarrow Y]\Big ) ;\)and if we contrapose Eq. (2.31) we obtain: \(P_{\mathscr {M}}\Big (\mathcal {B}_\varPi (X, Y)~|~G=[Y \rightarrow X]\Big ) > P_{\mathscr {M}}\Big (\mathcal {B}_\varPi (X, Y)~|~G=[X \rightarrow Y]\Big ) \Rightarrow G = [Y \rightarrow X] ,\) since when data are generated with a binary process ¬(G = [X → Y ]) is equivalent to G = [Y → X].

Thus we have an equivalence, both for the previous formula and for Eq. (2.31):

$$\displaystyle \begin{aligned} G &= [X \rightarrow Y] \Leftrightarrow P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[X \rightarrow Y]\Big)\\ &\quad > P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[Y \rightarrow X]\Big) \geq 0. {} \end{aligned} $$
(2.34)

In this last formula, if \(P_{\mathscr {M}}\Big (\mathcal {B}_\varPi (X, Y)~|~G=[Y \rightarrow X]\Big ) \neq 0\), then ∃(X, Y ) s.t. G = [Y → X], which would contradict G = [X → Y ]. Hence we must have \(P_{\mathscr {M}}\Big (\mathcal {B}_\varPi (X, Y)~|~G=[Y \rightarrow X]\Big ) = 0\). Therefore, if Eq. (2.34) holds, then Eq. (2.33) holds too, or equivalently Eq. (2.32). Thus, we have proved that (α, β)-identifiability with α = β = θ = 0 implies \(\mathscr {B}\)-identifiability. \(\square \)
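The (α, β) error rates of a causation coefficient can also be estimated empirically. The sketch below is a hypothetical setup of ours: it samples pairs from a toy mother distribution with a noiseless monotone mechanism (the IGCI setting of footnote 6), scores them with a histogram-entropy coefficient C = H_x − H_y, and estimates the Type I and Type II error rates at threshold θ = 0.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy_coeff(x, y, bins=30):
    """Causation coefficient C = H_x - H_y from histogram entropies
    (footnote 6: under IGCI conditions, H_x >= H_y when X -> Y)."""
    def h(v):
        counts, _ = np.histogram(v, bins=bins)
        p = counts[counts > 0] / len(v)
        return -(p * np.log(p)).sum()
    return h(x) - h(y)

def sample_pair(n=500):
    """Toy 'mother distribution' (our assumption): uniform cause, noiseless
    invertible mechanism effect = cause**k, direction drawn at random."""
    cause = rng.uniform(0, 1, n)
    effect = cause ** rng.integers(2, 5)
    if rng.random() < 0.5:
        return cause, effect, "x->y"
    return effect, cause, "y->x"

# Empirical Type I / Type II error rates at threshold theta = 0
trials = [(entropy_coeff(*p[:2]), p[2]) for p in (sample_pair() for _ in range(200))]
alpha = np.mean([c > 0 for c, g in trials if g == "y->x"])   # false [X -> Y]
beta  = np.mean([c < 0 for c, g in trials if g == "x->y"])   # missed [X -> Y]
```

On this deliberately easy mother distribution both error rates come out near zero; adding noise to the mechanism, or mixing in non-identifiable pairs, drives them up and makes the (α, β) trade-off visible.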

Appendix 3: Examples of Cause-Effect Pairs

In this section, we show graphically how some of the causation coefficient filters are computed. The examples are drawn from Tables 2.1 and 2.2. It can be observed that, depending on the type of data generative model, one or another assumption is violated. Hence the causation coefficient filters generally disagree on the causal direction. Even though the small decision tree of Fig. 7.7 performs relatively well on the challenge data, it is very easy to construct pairs that make it fail. In the following chapters, we will see more advanced methods (Figs. 2.15, 2.16, 2.17, 2.18, 2.19, 2.20, 2.21, 2.22, 2.23, 2.24, 2.25 and 2.26).
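Synthetic pairs of the kinds shown in the first figures are simple to generate. The sketch below is our own approximation of such generators (the kind names and noise levels are our assumptions, not the book's exact settings); in every case the ground truth is X → Y.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pair(kind, n=500):
    """Generate an (x, y) pair with ground truth X -> Y, mimicking the kinds
    of mechanisms illustrated in Figs. 2.15-2.19 (names are ours)."""
    x = rng.uniform(-1, 1, n)
    if kind == "linear_gaussian":        # cf. Fig. 2.15: non-identifiable
        y = x + 0.3 * rng.normal(size=n)
    elif kind == "linear_uniform":       # cf. Fig. 2.16: ANM-identifiable
        y = x + 0.3 * rng.uniform(-1, 1, n)
    elif kind == "multiplicative":       # cf. Fig. 2.17: ANM struggles
        y = x * rng.uniform(0.5, 1.5, n)
    elif kind == "parabola":             # cf. Fig. 2.19: non-invertible, tiny noise
        y = x ** 2 + 0.01 * rng.normal(size=n)
    else:
        raise ValueError(kind)
    return x, y
```

Feeding such pairs to the filters of this chapter reproduces the qualitative behavior described in the captions below, e.g. residual-independence criteria succeed on "linear_uniform" but are fooled by "multiplicative".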

Fig. 2.15

Linear with additive Gaussian noise. Pair 1 in Table 2.1. This is a well-known non identifiable pair. However, due to the finite sample size, wrong decisions might be made

Fig. 2.16

Linear with additive uniform noise. Pair 2 in Table 2.1. Unlike the previous pair, this one is identifiable with the Additive Noise Model (ANM): IR_Y > IR_X, i.e. the independence between input and residual is better in the correct direction. However, R² is better in the wrong direction!

Fig. 2.17

Linear with multiplicative uniform noise. Pair 3 in Table 2.1. The ANM has difficulties, but the CDS works

Fig. 2.18

S-shaped function with Gaussian input violating the independence of input density and function. Pair 4 in Table 2.1. The IGCI entropy criterion fails, but the IGCI slope criterion works

Fig. 2.19

Parabola with very small noise. Pair 5 in Table 2.1. The IGCI slope criterion fails, because the function is not invertible

Fig. 2.20

Square root with binary noise. Pair 6 in Table 2.1. The regression fit and residual criteria fail as well as the CDS. But the IGCI criteria work

Fig. 2.21

Altitude vs. temperature of German cities. Pair 1 in Table 2.2. The IGCI slope criterion fails, because the function is not invertible. More surprisingly, CDS fails too, probably because of outliers

Fig. 2.22

Simulated altitude vs. temperature. Pair 2 in Table 2.2. Only the IGCI slope criterion fails, because the function is not invertible

Fig. 2.23

Real weight vs. age pair. Pair 3 in Table 2.2. Only H really works on that pair (IR is neutral)

Fig. 2.24

Simulated weight vs. age pair. Pair 4 in Table 2.2. Only R² and H work on that pair

Fig. 2.25

Real pair “hill shade at 3 pm” vs. aspect. Pair 5 in Table 2.2. R² fails because of the multiplicative noise and S fails because the function is not invertible

Fig. 2.26

Simulated pair “hill shade at 3 pm” vs. aspect. Pair 6 in Table 2.2. The only pair in which all diagnoses agree between real and synthetic data


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Guyon, I., Goudet, O., Kalainathan, D. (2019). Evaluation Methods of Cause-Effect Pairs. In: Guyon, I., Statnikov, A., Batu, B. (eds) Cause Effect Pairs in Machine Learning. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-21810-2_2


  • DOI: https://doi.org/10.1007/978-3-030-21810-2_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21809-6

  • Online ISBN: 978-3-030-21810-2

  • eBook Packages: Computer Science (R0)
