
Evaluation Methods of Cause-Effect Pairs


Part of the book series: The Springer Series on Challenges in Machine Learning ((SSCML))

Abstract

This chapter addresses the problem of benchmarking causal models or validating particular putative causal relationships, in the limited setting of cause-effect pairs, when empirical “observational” data are available. We do not address experimental validation, e.g. via randomized controlled trials. Our goal is to compare methods that provide a score C(X, Y ), called a causation coefficient, rating a pair of variables (X, Y ) for being in a potential causal relationship X → Y . Causation coefficients may be used for various purposes, including to prioritize experiments, which may be costly or risky, or to guide decision makers in domains in which experiments are infeasible or unethical. We provide a methodology to evaluate their reliability. We take three points of view: (1) that of algorithm developers, who must justify the soundness of their method, particularly with respect to identifiability and consistency; (2) that of practitioners, who seek to understand on what basis algorithms make their decisions and to evaluate their statistical significance; and (3) that of benchmark organizers, who desire to make fair evaluations comparing methods. We adopt the framework of pattern recognition, in which pairs of variables (X, Y ) and their ground truth causal graph are drawn i.i.d. from a “mother distribution”. This leads us to define new notions of probabilistic identifiability, Bayes optimal causation coefficients, and multi-part statistical tests. These new notions are evaluated on the data of the first cause-effect pair challenge. We also compile a list of resources, including datasets of real or synthetic pairs, and data generative models.


Notes

  1. In case of time series, {(x_1, y_1), (x_2, y_2), ⋯, (x_n, y_n)} are time ordered and not drawn i.i.d.

  2. The difference with the general case is outlined in blue: the noise is additive. In the ANM, it is usually assumed that the noise and the input are independent (noted X ⊥ N).

  3. However, we remind the reader that several data generative processes may generate the same data distribution.

  4. In our examples, we perform a piece-wise constant fit using m = 20 equally spaced points. If we believe that the fit should be better in the “causal direction”, then \(R^2 = R^2_y - R^2_x\) should be positive if X → Y, since \(R^2_y\) is the residual of the fit \(\hat y = f(x)\).

  5. Using an ANM model (Eq. (2.5)), we expect the input and the residual of regression (our estimate of the noise) to be independent, i.e. if X → Y, then R_y ⊥ X. We use a kernel independence test statistic [25] to calculate I(X, Y); larger values of I(X, Y) mean greater confidence that X and Y are independent. For example, I(., .) can be the HSIC statistic.

  6. According to the Information Geometry Causal Inference (IGCI) principle, if X → Y and the “causal mechanism” is independent of P(X), then, under some conditions (e.g. no noise and an invertible mechanism), H_x ≥ H_y [36].

  7. According to the IGCI principle again, if X → Y, then, under the same conditions as for the entropy criterion, S_y ≥ S_x [36].

  8. CDS measures variations in the conditional distribution. The idea is that, for X → Y, if X is independent of the noise, then, after normalizing the support, P(Y |X) should not vary much. For additive noise models, C_CDS should be similar to C_IR (but not always, because of support normalization). In the case of multiplicative noise, C_CDS should capture the independence of noise and input, where C_IR usually fails.

  9. If Π takes continuous values, \(P_{\mathscr {M}} \big ( P_\varPi (X, Y) \big )\) and \(P_{\mathscr {M}} \big ( P_\varPi (X, Y)~|~G \big )\) should be understood as densities rather than distributions.

  10. Except possibly for a finite subset or a subset of measure 0.

  11. https://competitions.codalab.org/competitions/1381.

  12. This could easily be refined in a number of ways, including restricting the mother distribution \(P_{\mathscr {M}}\) to scatter plots with the exact same number of samples as the pairs to be tested. In the dataset of the cause-effect pair challenge that we use as \(P_{\mathscr {M}}\), the number of samples varies between 500 and 5000.

  13. http://www.cs.huji.ac.il/~galel/Repository/.

  14. http://dreamchallenges.org.

  15. Causality workbench: http://www.causality.inf.ethz.ch/. Founding members: Constantin F. Aliferis (Vanderbilt University, Tennessee), Gregory F. Cooper (University of Pittsburgh, Pennsylvania), André Elisseeff (IBM Research, Switzerland), Jean-Philippe Pellet (IBM Research, Switzerland), Alexander Statnikov (Vanderbilt University, Tennessee), Peter Spirtes (Carnegie Mellon University, Pennsylvania).

  16. https://ei.is.tuebingen.mpg.de/.

  17. Causality Workbench repository: http://www.causality.inf.ethz.ch/repository.php.

  18. Connectomics challenge: http://connectomics.chalearn.org/.

  19. Bio-model database: http://www.ebi.ac.uk/biomodels-main/.

  20. Chemical plant simulator: http://depts.washington.edu/control/LARRY/TE/download.html.

  21. CMU case studies: https://www.cmu.edu/dietrich/philosophy/events/workshops-conferences/causal-discovery/index.html.

  22. http://causeme.uv.es/index.html.
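Several of the filters described in these notes are easy to prototype. Below is a rough, self-contained sketch (our own illustration, not the authors' reference implementation) of a residual-independence causation coefficient in the spirit of notes 4 and 5: a piece-wise constant fit in each direction, followed by a biased HSIC statistic between input and residual. The m = 20 binning choice follows note 4; function names, kernel bandwidth, and sign convention are our assumptions.

```python
import numpy as np

def hsic(a, b, sigma=1.0):
    """Biased HSIC statistic with Gaussian kernels (larger => more dependence)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    K = np.exp(-((a[:, None] - a[None, :]) ** 2) / (2 * sigma ** 2))
    L = np.exp(-((b[:, None] - b[None, :]) ** 2) / (2 * sigma ** 2))
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def piecewise_residual(x, y, m=20):
    """Residual of a piece-wise constant fit of y as a function of x (m bins)."""
    edges = np.linspace(x.min(), x.max(), m + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, m - 1)
    means = np.array([y[idx == k].mean() if np.any(idx == k) else y.mean()
                      for k in range(m)])
    return y - means[idx]

def c_ir(x, y):
    """Residual-independence coefficient: positive values favor X -> Y.
    HSIC measures dependence, so the causal direction should yield the
    SMALLER statistic between input and residual."""
    r_y = piecewise_residual(x, y)   # residual of the fit y_hat = f(x)
    r_x = piecewise_residual(y, x)   # residual of the fit x_hat = g(y)
    return hsic(y, r_x) - hsic(x, r_y)
```

On an identifiable ANM pair such as y = x³ + uniform noise, c_ir is typically positive; on a non-identifiable linear Gaussian pair it hovers around zero.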

References

  1. Constantin F. Aliferis, Ioannis Tsamardinos, Alexander R. Statnikov, and Laura E. Brown. Causal explorer: A causal probabilistic network learning toolkit for biomedical discovery. In Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS ’03, June 23–26, 2003, Las Vegas, Nevada, USA, pages 371–376, 2003.

  2. Demian Battaglia, Isabelle Guyon, Vincent Lemaire, Javier Orlandi, Bisakha Ray, and Jordi Soriano. Neural Connectomics Challenge. Springer Publishing Company, Incorporated, 1st edition, 2017. ISBN 9783319530697.

  3. J. A. Blackard and D. J. Dean. Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Computers and Electronics in Agriculture, 24:131–151, 1999.

  4. Patrick Bloebaum, Dominik Janzing, Takashi Washio, Shohei Shimizu, and Bernhard Schoelkopf. Cause-effect inference by comparing regression errors. In International Conference on Artificial Intelligence and Statistics, pages 900–909, 2018.

  5. C. Bonferroni. Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8:3–62, 1936.

  6. Gianluca Bontempi. From dependency to causality: a machine learning approach. In Proc. NIPS 2013 workshop on causality, http://clopinet.com/isabelle/Projects/NIPS2013/, December 2013.

  7. P. B. Burns, R. J. Rohrich, and K. C. Chung. The levels of evidence and their role in evidence-based medicine. Plastic and Reconstructive Surgery, 128(1):305–310, 2011.

  8. Krzysztof Chalupka, Frederick Eberhardt, and Pietro Perona. Estimating causal direction and confounding of two discrete variables. arXiv preprint arXiv:1611.01504, 2016.

  9. Povilas Daniusis, Dominik Janzing, Joris Mooij, Jakob Zscheischler, Bastian Steudel, Kun Zhang, and Bernhard Schölkopf. Inferring deterministic causal relations. arXiv preprint arXiv:1203.3475, 2012.

  10. Diogo Moitinho de Almeida. Automated feature engineering applied to causality. In Proc. NIPS 2013 workshop on causality, http://clopinet.com/isabelle/Projects/NIPS2013/, December 2013.

  11. Suzana de Siqueira Santos, Daniel Yasumasa Takahashi, Asuka Nakata, and André Fujita. A comparative study of statistical methods used to identify dependencies between gene expression signals. Briefings in Bioinformatics, 15(6):906–918, 2014. doi: 10.1093/bib/bbt051.

  12. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, USA, 2nd edition, 2001.

  13. B. Efron. Estimating the error rate of a prediction rule: Improvement on cross-validation. Journal of the American Statistical Association, 78:316–331, 1983.

  14. Bradley Efron and Robert J. Tibshirani. An Introduction to the Bootstrap. CRC Press, 1994.

  15. Guy Fagherazzi, Alice Vilier, Daniela Saes Sartorelli, Martin Lajous, Beverley Balkau, and Françoise Clavel-Chapelon. Consumption of artificially and sugar-sweetened beverages and incident type 2 diabetes in the Etude Epidémiologique auprès des femmes de la Mutuelle Générale de l’Education Nationale–European Prospective Investigation into Cancer and Nutrition cohort. The American Journal of Clinical Nutrition, 97(3):517–523, March 2013.

  16. Ronald A. Fisher. The Design of Experiments. 1935.

  17. José A. R. Fonollosa. Conditional distribution variability measure for causality detection. In Proc. NIPS 2013 workshop on causality, http://clopinet.com/isabelle/Projects/NIPS2013/, December 2013.

  18. José A. R. Fonollosa. Conditional distribution variability measures for causality detection. arXiv preprint arXiv:1601.06680, 2016.

  19. U.S. Preventive Services Task Force. Guide to Clinical Preventive Services: Report of the U.S. Preventive Services Task Force. 1989.

  20. Stuart Geman, Elie Bienenstock, and René Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1–58, 1992. doi: 10.1162/neco.1992.4.1.1.

  21. Olivier Goudet. Causality pairwise inference datasets. Replication data for: “Learning functional causal models with generative neural networks”, 2017. URL http://dx.doi.org/10.7910/DVN/3757KX.

  22. Olivier Goudet, Diviyan Kalainathan, Philippe Caillou, Isabelle Guyon, David Lopez-Paz, and Michèle Sebag. Causal generative neural networks. arXiv preprint arXiv:1711.08936, 2017.

  23. Clive W. J. Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, pages 424–438, 1969.

  24. A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. J. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, pages 585–592, Red Hook, NY, USA, 2008. Curran.

  25. Arthur Gretton, Ralf Herbrich, Alexander Smola, Olivier Bousquet, and Bernhard Schölkopf. Kernel methods for measuring independence. Journal of Machine Learning Research, 6:2075–2129, 2005.

  26. I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, and A. Statnikov. Design and analysis of the causality pot-luck challenge. In JMLR W&CP, volume 5: NIPS 2008 causality workshop, Whistler, Canada, December 12, 2008.

  27. I. Guyon, D. Battaglia, A. Guyon, V. Lemaire, J. G. Orlandi, B. Ray, M. Saeed, J. Soriano, A. Statnikov, and O. Stetter. Design of the first neuronal connectomics challenge: From imaging to connectivity. In 2014 International Joint Conference on Neural Networks (IJCNN), pages 2600–2607, July 2014. doi: 10.1109/IJCNN.2014.6889913.

  28. Isabelle Guyon. A practical guide to model selection, 2010.

  29. Isabelle Guyon. ChaLearn cause-effect pairs challenge, 2013. URL http://www.causality.inf.ethz.ch/cause-effect.php.

  30. Isabelle Guyon. ChaLearn fast causation coefficient challenge, 2014.

  31. Isabelle Guyon et al. Results and analysis of the 2013 ChaLearn cause-effect pair challenge. In Proc. NIPS 2013 workshop on causality, December 2013. Workshop: http://clopinet.com/isabelle/Projects/NIPS2013/; challenge: http://www.causality.inf.ethz.ch/cause-effect.php.

  32. Isabelle Guyon, Steve Gunn, Masoud Nikravesh, and Lotfi A. Zadeh. Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing). Springer-Verlag, Berlin, Heidelberg, 2006. ISBN 3540354875.

  33. Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In Neural Information Processing Systems (NIPS), pages 689–696, 2009.

  34. P. Illari, F. Russo, and J. Williamson. Causality in the Sciences. Oxford University Press, 2011.

  35. Dominik Janzing and Bernhard Schölkopf. Causal inference using the algorithmic Markov condition. IEEE Transactions on Information Theory, 56(10):5168–5194, 2010.

  36. Dominik Janzing, Joris Mooij, Kun Zhang, Jan Lemeire, Jakob Zscheischler, Povilas Daniušis, Bastian Steudel, and Bernhard Schölkopf. Information-geometric approach to inferring causal directions. Artificial Intelligence, 182–183:1–31, May 2012. doi: 10.1016/j.artint.2012.01.002.

  37. Karen Sachs, Solomon Itani, Mingyu Chung, Gabriela K. Fragiadakis, Jonathan Fitzgerald, Birgit Schoeberl, Garry P. Nolan, and Claire Tomlin. Experiment design in static models of dynamic biological systems. In NIPS 2013 workshop on causality, 2013.

  38. Shachar Kaufman, Saharon Rosset, Claudia Perlich, and Ori Stitelman. Leakage in data mining: Formulation, detection, and avoidance. ACM Transactions on Knowledge Discovery from Data (TKDD), 6(4):15, 2012.

  39. Ron Kohavi. Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 202–207. AAAI Press, 1996.

  40. S. Kpotufe, E. Sgouritsa, D. Janzing, and B. Schölkopf. Consistency of causal inference under the additive noise model. In Proceedings of the 31st International Conference on Machine Learning, W&CP 32(1), pages 478–495. JMLR, 2014.

  41. John Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273–306, December 2005.

  42. David Lopez-Paz, Krikamol Muandet, and Benjamin Recht. The randomized causation coefficient. Journal of Machine Learning Research, 16(1):2901–2907, January 2015a.

  43. David Lopez-Paz, Krikamol Muandet, Bernhard Schölkopf, and Ilya O. Tolstikhin. Towards a learning theory of cause-effect inference. In ICML, pages 1452–1461, 2015b.

  44. David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, and Léon Bottou. Discovering causal signals in images. arXiv preprint arXiv:1605.08179, 2016.

  45. Bram Minnaert. Feature importance in causal inference for numerical and categorical variables. In Proc. NIPS 2013 workshop on causality, http://clopinet.com/isabelle/Projects/NIPS2013/, December 2013.

  46. Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf. Distinguishing cause from effect using observational data: methods and benchmarks. Journal of Machine Learning Research, 17(32):1–102, 2016.

  47. J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.

  48. J. Peters, D. Janzing, and B. Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, Cambridge, MA, USA, 2017a.

  49. Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Causal inference on discrete data using additive noise models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2436–2450, 2011.

  50. Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, 2017b.

  51. Karl Popper. Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge, 2014.

  52. Hans Reichenbach. The Direction of Time. Dover Publications, 1956.

  53. Paul R. Rosenbaum and Donald B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.

  54. Donald Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688–701, 1974.

  55. Spyridon Samothrakis, Diego Perez, and Simon Lucas. Training gradient boosting machines using curve fitting and information-theoretic features for causal direction detection. In Proc. NIPS 2013 workshop on causality, http://clopinet.com/isabelle/Projects/NIPS2013/, December 2013.

  56. K. F. Schulz, D. G. Altman, D. Moher, and the CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. Annals of Internal Medicine, 2010.

  57. P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. The MIT Press, Cambridge, Massachusetts, London, England, 2000.

  58. Alexander Statnikov, Mikael Henaff, Nikita I. Lytkin, and Constantin F. Aliferis. New methods for separating causes from effects in genomics data. BMC Genomics, 13, 2012a.

  59. Alexander Statnikov, Mikael Henaff, Nikita I. Lytkin, and Constantin F. Aliferis. New methods for separating causes from effects in genomics data. BMC Genomics, 13(8):S22, 2012b.

  60. Alexander Statnikov, Sisi Ma, Mikael Henaff, Nikita Lytkin, Efstratios Efstathiadis, Eric R. Peskin, and Constantin F. Aliferis. Ultra-scalable and efficient methods for hybrid observational and experimental local causal pathway discovery. Journal of Machine Learning Research, 16(1):3219–3267, January 2015.

  61. Natasa Tagasovska, Thibault Vatter, and Valérie Chavez-Demoulin. Nonparametric quantile-based causal discovery. arXiv preprint arXiv:1801.10579, 2018.

  62. Wikipedia. Evidence-based medicine. https://en.wikipedia.org/wiki/Evidence-based_medicine.

  63. Yi Wang, Yi Li, Hongbao Cao, Momiao Xiong, Yin Yao Shugart, and Li Jin. Efficient test for nonlinear dependence of two continuous variables. BMC Bioinformatics, 16(260), 2015.

  64. K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional shift. In Proceedings of the 30th International Conference on Machine Learning, W&CP 28(3), pages 819–827. JMLR, 2013.

  65. Kun Zhang. Causal learning and machine learning. In Antti Hyttinen, Joe Suzuki, and Brandon Malone, editors, Proceedings of the 3rd International Workshop on Advanced Methodologies for Bayesian Networks, volume 73 of Proceedings of Machine Learning Research, pages 4–4. PMLR, 2017. URL http://proceedings.mlr.press/v73/zhang17a.html.

  66. Kun Zhang and Aapo Hyvärinen. On the identifiability of the post-nonlinear causal model. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 647–655. AUAI Press, 2009.

  67. Kun Zhang, Zhikun Wang, Jiji Zhang, and Bernhard Schölkopf. On estimation of functional causal models: general results and application to the post-nonlinear causal model. ACM Transactions on Intelligent Systems and Technology (TIST), 7(2):13, 2016.


Acknowledgements

The authors would like to thank Dominik Janzing and Berna Batu for their careful review of this chapter.

Author information

Corresponding author: Isabelle Guyon.

Appendices

Appendix 1: Derivation of Bayes Optimal Causation Coefficients

We derive the Bayes optimal causation coefficients introduced in Sect. 2.3.2.

The Bayes optimal decision rule prescribes the following:

$$\displaystyle \begin{aligned} \begin{array}{rll} \text{Classify pair }(X_\pi, Y_\pi)\text{ as }[X \rightarrow Y] & iff~& \\ P_{\mathscr{M}}\Big( G = [X \rightarrow Y]~|~(X_\pi, Y_\pi)\Big) & > & P_{\mathscr{M}}\Big( G \neq [X \rightarrow Y]~|~(X_\pi, Y_\pi)\Big). \end{array} \end{aligned} $$
(2.28)

Given a classification problem of patterns z (in our case z = (X_π, Y_π) pairs), with ground truth labels g_1 and g_0 (in our case g_1 = [X → Y ] and g_0 = ¬[X → Y ]), we define a “discriminant function” as a function g(z) taking values in \(\mathbb {R}\) such that we predict class g_1 if g(z) > θ and class g_0 otherwise. Important: with our definition, θ is a real number, not necessarily equal to 0 (as is commonly used).

In this context,

$$\displaystyle \begin{aligned} P_{\mathscr{M}}\Big(G = [X \rightarrow Y]~|~(X_\pi, Y_\pi) \Big)- P_{\mathscr{M}}\Big(G \neq [X \rightarrow Y]~|~(X_\pi, Y_\pi)\Big) \end{aligned}$$

is a Bayes optimal discriminant function and because:

$$\displaystyle \begin{aligned} P_{\mathscr{M}}\Big(G = [X \rightarrow Y]~|~(X_\pi, Y_\pi)\Big) = 1- P_{\mathscr{M}}\Big(G \neq [X \rightarrow Y]~|~(X_\pi, Y_\pi)\Big), \end{aligned}$$

the following is also a Bayes optimal discriminant function:

$$\displaystyle \begin{aligned} 2~P_{\mathscr{M}}\Big(G = [X \rightarrow Y]~|~(X_\pi, Y_\pi)\Big)-1 \end{aligned}$$

and therefore, so is:

$$\displaystyle \begin{aligned} P_{\mathscr{M}} \Big(G = [X \rightarrow Y]~|~(X_\pi, Y_\pi) \Big) . \end{aligned}$$

Also note that we could have considered the symmetric problem of classifying Y → X vs. all other cases. Assume the mother distribution is perfectly symmetrical, i.e. for each pair (X, Y ) labeled X → Y it contains the symmetric pair (Y, X) labeled Y → X; for each pair (X, Y ) labeled X ↔ Y , the same pair labeled Y ↔ X; and for each pair (X, Y ) labeled X ⊥ Y , the same pair labeled Y ⊥ X. Then the ranking of all pairs (X_π, Y_π) obtained by sorting according to \(P_{\mathscr {M}} \Big (G = [X \rightarrow Y]~|~(X_\pi , Y_\pi ) \Big )\) is the reverse of the ranking obtained with \(P_{\mathscr {M}} \Big (G = [X \leftarrow Y]~|~(X_\pi , Y_\pi ) \Big )\). Consequently we obtain the same ranking with:

$$\displaystyle \begin{aligned} C_{\mathscr{B} 1}(X_\pi, Y_\pi) = \varPhi\Big( P_{\mathscr{M}}\big(G = [X \rightarrow Y]~|~(X_\pi, Y_\pi)\big) - P_{\mathscr{M}}\big(G = [X \leftarrow Y]~|~(X_\pi, Y_\pi)\big) \Big) \end{aligned}$$
(2.29)

where Φ(.) is any strictly monotonically increasing function.

It is important to note that, because we consider four possible truth values for G, G ∈{X → Y, X ← Y, X ↔ Y, X ⊥ Y }, \(P_{\mathscr {M}}\Big (G = [X \rightarrow Y]~|~(X_\pi , Y_\pi ) \Big )\) is NOT equal to \(1- P_{\mathscr {M}}\Big (G = [X \leftarrow Y]~|~(X_\pi , Y_\pi ) \Big )\).

It may also be convenient to define a Bayes optimal causation coefficient in terms of the data generative model. To that end, notice that if a − b is a discriminant function, so is a∕b − 1 (for b > 0). Therefore,

$$\displaystyle \begin{aligned} P_{\mathscr{M}}\Big(G=[X \rightarrow Y]~|~(X_\pi, Y_\pi) \Big) / P_{\mathscr{M}}\Big(G=[X \leftarrow Y]~|~(X_\pi, Y_\pi) \Big) - 1 \end{aligned}$$

is also a Bayes optimal discriminant function. Using Bayes’ rule again,

$$\displaystyle \begin{aligned} \begin{array}{ll} &P_{\mathscr{M}}\Big(G=[X \rightarrow Y]~|~(X_\pi, Y_\pi) \Big) = \\ &~~~~~~~~P_{\mathscr{M}}\Big((X_\pi, Y_\pi)~|~G=[X \rightarrow Y] \Big) P_{\mathscr{M}}\Big(G=[X \rightarrow Y]\Big)/P_{\mathscr{M}}\Big((X_\pi, Y_\pi) \Big) \end{array} \end{aligned}$$

and further assuming that the mother distribution is not biased towards a particular causal direction:

$$\displaystyle \begin{aligned} P_{\mathscr{M}}\Big(G=[X \rightarrow Y]\Big) = P_{\mathscr{M}}\Big(G=[Y \rightarrow X] \Big) \end{aligned}$$

the following is also a Bayes optimal discriminant function:

$$\displaystyle \begin{aligned} P_{\mathscr{M}}\Big((X_\pi, Y_\pi)~|~G=[X \rightarrow Y]\Big) /P_{\mathscr{M}}\Big((X_\pi, Y_\pi)~|~G=[Y \rightarrow X]\Big) - 1 \end{aligned}$$

and so is, for any strictly monotonically increasing function Φ(.):

$$\displaystyle \begin{aligned} C_{\mathscr{B} 2}(X_\pi, Y_\pi) = \varPhi\Big( P_{\mathscr{M}}\big((X_\pi, Y_\pi)~|~G=[X \rightarrow Y]\big) \big/ P_{\mathscr{M}}\big((X_\pi, Y_\pi)~|~G=[Y \rightarrow X]\big) - 1 \Big) \end{aligned}$$
(2.30)
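The likelihood-based coefficient can be made concrete with a toy sketch (our own illustration, with an assumed binary generative model that is not from the chapter): under equal priors on the two directions, the difference of log-likelihoods under the two candidate models is a monotone transform of the discriminant functions derived above, so its sign gives the Bayes optimal decision.

```python
import numpy as np

def loglik(x, y, cause_p=0.7, flip_p=0.1, direction="x->y"):
    """Exact log-likelihood of binary pairs under a toy generative model
    (our assumption): cause ~ Bernoulli(cause_p), effect = cause XOR
    Bernoulli(flip_p)."""
    if direction == "x->y":
        cause, effect = x, y
    else:
        cause, effect = y, x
    p_cause = np.where(cause == 1, cause_p, 1 - cause_p)
    p_effect = np.where(effect == cause, 1 - flip_p, flip_p)
    return np.log(p_cause).sum() + np.log(p_effect).sum()

def c_b2(x, y):
    """Log-likelihood-ratio causation coefficient, i.e. Phi = log applied to
    the likelihood ratio: positive values favor X -> Y."""
    return loglik(x, y, direction="x->y") - loglik(x, y, direction="y->x")
```

With cause_p ≠ 0.5 the two directions induce different joint distributions, so the coefficient concentrates on the correct sign as the sample grows; with cause_p = 0.5 the model is symmetric and the direction is not identifiable.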

Appendix 2: Proof of Theorem 2.1: \(\mathscr {B}\)-Identifiability Implies (α, β)-Identifiability

Given the hypotheses of the theorem (symmetrical mother distribution), we can choose \(C_{\mathscr {B} 2}\) as causation coefficient (Eq. (2.8)) and apply it to \(\mathcal {B}_\varPi (X, Y)\):

$$\displaystyle \begin{aligned} C_{\mathscr{B} 2}(X_\pi, Y_\pi) =& P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[X \rightarrow Y]\Big)\\ &- P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[Y \rightarrow X]\Big). \end{aligned} $$

Imposing α = β = θ = 0, as per the definition of (α, β)-identifiability, the causal direction is (α, β)-identifiable for \(P_{\mathscr {M}} \Big ( \mathcal {B}_\varPi (X, Y), G \Big )\) iff:

$$\displaystyle \begin{aligned} \begin{array}{rcl} Pr(C_{\mathscr{B} 2}(X_\pi, Y_\pi)>0~|~G = [X \leftarrow Y])= 0. &\displaystyle &\displaystyle \text{ (Type I errors) } \\ Pr(C_{\mathscr{B} 2}(X_\pi, Y_\pi)<0~|~G = [X \rightarrow Y])= 0. &\displaystyle &\displaystyle \text{ (Type II errors) } \end{array} \end{aligned} $$

In other words, (α, β)-identifiability with α = β = θ = 0 for \(\mathcal {B}_\varPi (X, Y)\) is equivalent to:

$$\displaystyle \begin{aligned} G &= [X \rightarrow Y]\\ &\quad \Rightarrow P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[X \rightarrow Y]\Big) > P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[Y \rightarrow X]\Big) . {} \end{aligned} $$
(2.31)

and vice versa if we invert the roles of X and Y . If \(\mathscr {B}\)-identifiability holds, then:

$$\displaystyle \begin{aligned} G = [X \rightarrow Y] \Rightarrow \left \{ \begin{array}{lll} \exists \left(f\in\mathcal{F} \wedge P(N) \in \mathcal{N}\right) & \text{s.t.} & Y:=f(X,N) \\ \nexists \left(f\in\mathcal{F} \wedge P(N) \in \mathcal{N} \right) & \text{s.t.} & X:=f(Y,N). \end{array} \right. {} \end{aligned} $$
(2.32)

which can be equivalently re-written, for the given pair (X π, Y π), as:

$$\displaystyle \begin{aligned} G = [X \rightarrow Y] \Rightarrow \left \{ \begin{array}{lll} P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[X \rightarrow Y]\Big) > 0 \\ P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[Y \rightarrow X]\Big) = 0. \end{array} \right. {} \end{aligned} $$
(2.33)

It can easily be seen that if Eq. (2.33) is satisfied, then Eq. (2.31) holds. Thus we have proved that \(\mathscr {B}\)-identifiability implies (α, β)-identifiability with α = β = θ = 0 for \(\mathcal {B}_\varPi (X, Y)\). Let us now prove the converse.

Starting from Eq. (2.31), if we swap the role of X and Y , we obtain: \(G = [Y \rightarrow X] \Rightarrow P_{\mathscr {M}}\Big (\mathcal {B}_\varPi (X, Y)~|~G=[Y \rightarrow X]\Big ) > P_{\mathscr {M}}\Big (\mathcal {B}_\varPi (X, Y)~|~G=[X \rightarrow Y]\Big ) ;\)and if we contrapose Eq. (2.31) we obtain: \(P_{\mathscr {M}}\Big (\mathcal {B}_\varPi (X, Y)~|~G=[Y \rightarrow X]\Big ) > P_{\mathscr {M}}\Big (\mathcal {B}_\varPi (X, Y)~|~G=[X \rightarrow Y]\Big ) \Rightarrow G = [Y \rightarrow X] ,\) since when data are generated with a binary process ¬(G = [X → Y ]) is equivalent to G = [Y → X].

Thus we have an equivalence, both for the previous formula and for Eq. (2.31):

$$\displaystyle \begin{aligned} G &= [X \rightarrow Y] \Leftrightarrow P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[X \rightarrow Y]\Big)\\ &\quad > P_{\mathscr{M}}\Big(\mathcal{B}_\varPi(X, Y)~|~G=[Y \rightarrow X]\Big) \geq 0. {} \end{aligned} $$
(2.34)

In this last formula, if \(P_{\mathscr {M}}\Big (\mathcal {B}_\varPi (X, Y)~|~G=[Y \rightarrow X]\Big ) \neq 0\), then ∃(X, Y ) s.t. G = [Y → X], which would contradict G = [X → Y ]. Hence we must have \(P_{\mathscr {M}}\Big (\mathcal {B}_\varPi (X, Y)~|~G=[Y \rightarrow X]\Big ) = 0\). Therefore, if Eq. (2.34) holds, then Eq. (2.33) holds too, or equivalently Eq. (2.32). Thus, we have proved that (α, β)-identifiability with α = β = θ = 0 implies \(\mathscr {B}\)-identifiability. \(\square \)
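The (α, β) error rates of a causation coefficient can also be estimated empirically. The sketch below is a hypothetical setup of ours: it samples pairs from a toy mother distribution with a noiseless monotone mechanism (the IGCI setting of footnote 6), scores them with a histogram-entropy coefficient C = H_x − H_y, and estimates the Type I and Type II error rates at threshold θ = 0.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy_coeff(x, y, bins=30):
    """Causation coefficient C = H_x - H_y from histogram entropies
    (footnote 6: under IGCI conditions, H_x >= H_y when X -> Y)."""
    def h(v):
        counts, _ = np.histogram(v, bins=bins)
        p = counts[counts > 0] / len(v)
        return -(p * np.log(p)).sum()
    return h(x) - h(y)

def sample_pair(n=500):
    """Toy 'mother distribution' (our assumption): uniform cause, noiseless
    invertible mechanism effect = cause**k, direction drawn at random."""
    cause = rng.uniform(0, 1, n)
    effect = cause ** rng.integers(2, 5)
    if rng.random() < 0.5:
        return cause, effect, "x->y"
    return effect, cause, "y->x"

# Empirical Type I / Type II error rates at threshold theta = 0
trials = [(entropy_coeff(*p[:2]), p[2]) for p in (sample_pair() for _ in range(200))]
alpha = np.mean([c > 0 for c, g in trials if g == "y->x"])   # false [X -> Y]
beta  = np.mean([c < 0 for c, g in trials if g == "x->y"])   # missed [X -> Y]
```

On this deliberately easy mother distribution both error rates come out near zero; adding noise to the mechanism, or mixing in non-identifiable pairs, drives them up and makes the (α, β) trade-off visible.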

Appendix 3: Examples of Cause-Effect Pairs

In this section, we show graphically how some of the causation coefficient filters are computed. The examples are drawn from Tables 2.1 and 2.2. It can be observed that, depending on the type of data generative model, one or another assumption is violated. Hence the causation coefficient filters generally disagree on the causal direction. Even though the small decision tree of Fig. 7.7 performs relatively well on the challenge data, it is very easy to construct pairs that make it fail. In the following chapters, we will see more advanced methods (Figs. 2.15, 2.16, 2.17, 2.18, 2.19, 2.20, 2.21, 2.22, 2.23, 2.24, 2.25 and 2.26).
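Synthetic pairs of the kinds shown in the first figures are simple to generate. The sketch below is our own approximation of such generators (the kind names and noise levels are our assumptions, not the book's exact settings); in every case the ground truth is X → Y.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pair(kind, n=500):
    """Generate an (x, y) pair with ground truth X -> Y, mimicking the kinds
    of mechanisms illustrated in Figs. 2.15-2.19 (names are ours)."""
    x = rng.uniform(-1, 1, n)
    if kind == "linear_gaussian":        # cf. Fig. 2.15: non-identifiable
        y = x + 0.3 * rng.normal(size=n)
    elif kind == "linear_uniform":       # cf. Fig. 2.16: ANM-identifiable
        y = x + 0.3 * rng.uniform(-1, 1, n)
    elif kind == "multiplicative":       # cf. Fig. 2.17: ANM struggles
        y = x * rng.uniform(0.5, 1.5, n)
    elif kind == "parabola":             # cf. Fig. 2.19: non-invertible, tiny noise
        y = x ** 2 + 0.01 * rng.normal(size=n)
    else:
        raise ValueError(kind)
    return x, y
```

Feeding such pairs to the filters of this chapter reproduces the qualitative behavior described in the captions below, e.g. residual-independence criteria succeed on "linear_uniform" but are fooled by "multiplicative".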

Fig. 2.15

Linear with additive Gaussian noise. Pair 1 in Table 2.1. This is a well-known non identifiable pair. However, due to the finite sample size, wrong decisions might be made

Fig. 2.16

Linear with additive uniform noise. Pair 2 in Table 2.1. Unlike the previous pair, this one is identifiable with the Additive Noise Model (ANM): IR_Y > IR_X, i.e. the independence between input and residual is better in the correct direction. However, R² is better in the wrong direction!

Fig. 2.17

Linear with multiplicative uniform noise. Pair 3 in Table 2.1. The ANM has difficulties, but the CDS works

Fig. 2.18

S-shaped function with Gaussian input violating the independence of input density and function. Pair 4 in Table 2.1. The IGCI entropy criterion fails, but the IGCI slope criterion works

Fig. 2.19

Parabola with very small noise. Pair 5 in Table 2.1. The IGCI slope criterion fails, because the function is not invertible

Fig. 2.20

Square root with binary noise. Pair 6 in Table 2.1. The regression fit and residual criteria fail as well as the CDS. But the IGCI criteria work

Fig. 2.21

Altitude vs. temperature of German cities. Pair 1 in Table 2.2. The IGCI slope criterion fails, because the function is not invertible. More surprisingly, CDS fails too, probably because of outliers

Fig. 2.22

Simulated altitude vs. temperature. Pair 2 in Table 2.2. Only the IGCI slope criterion fails, because the function is not invertible

Fig. 2.23

Real weight vs. age pair. Pair 3 in Table 2.2. Only H really works on that pair (IR is neutral)

Fig. 2.24

Simulated weight vs. age pair. Pair 4 in Table 2.2. Only R² and H work on that pair

Fig. 2.25

Real pair “hill shade at 3 pm” vs. aspect. Pair 5 in Table 2.2. R² fails because of the multiplicative noise and S fails because the function is not invertible

Fig. 2.26

Simulated pair “hill shade at 3 pm” vs. aspect. Pair 6 in Table 2.2. The only pair in which all diagnoses agree between real and synthetic data


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Guyon, I., Goudet, O., Kalainathan, D. (2019). Evaluation Methods of Cause-Effect Pairs. In: Guyon, I., Statnikov, A., Batu, B. (eds) Cause Effect Pairs in Machine Learning. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-21810-2_2


  • DOI: https://doi.org/10.1007/978-3-030-21810-2_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21809-6

  • Online ISBN: 978-3-030-21810-2

  • eBook Packages: Computer Science (R0)
