Data-driven discovery of causal interactions

  • Saisai Ma
  • Lin Liu
  • Jiuyong Li
  • Thuc Duy Le
Regular Paper


Causal discovery is a primary focus in many fields, and various methods have been developed to mine causal relationships from observational data. Most of these methods can identify only individual causes and do not consider the interactions among them. In real life, however, many effects arise from multiple factors that interact with each other, so detecting the interactions between causal factors is essential for understanding the real causal mechanisms. So far, there are no efficient data-driven approaches to discovering causal interactions, especially in large data sets. In this paper, we propose a general data-driven framework and develop four algorithms instantiated from the framework to detect causal interactions directly from data. Extensive experiments on both synthetic and real-world data show that the proposed framework and algorithms achieve high effectiveness and efficiency for causal interaction discovery.


Causal discovery, Potential outcome, Causal interactions



This work has been partially supported by the Australian Research Council (ARC) Discovery grant DP140103617 and ARC Discovery grant DP170101306.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of IT and Mathematical Sciences, University of South Australia, Adelaide, Australia
