Defining and Discovering Interactive Causes

  • Xia Jiang
  • Richard Neapolitan
Part of the Intelligent Systems Reference Library book series (ISRL, volume 137)


The problem of learning causal influences from passive data has attracted a good deal of attention in the past 30 years, and techniques have been developed and tested. These techniques assume the composition property, which entails that they cannot in general learn interactive causes with little marginal effect. However, such interactions are fairly commonplace. One notable example is genetic epistasis, which is the interaction of two or more genetic loci to affect phenotype. Often the genes exhibit little marginal effect. Another important example is the interaction of a treatment with patient features to affect outcomes. Even though efforts have recently been made towards developing new algorithms that discover such interactions from data, to our knowledge no definition of a discrete causal interaction has been put forward. Using information theory, we develop a fuzzy definition of a discrete causal interaction, called Interaction Strength (IS). The IS is bounded above by 1 and equals 1 if the causes in the interaction exhibit no marginal effects. Using the IS and Bayesian network (BN) scoring, we develop an exhaustive search algorithm, Exhaustive-IGain, which learns interactions from low-dimensional datasets, and a heuristic search algorithm, MBS-IGain, which learns interactions from high-dimensional datasets. Using simulated high-dimensional datasets based on models of genetic epistasis, we compare MBS-IGain to seven algorithms that learn genetic epistasis from high-dimensional datasets, and show that MBS-IGain’s discovery performance is notably better than that of the other methods. We apply MBS-IGain to a real late-onset Alzheimer’s disease (LOAD) dataset, and obtain results that substantiate previous research as well as new results. Using low-dimensional simulated datasets, we show that Exhaustive-IGain can learn 4-cause interactions with no marginal effects. We apply Exhaustive-IGain to a real clinical breast cancer dataset, and learn interactions that agree with the judgements of a breast cancer oncologist.
Our algorithms are only directly applicable to problems where we have a specified target and its candidate causes. However, our algorithms could be used for general causal learning by being a front end to a standard causal learning algorithm.
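To make the notion of an interaction with no marginal effects concrete, the following is a minimal sketch in Python. It does not reproduce the chapter's IS definition, which is not stated in this abstract. The function `toy_interaction_ratio` is a hypothetical two-cause stand-in, chosen only because it shares the two properties stated above: it is bounded above by 1, and it equals 1 when neither cause alone carries information about the target. The toy data (a target that is the exclusive-or of two binary causes) is likewise an assumption made for illustration.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(target, *causes):
    """IG(Z; causes) = H(Z) - H(Z | causes), estimated from paired samples."""
    n = len(target)
    groups = {}
    # Group the target values by the joint configuration of the causes.
    for z, key in zip(target, zip(*causes)):
        groups.setdefault(key, []).append(z)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def toy_interaction_ratio(target, x, y):
    """Hypothetical two-cause ratio (an assumption, NOT the chapter's IS).

    Fraction of the joint information gain that is not accounted for by
    the two marginal information gains.
    """
    joint = information_gain(target, x, y)
    if joint == 0:
        return 0.0
    marginal = information_gain(target, x) + information_gain(target, y)
    return (joint - marginal) / joint


# Toy data (assumed): each (x, y) combination once, with z = x XOR y.
pairs = list(product([0, 1], repeat=2))
x = [a for a, _ in pairs]
y = [b for _, b in pairs]
z = [a ^ b for a, b in pairs]

print(information_gain(z, x))        # marginal gain of x alone
print(information_gain(z, y))        # marginal gain of y alone
print(information_gain(z, x, y))     # joint gain of x and y together
print(toy_interaction_ratio(z, x, y))
```

In this toy case each cause alone tells us nothing about the target, while the two together determine it completely, which is the situation a method relying on marginal effects would miss. When the target simply copies one cause, the same ratio drops to 0.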


Keywords: Bayesian network, Interaction, Causal learning, Information gain, Entropy, Epistasis, SNP, GWAS




This work was supported by National Library of Medicine grants R00LM010822, R01LM011663, and R01LM011962.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
  2. Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, USA
