A Multifactor Dimensionality Reduction Based Associative Classification for Detecting SNP Interactions

  • Suneetha Uppu
  • Aneesh Krishna
  • Raj P. Gopalan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9489)


Identifying and characterizing interactions between genes is an increasingly active topic in genome-wide association studies (GWAS). Several machine learning and data mining approaches have been proposed to identify multi-locus interactions in higher-order genomic data. However, detecting these interactions is challenging because of biomolecular complexity and computational limitations. In this paper, a multifactor dimensionality reduction (MDR) based associative classifier is proposed for detecting interactions between single nucleotide polymorphisms (SNPs) in genetic epidemiological studies. The approach is evaluated on one- to six-locus models by varying heritability, minor allele frequency, case-control ratio and sample size. The experimental results show significant improvements in accuracy over previous approaches in detecting interacting SNPs responsible for complex diseases. The approach was also evaluated on sporadic breast cancer data, where the results show interactions among five polymorphisms in three different estrogen-metabolism genes.
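As a rough illustration of the idea behind multifactor dimensionality reduction (MDR) as it is generally described, the sketch below pools multi-locus genotype combinations into "high-risk" and "low-risk" cells by comparing each cell's case/control ratio with the overall ratio, then scores a locus combination by balanced accuracy. It is not the classifier proposed in the paper: the associative classification stage and cross-validation are omitted, the function names are hypothetical, and the toy data are made up for the example.

```python
from collections import Counter
from itertools import combinations


def mdr_high_risk_cells(genotypes, labels, loci):
    """Return the genotype cells (tuples) labelled high-risk for the given loci.

    genotypes: list of rows, each a list of SNP codes (0, 1 or 2)
    labels:    list of 0 (control) / 1 (case); both classes must be present
    loci:      tuple of column indices forming the candidate model
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in loci)
        (cases if y == 1 else controls)[cell] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    # A cell is high-risk when its case/control ratio is at least the overall
    # ratio; cross-multiplying avoids division by zero for empty control cells.
    return {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] * n_controls >= controls[cell] * n_cases
    }


def balanced_accuracy(genotypes, labels, loci):
    """Balanced accuracy of the high/low-risk rule on the same data (no CV)."""
    high = mdr_high_risk_cells(genotypes, labels, loci)
    tp = tn = n_cases = n_controls = 0
    for row, y in zip(genotypes, labels):
        predicted_case = tuple(row[i] for i in loci) in high
        if y == 1:
            n_cases += 1
            tp += predicted_case
        else:
            n_controls += 1
            tn += not predicted_case
    return 0.5 * (tp / n_cases + tn / n_controls)


def best_model(genotypes, labels, order):
    """Exhaustively search all locus combinations of the given order."""
    n_snps = len(genotypes[0])
    scored = (
        (loci, balanced_accuracy(genotypes, labels, loci))
        for loci in combinations(range(n_snps), order)
    )
    return max(scored, key=lambda pair: pair[1])


# Toy data (made up): disease status depends on SNP 0 and SNP 1 jointly,
# while SNP 2 is noise.
toy_genotypes = [[a, b, c] for a in (0, 1, 2) for b in (0, 1, 2) for c in (0, 1)]
toy_labels = [1 if (a == 1) != (b == 1) else 0 for a, b, _ in toy_genotypes]
print(best_model(toy_genotypes, toy_labels, order=2))
```

In this toy setting the search selects the pair of interacting SNPs, because neither SNP is fully informative on its own. A practical analysis would estimate accuracy with cross-validation rather than on the training data.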


Epistasis · Genome-wide association studies · Associative classification · SNP interactions · Multifactor dimensionality reduction



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Suneetha Uppu (1)
  • Aneesh Krishna (1)
  • Raj P. Gopalan (1)
  1. Department of Computing, Curtin University, Perth, Australia
