COE: A General Approach for Efficient Genome-Wide Two-Locus Epistasis Test in Disease Association Study

  • Xiang Zhang
  • Feng Pan
  • Yuying Xie
  • Fei Zou
  • Wei Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5541)


The availability of high-density single nucleotide polymorphism (SNP) data has made genome-wide association studies computationally challenging. Two-locus epistasis (gene-gene interaction) detection has attracted great research interest as a promising method for the genetic analysis of complex diseases. In this paper, we propose a general approach, COE, for efficient large-scale gene-gene interaction analysis that supports a wide range of tests. In particular, we show that many commonly used statistics are convex functions. From the observed values of the events in a two-locus association test, we can derive an upper bound on the test value. This upper bound depends only on the single-locus test and the genotype of the SNP-pair. We therefore group and index SNP-pairs by their genotypes, and the resulting indexing structure can be reused for the computation of all convex statistics. Using the upper bound and the indexing structure, we can prune most of the SNP-pairs without compromising the optimality of the result. Our approach is especially efficient for large permutation tests. Extensive experiments demonstrate that our approach provides orders of magnitude performance improvement over the brute-force approach.
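The role convexity plays in pruning can be illustrated with a minimal sketch. It is not the COE bound itself, which according to the abstract is built from the single-locus test and the SNP-pair genotype. It only shows the underlying property: a convex statistic restricted to an interval reaches its maximum at an endpoint, so two evaluations bound every table in that interval. The 2x2 table layout, the function names `chi2_2x2`, `upper_bound`, and `can_prune`, and the example numbers are all assumptions made for illustration.

```python
def chi2_2x2(x, row1, col1, n):
    """Pearson chi-square of a 2x2 table with fixed, positive margins.

    x is the count in cell (1,1). With the margins fixed, every cell is an
    affine function of x, so the statistic is a convex (quadratic) function of x.
    """
    row2, col2 = n - row1, n - col1
    observed = (x, row1 - x, col1 - x, row2 - col1 + x)
    expected = (row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def upper_bound(lo, hi, row1, col1, n):
    """Largest chi-square over all x in [lo, hi].

    A convex function on a closed interval attains its maximum at an
    endpoint, so only the two endpoints need to be evaluated.
    """
    return max(chi2_2x2(lo, row1, col1, n), chi2_2x2(hi, row1, col1, n))


def can_prune(lo, hi, row1, col1, n, threshold):
    """True if no table with x in [lo, hi] can reach the threshold."""
    return upper_bound(lo, hi, row1, col1, n) < threshold


if __name__ == "__main__":
    # Hypothetical example: 100 samples, 50 cases, 40 carriers of a genotype,
    # and cell (1,1) known to lie somewhere between 15 and 26.
    n, row1, col1 = 100, 50, 40
    lo, hi = 15, 26
    bound = upper_bound(lo, hi, row1, col1, n)
    brute = max(chi2_2x2(x, row1, col1, n) for x in range(lo, hi + 1))
    print("endpoint bound:", bound, "brute-force maximum:", brute)
    # 6.635 is the 1% critical value of chi-square with 1 degree of freedom.
    print("prunable at threshold 6.635:", can_prune(lo, hi, row1, col1, n, 6.635))
```

In this sketch the bound costs two evaluations regardless of how wide the interval is, which is the kind of saving that makes skipping most candidates feasible when the same test is repeated across many permutations.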


Keywords: Convex Function; Permutation Test; Indexing Structure; Error Threshold; High Density Single Nucleotide Polymorphism
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Xiang Zhang (1)
  • Feng Pan (1)
  • Yuying Xie (2)
  • Fei Zou (3)
  • Wei Wang (1)
  1. Department of Computer Science, University of North Carolina at Chapel Hill, USA
  2. Department of Genetics, University of North Carolina at Chapel Hill, USA
  3. Department of Biostatistics, University of North Carolina at Chapel Hill, USA
