COE: A General Approach for Efficient Genome-Wide Two-Locus Epistasis Test in Disease Association Study

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5541)

Abstract

The availability of high-density single nucleotide polymorphism (SNP) data has made genome-wide association studies computationally challenging. Two-locus epistasis (gene-gene interaction) detection has attracted great research interest as a promising method for the genetic analysis of complex diseases. In this paper, we propose a general approach, COE, for efficient large-scale gene-gene interaction analysis that supports a wide range of tests. In particular, we show that many commonly used statistics are convex functions. From the observed values of the events in a two-locus association test, we derive an upper bound on the test value. This upper bound depends only on the single-locus test and the genotype of the SNP-pair. We therefore group and index SNP-pairs by their genotypes, and this indexing structure benefits the computation of all convex statistics. Using the upper bound and the indexing structure, we can prune most SNP-pairs without compromising the optimality of the result. Our approach is especially efficient for large permutation tests. Extensive experiments demonstrate that our approach provides orders-of-magnitude performance improvement over the brute-force approach.
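The role of convexity in the pruning argument can be illustrated with a small sketch. The code below is not the COE algorithm or its actual bound (the abstract does not give those details). It is a toy example that assumes a binary phenotype and a binary genotype, so the data form a 2x2 contingency table. With both margins fixed, the Pearson chi-square statistic is a convex (quadratic) function of a single cell count, so its maximum over the feasible range is attained at an endpoint. That endpoint maximum is an upper bound that can be compared against a significance threshold without scanning every table. All function names here (`chi_square_2x2`, `feasible_range`, `chi_square_upper_bound`, `can_prune`) are hypothetical.

```python
def chi_square_2x2(a, row1, col1, n):
    """Pearson chi-square of a 2x2 table with fixed margins.

    a    : count in the top-left cell (e.g. cases carrying the genotype)
    row1 : total of the first row (e.g. number of cases)
    col1 : total of the first column (e.g. number of carriers)
    n    : grand total
    Uses the identity ad - bc = a*n - row1*col1 for fixed margins.
    """
    denom = row1 * (n - row1) * col1 * (n - col1)
    if denom == 0:
        return 0.0  # a degenerate margin carries no association signal
    return n * (a * n - row1 * col1) ** 2 / denom


def feasible_range(row1, col1, n):
    """Smallest and largest values the top-left cell can take."""
    return max(0, row1 + col1 - n), min(row1, col1)


def chi_square_upper_bound(row1, col1, n):
    """Upper bound from convexity: evaluate only the two endpoints."""
    lo, hi = feasible_range(row1, col1, n)
    return max(chi_square_2x2(lo, row1, col1, n),
               chi_square_2x2(hi, row1, col1, n))


def can_prune(row1, col1, n, threshold):
    """True when no table with these margins can reach the threshold."""
    return chi_square_upper_bound(row1, col1, n) < threshold


if __name__ == "__main__":
    n, cases, carriers = 100, 40, 30  # made-up margins for illustration
    lo, hi = feasible_range(cases, carriers, n)
    bound = chi_square_upper_bound(cases, carriers, n)
    brute = max(chi_square_2x2(a, cases, carriers, n)
                for a in range(lo, hi + 1))
    print(bound, brute)  # the endpoint bound matches the exhaustive maximum
```

In this toy setting the bound is tight because the margins are known exactly. The approach described in the abstract bounds the two-locus statistic using only single-locus information and the SNP-pair genotype, so a whole group of SNP-pairs sharing a genotype pattern can be discarded at once.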

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, X., Pan, F., Xie, Y., Zou, F., Wang, W. (2009). COE: A General Approach for Efficient Genome-Wide Two-Locus Epistasis Test in Disease Association Study. In: Batzoglou, S. (ed.) Research in Computational Molecular Biology. RECOMB 2009. Lecture Notes in Computer Science, vol 5541. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02008-7_19

  • DOI: https://doi.org/10.1007/978-3-642-02008-7_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02007-0

  • Online ISBN: 978-3-642-02008-7

  • eBook Packages: Computer Science, Computer Science (R0)
