Statistics in Biosciences, Volume 10, Issue 1, pp 117–138

A Powerful Test for SNP Effects on Multivariate Binary Outcomes Using Kernel Machine Regression

  • Clemontina A. Davenport
  • Arnab Maity
  • Patrick F. Sullivan
  • Jung-Ying Tzeng


Evaluating multiple binary outcomes is common in genetic studies of complex diseases. These outcomes are often correlated because they are collected from the same individual and may share common marker effects. In this paper, we propose a procedure to test for the effect of a single nucleotide polymorphism (SNP) set on multiple, possibly correlated, binary responses. We develop a score-based test using a non-parametric modeling framework that jointly models the global effect of the marker set. We account for non-linear effects and potentially complicated interactions between markers using reproducing kernels. Our testing procedure requires estimation only under the null hypothesis, and we use multivariate generalized estimating equations to estimate the model components while accounting for the correlation among the outcomes. We evaluate the finite-sample performance of our test via a simulation study and demonstrate our methods using the Clinical Antipsychotic Trials of Intervention Effectiveness antibody study data and the CoLaus study data.


Correlated binary responses; Generalized estimating equations; IBS kernel; Kernel machine; Non-parametric regression
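
As an illustration of the reproducing kernels mentioned in the abstract, the identical-by-state (IBS) kernel named in the keywords is commonly defined, for genotypes coded as 0, 1, or 2 copies of the minor allele across S SNPs, as K(i, j) = (1 / 2S) Σ_s (2 − |g_is − g_js|). The following is a minimal sketch of that unweighted form, not the authors' implementation; the function name `ibs_kernel` and the toy genotype matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def ibs_kernel(genotypes):
    """Unweighted IBS kernel for an (n_subjects, n_snps) genotype matrix.

    Assumes additive coding: each entry is 0, 1, or 2 minor-allele copies.
    """
    g = np.asarray(genotypes, dtype=float)
    n_snps = g.shape[1]
    # Pairwise absolute genotype differences, shape (n, n, n_snps).
    diff = np.abs(g[:, None, :] - g[None, :, :])
    # Alleles shared identical-by-state at each SNP are 2 - |difference|;
    # sum over SNPs and scale so that K[i, i] equals 1.
    return (2.0 - diff).sum(axis=2) / (2.0 * n_snps)


# Toy data (hypothetical): 3 subjects, 3 SNPs.
G = np.array([[0, 1, 2],
              [0, 1, 2],
              [2, 1, 0]])
K = ibs_kernel(G)
```

Subjects with identical genotypes receive a similarity of 1, and similarity decreases as fewer alleles are shared. In a kernel machine test, a matrix of this kind measures pairwise genetic similarity over the SNP set.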



The authors thank Dr. Robert Yolken at Johns Hopkins University for providing the antibody data. The authors also thank Drs. Peter Vollenweider and Gerard Waeber, PIs of the CoLaus study, and Drs. Meg Ehm and Matthew Nelson, collaborators at GlaxoSmithKline, for providing the CoLaus phenotype and sequence data. This work was supported by National Institutes of Health Grants R00 ES017744 (to A.M.), R01 MH084022 (to J.Y.T. and P.F.S.), and P01 CA142538 (to J.Y.T.).

Compliance with Ethical Standards

Conflicts of interest

The authors have no conflicts of interest to declare.



Copyright information

© International Chinese Statistical Association 2017

Authors and Affiliations

  • Clemontina A. Davenport (1; corresponding author)
  • Arnab Maity (2)
  • Patrick F. Sullivan (3)
  • Jung-Ying Tzeng (4, 5, 6)

  1. Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, USA
  2. Department of Statistics, North Carolina State University, Raleigh, USA
  3. Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA
  4. Department of Statistics, Bioinformatics Research Center, North Carolina State University, Raleigh, USA
  5. Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
  6. Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
