Semiparametric Kernel-Based Regression for Evaluating Interaction Between Pathway Effect and Covariate

  • Zaili Fang
  • Inyoung Kim
  • Jeesun Jung


Pathway-based analysis can detect subtle changes in response variables that gene-based analysis may miss. Genes interact with other covariates such as environmental or clinical variables, and so do pathways, which are sets of genes that serve particular cellular or physiological functions. However, because pathways are sets of genes and because environmental or clinical variables do not have parametric relationships with response variables, it is difficult to model unknown interaction terms between high-dimensional variables (pathways) and low-dimensional variables such as environmental or clinical variables. In this paper, we propose a semiparametric interaction model with two unknown functions to evaluate the interaction between a pathway and an environmental or clinical variable: an unknown high-dimensional function for the pathway and an unknown low-dimensional function for the environmental or clinical variable. We model the environmental or clinical variable nonparametrically via a natural cubic spline. We model both the pathway effect and the interaction between the pathway and the environmental or clinical variable nonparametrically via a kernel machine. Because both the interactions among genes within the same pathway and the interaction between the pathway and the environmental or clinical variable are complex, we allow for the possibility that a pathway interacts with environmental or clinical variables and that the genes within the same pathway interact with each other. We illustrate our approach using simulated data and genetic pathway data for type II diabetes. Supplementary materials accompanying this paper appear online.
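
As a rough illustration of the modeling idea described above, the following Python sketch fits a kernel ridge regression whose kernel is the sum of a Gaussian kernel on the pathway genes, a Gaussian kernel on one clinical covariate, and their elementwise product as an interaction term. It is not the authors' estimator: the abstract specifies a natural cubic spline for the covariate, whereas this sketch substitutes a Gaussian kernel for brevity, treats the product kernel as one plausible way to encode the interaction, fixes the bandwidths and the penalty by hand, and uses simulated data. All names (`gaussian_kernel`, `combined_kernel`, `fit_kernel_machine`, `rho_gene`, `rho_cov`, `lam`) are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, rho):
    """Gaussian kernel matrix exp(-||a_i - b_j||^2 / rho) for rows of a and b."""
    sq = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / rho)


def combined_kernel(genes_a, cov_a, genes_b, cov_b, rho_gene, rho_cov):
    """Pathway kernel + covariate kernel + their product (assumed interaction)."""
    k_gene = gaussian_kernel(genes_a, genes_b, rho_gene)
    k_cov = gaussian_kernel(cov_a, cov_b, rho_cov)
    return k_gene + k_cov + k_gene * k_cov


def fit_kernel_machine(kernel, y, lam):
    """Solve the kernel ridge system (K + lam * I) alpha = y - mean(y)."""
    n = kernel.shape[0]
    intercept = y.mean()
    alpha = np.linalg.solve(kernel + lam * np.eye(n), y - intercept)
    return intercept, alpha


# Simulated data (assumption): 80 subjects, 5 genes, 1 clinical covariate.
rng = np.random.default_rng(0)
n, p = 80, 5
genes = rng.normal(size=(n, p))
cov = rng.uniform(-1.0, 1.0, size=(n, 1))
signal = np.sin(genes[:, 0]) + cov[:, 0] ** 2 + genes[:, 1] * cov[:, 0]
y = signal + 0.1 * rng.normal(size=n)

# Hand-picked tuning values (assumption); a real analysis would estimate them.
K = combined_kernel(genes, cov, genes, cov, rho_gene=float(p), rho_cov=1.0)
intercept, alpha = fit_kernel_machine(K, y, lam=0.1)
fitted = intercept + K @ alpha
train_mse = float(np.mean((y - fitted) ** 2))
```

The product of two positive semidefinite kernels is again positive semidefinite, so the combined kernel stays valid while letting the fitted surface vary jointly in the genes and the covariate without a prespecified parametric form.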


Keywords

Gaussian random process; Kernel machine; Pathway analysis; Semiparametric model; Smoothing splines



This study was supported in part by the National Science Foundation Grant Number 0964680.

Supplementary material

Supplementary material 1 (pdf 469 KB)



Copyright information

© International Biometric Society 2017

Authors and Affiliations

  1. Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, USA
  2. Lab of Epidemiology and Biometry, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, USA
