Abstract
Pathway-based analysis can detect subtle changes in response variables that may be missed by gene-based analysis. Just as genes interact with other covariates, such as environmental or clinical variables, so can pathways, which are sets of genes that serve particular cellular or physiological functions. However, because a pathway is a set of genes and because environmental or clinical variables may not have simple parametric relationships with the response, it is difficult to model unknown interaction terms between high-dimensional variables, such as pathways, and low-dimensional variables, such as environmental or clinical variables. In this paper, we propose a semiparametric interaction model with two unknown functions to evaluate the interaction between a pathway and an environmental or clinical variable: an unknown high-dimensional function for the pathway and an unknown low-dimensional function for the environmental or clinical variable. We model the environmental or clinical variable nonparametrically via a natural cubic spline, and we model both the pathway effect and the interaction between the pathway and the environmental or clinical variable nonparametrically via a kernel machine. Because both the interactions among genes within a pathway and the interaction between the pathway and the environmental or clinical variable can be complex, the model allows a pathway to interact with environmental or clinical variables while the genes within that pathway also interact with each other. We illustrate our approach using simulated data and genetic pathway data for type II diabetes. Supplementary materials accompanying this paper appear online.
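The abstract describes the model only in words. As a rough illustration, assume a continuous outcome y, a single clinical or environmental covariate z, and the expression values g of the genes in one pathway, with y = f(z) + h(g) + h_gz(g, z) + noise. The Python sketch below is hypothetical and is not the authors' code: it represents f(z) with an unpenalized natural cubic spline basis at fixed knots, and it represents h(g) + h_gz(g, z) with a single kernel built as a Gaussian pathway kernel plus its elementwise product with a Gaussian kernel on z. The Gaussian kernels, the product form of the interaction kernel, the knot placement, the simulated data, and the fixed tuning parameters (`rho`, `lam`) are all assumptions made for illustration. The fit uses the standard least squares kernel machine solution with V = K + lam * I; in practice the tuning parameters would typically be estimated (for example through the linear mixed model representation) rather than fixed.

```python
import numpy as np


def gaussian_kernel(A, B, rho):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / rho) between the rows of A and B."""
    sq_dist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / rho)


def natural_spline_basis(z, knots):
    """Natural cubic spline basis in truncated-power form (linear beyond the boundary knots)."""
    K = len(knots)

    def d(k):
        return (np.maximum(z - knots[k], 0.0) ** 3
                - np.maximum(z - knots[-1], 0.0) ** 3) / (knots[-1] - knots[k])

    cols = [np.ones_like(z), z]
    cols += [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)


def fit_kernel_machine(y, B, K, lam):
    """Penalized fit of y = B @ alpha + h + noise, with h in the RKHS of kernel K.

    With V = K + lam * I (the mixed-model form, lam = noise variance / kernel variance):
    alpha = (B' V^-1 B)^-1 B' V^-1 y  and  h = K V^-1 (y - B alpha).
    """
    V = K + lam * np.eye(len(y))
    Vinv_B = np.linalg.solve(V, B)
    Vinv_y = np.linalg.solve(V, y)
    alpha = np.linalg.solve(B.T @ Vinv_B, B.T @ Vinv_y)
    h = K @ np.linalg.solve(V, y - B @ alpha)
    return alpha, h


# Hypothetical simulated data: p genes in one pathway and one clinical covariate z.
rng = np.random.default_rng(0)
n, p = 100, 5
G = rng.normal(size=(n, p))
z = rng.uniform(0.0, 1.0, size=n)
y = (np.sin(2 * np.pi * z)              # covariate effect f(z)
     + np.cos(G[:, 0]) * G[:, 1]        # gene-gene interaction within the pathway
     + z * G[:, 2]                      # pathway-by-covariate interaction
     + rng.normal(scale=0.3, size=n))   # noise

knots = np.quantile(z, [0.1, 0.3, 0.5, 0.7, 0.9])
B = natural_spline_basis(z, knots)                      # spline basis for f(z)
K_g = gaussian_kernel(G, G, rho=p)                      # pathway kernel for h(g)
K_z = gaussian_kernel(z[:, None], z[:, None], rho=0.5)  # kernel on the covariate
K = K_g + K_g * K_z                                     # pathway kernel + interaction kernel

alpha, h = fit_kernel_machine(y, B, K, lam=1.0)
fitted = B @ alpha + h
print("residual MSE:", np.mean((y - fitted) ** 2), "var(y):", np.var(y))
```

The elementwise (Hadamard) product of two positive semidefinite kernel matrices is itself positive semidefinite, so `K_g * K_z` is a valid kernel for the pathway-by-covariate term; this is one simple way to let the pathway effect vary with z, chosen here for illustration.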
Acknowledgements
This study was supported in part by the National Science Foundation Grant Number 0964680.
About this article
Cite this article
Fang, Z., Kim, I. & Jung, J. Semiparametric Kernel-Based Regression for Evaluating Interaction Between Pathway Effect and Covariate. JABES 23, 129–152 (2018). https://doi.org/10.1007/s13253-017-0317-2