Semiparametric Kernel-Based Regression for Evaluating Interaction Between Pathway Effect and Covariate
Pathway-based analysis can detect subtle changes in a response variable that gene-based analysis may miss. Pathways are sets of genes that serve particular cellular or physiological functions, and because genes interact with other covariates such as environmental or clinical variables, pathways do as well. However, because a pathway is a high-dimensional set of genes and because environmental or clinical variables may not have a parametric relationship with the response, it is difficult to model the unknown interaction between high-dimensional variables and low-dimensional variables such as environmental or clinical variables. In this paper, we propose a semiparametric interaction model with two unknown functions to evaluate the interaction between a pathway and an environmental or clinical variable: an unknown high-dimensional function for the pathway and an unknown low-dimensional function for the environmental or clinical variable. We model the environmental or clinical variable nonparametrically via a natural cubic spline, and we model both the pathway effect and the interaction between the pathway and the environmental or clinical variable nonparametrically via a kernel machine. Because the interactions among genes within a pathway and the interaction between the pathway and the environmental or clinical variable are both complex, the model allows a pathway to interact with environmental or clinical variables while the genes within that pathway interact with each other. We illustrate our approach using simulated data and genetic pathway data for type II diabetes. Supplementary materials accompanying this paper appear online.
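To make the structure of such a model concrete, the following is a minimal sketch, not the estimation procedure of the paper. It assumes a working model y = f(x) + h(z) + h_int(x, z) + e, where x is the environmental or clinical variable and z is the vector of gene expressions in a pathway. Several choices are assumptions made only for illustration: a cubic polynomial basis stands in for the natural cubic spline, the pathway kernel is Gaussian, the interaction kernel is the elementwise product of a pathway kernel and a covariate kernel, and the tuning values `rho_z`, `rho_x`, and `lam` are fixed by hand rather than estimated. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, rho):
    """Gaussian kernel K[i, j] = exp(-||A_i - A_j||^2 / rho) on the rows of A."""
    sq_norms = (A ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * A @ A.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / rho)

def fit_interaction_sketch(y, x, Z, rho_z, rho_x, lam):
    """Penalized least squares fit of y = X beta + K alpha + e.

    X: cubic polynomial basis in x (ASSUMPTION: stand-in for a natural cubic spline).
    K: pathway kernel plus a product kernel for the pathway-by-covariate
       interaction (ASSUMPTION: illustrative choice of interaction kernel).
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x, x ** 2, x ** 3])
    K_z = gaussian_kernel(Z, rho_z)            # pathway effect h(z)
    K_x = gaussian_kernel(x[:, None], rho_x)   # covariate similarity
    K = K_z + K_z * K_x                        # main effect + interaction
    V = K + lam * np.eye(n)
    # Generalized least squares for beta, then kernel coefficients alpha.
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
    alpha = np.linalg.solve(V, y - X @ beta)
    fitted = X @ beta + K @ alpha
    return beta, alpha, fitted, K

# Simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
n, p = 60, 5
Z = rng.normal(size=(n, p))            # gene expressions in one pathway
x = rng.uniform(-1.0, 1.0, size=n)     # environmental or clinical variable
y = (np.sin(2.0 * x) + Z[:, 0] * Z[:, 1] + x * Z[:, 2]
     + rng.normal(scale=0.1, size=n))
lam = 1.0
beta, alpha, fitted, K = fit_interaction_sketch(
    y, x, Z, rho_z=float(p), rho_x=1.0, lam=lam
)
```

In this sketch the gene-gene interactions are absorbed by the Gaussian pathway kernel, while the product kernel lets the pathway effect vary with x. A fit along the lines of the paper would replace the polynomial basis with a natural cubic spline and estimate the smoothing and kernel parameters from the data.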
Keywords: Gaussian random process; Kernel machine; Pathway analysis; Semiparametric model; Smoothing splines
This study was supported in part by the National Science Foundation Grant Number 0964680.