Bootstrap-Based LASSO-Type Selection to Build Generalized Additive Partially Linear Models for High-Dimensional Data
The generalized additive partially linear model (GAPLM) is a flexible way to model the effects of covariates on a response: it allows some covariates to have nonlinear effects and the others to have linear effects. To address the practical need to apply the GAPLM to high-dimensional data, we propose a procedure that selects variables, and thereby builds a GAPLM, by combining the bootstrap with penalized regression. We demonstrate the proposed procedure by applying it to data from a breast cancer study and an HIV study. The two examples show that the procedure is useful in practice. A simulation study also shows that the proposed procedure performs better at variable selection than penalized regression.
Keywords: Marginal Effect; Subset Selection; Coordinate Descent; Real Data Analysis; Stability Selection
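To make concrete the general idea of combining the bootstrap with penalized regression, here is a minimal sketch in Python. It is not the authors' GAPLM procedure: it fits a plain linear LASSO by coordinate descent on bootstrap resamples and keeps the variables that are selected in most resamples, in the spirit of Bolasso-style selection. The nonlinear (additive) components of a GAPLM are omitted. All function names, the penalty `lam`, the 0.9 frequency threshold, and the simulated data are assumptions made for illustration only.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100, tol=1e-8):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Assumes roughly centered data (no intercept is fitted).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def bootstrap_selection_frequency(X, y, lam, n_boot=30, seed=0):
    """Fraction of bootstrap resamples in which each variable gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Simulated illustration: only variables 0 and 1 truly affect the response.
rng = random.Random(1)
n, p = 80, 8
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]

freq = bootstrap_selection_frequency(X, y, lam=0.3, n_boot=30, seed=0)
selected = [j for j, f in enumerate(freq) if f >= 0.9]  # hypothetical threshold
print("selection frequencies:", freq)
print("selected variables:", selected)
```

Keeping only variables with a high selection frequency is one common way to damp the false positives that a single penalized fit can produce. The threshold and the penalty level would need to be tuned for real data.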
Material has been reviewed by the Walter Reed Army Institute of Research. There is no objection to its presentation and/or publication. The opinions or assertions contained herein are the private views of the author, and are not to be construed as official, or as reflecting true views of the Department of the Army, the Agency for Healthcare Research and Quality, the Department of Defense or the Department of Health and Human Services. Liang’s research was partially supported by NSF grants DMS-1418042 and DMS-1620898, and by Award Number 11529101, made by the National Natural Science Foundation of China.