Bootstrap-Based LASSO-Type Selection to Build Generalized Additive Partially Linear Models for High-Dimensional Data

  • Xiang Liu
  • Tian Chen
  • Yuanzhang Li
  • Hua Liang
Part of the ICSA Book Series in Statistics (ICSABSS)


The generalized additive partially linear model (GAPLM) is a flexible way to model the effects of covariates on a response: it allows nonlinear effects for some covariates and linear effects for the others. To meet the practical need of applying GAPLMs to high-dimensional data, we propose a procedure that selects variables, and thereby builds a GAPLM, by combining the bootstrap with penalized regression. We illustrate the procedure with data from a breast cancer study and an HIV study; both examples show that it is useful in practice. A simulation study further shows that the proposed procedure selects variables more accurately than penalized regression alone.
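The selection idea sketched above (in the spirit of Bolasso-style bootstrap aggregation of LASSO fits) can be illustrated with a minimal, self-contained example. This is an assumption-laden sketch, not the authors' actual procedure: `lasso_cd`, `bootstrap_lasso_select`, and all parameter values are hypothetical names chosen here, the LASSO is fit by plain cyclic coordinate descent, and a variable is kept only when its coefficient is nonzero in at least a given fraction of bootstrap resamples.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate-descent update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Fit LASSO coefficients by cyclic coordinate descent.

    Approximately minimizes (1/2n)||y - X beta||^2 + lam * ||beta||_1,
    where X is a list of rows. Illustrative only (fixed iteration count,
    no convergence check).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # correlation of feature j with the partial residual (beta_j held out)
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            )
            beta[j] = soft_threshold(rho, lam * n) / col_sq[j] if col_sq[j] else 0.0
    return beta

def bootstrap_lasso_select(X, y, lam, n_boot=30, threshold=0.9, seed=0):
    """Keep the variables whose LASSO coefficient is nonzero in at least
    `threshold` of the bootstrap resamples (a Bolasso-style rule)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if abs(beta[j]) > 1e-8:
                counts[j] += 1
    return [j for j in range(p) if counts[j] / n_boot >= threshold]
```

On simulated data with two true predictors (`y = 2*x0 - 1.5*x1`) and two noise predictors, the bootstrap frequency filter tends to retain only the true predictors, since noise variables rarely survive the penalty across a large fraction of resamples.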


Keywords: Marginal effect · Subset selection · Coordinate descent · Real data analysis · Stability selection



Material has been reviewed by the Walter Reed Army Institute of Research. There is no objection to its presentation and/or publication. The opinions or assertions contained herein are the private views of the authors and are not to be construed as official, or as reflecting the views of the Department of the Army, the Agency for Healthcare Research and Quality, the Department of Defense, or the Department of Health and Human Services. Liang's research was partially supported by NSF grants DMS-1418042 and DMS-1620898, and by Award Number 11529101 from the National Natural Science Foundation of China.



Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  1. Health Informatics Institute, University of South Florida, Tampa, USA
  2. Department of Mathematics and Statistics, University of Toledo, Toledo, USA
  3. Division of Preventive Medicine, Walter Reed Army Institute of Research, Silver Spring, USA
  4. Department of Statistics, George Washington University, Washington, USA
