Abstract
In this chapter we consider strategies for selecting the relevant variables in situations where many explanatory variables are available. The aim is to identify the relevant ones in order to obtain a reduced time-to-event model that is easier to interpret than a large model with a multitude of covariates. Beyond interpretability, variable selection typically improves prediction accuracy, which is known to suffer when many irrelevant variables are included in the model. This is particularly important for model fitting in high-dimensional situations, where the number of predictors exceeds the number of observations. We consider regularization methods, which have become a state-of-the-art tool for variable selection, especially in high-dimensional settings. In Sect. 7.1 we present penalized regression techniques for discrete survival models, which filter out irrelevant variables by imposing penalties on the respective covariate effects during maximum likelihood estimation. In Sect. 7.2 we consider gradient boosting techniques, which originated in the machine learning field and also serve as a regularization method for variable selection and model choice.
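To make the two ideas concrete, the following is a minimal, self-contained sketch, not the chapter's own code: it uses scikit-learn as a stand-in for the dedicated software the chapter builds on, and the simulated data, the penalty weight C, and the choice of depth-1 trees are illustrative assumptions. A discrete-time hazard model is fit as a binary regression on person-period data; an L1 (lasso-type) penalty then shrinks irrelevant covariate effects exactly to zero, and gradient boosting with early stopping performs an analogous implicit selection.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Simulated survival data: n subjects, p covariates, but only the first
# two covariates actually influence the discrete hazard.
n, p, max_time = 200, 50, 10
X = rng.normal(size=(n, p))
lin_pred = 0.8 * X[:, 0] - 0.8 * X[:, 1]

# Person-period expansion: one binary record per subject and period at
# risk; y = 1 in the period the event occurs (no censoring, for simplicity,
# and an event is forced at max_time so every subject terminates).
rows, periods, y = [], [], []
for i in range(n):
    for t in range(1, max_time + 1):
        hazard = 1.0 / (1.0 + np.exp(-(-2.0 + lin_pred[i])))
        event = rng.random() < hazard or t == max_time
        rows.append(X[i]); periods.append(t); y.append(int(event))
        if event:
            break

# Dummy-coded period effects give each discrete time point its own
# baseline hazard. Note: sklearn penalizes these dummies along with the
# covariates; dedicated implementations leave the baseline unpenalized.
period_dummies = np.eye(max_time)[np.array(periods) - 1]
Z = np.hstack([period_dummies, np.array(rows)])
y = np.array(y)

# Lasso-type variable selection: L1-penalized logistic regression on the
# person-period data; C is the inverse penalty weight (tune by CV in practice).
lasso_fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_fit.fit(Z, y)
selected = np.flatnonzero(lasso_fit.coef_[0, max_time:])
print("covariates with nonzero effect:", selected)

# Gradient boosting as an alternative regularizer: the number of boosting
# steps (n_estimators) controls complexity and performs implicit selection.
boost_fit = GradientBoostingClassifier(n_estimators=100, max_depth=1,
                                       learning_rate=0.1)
boost_fit.fit(Z, y)
print("most influential columns:",
      np.argsort(boost_fit.feature_importances_)[::-1][:5])
```

In the penalized fit, increasing the penalty (decreasing C) drives more coefficients exactly to zero; in the boosting fit, stopping after fewer iterations plays the analogous role.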
Cite this chapter
Tutz, G., & Schmid, M. (2016). High-Dimensional Models: Structuring and Selection of Predictors. In Modeling Discrete Time-to-Event Data (Springer Series in Statistics). Cham: Springer. https://doi.org/10.1007/978-3-319-28158-2_7