Abstract
The prevention sciences often face several situations that can compromise the statistical power and validity of a study. Among these, research can (1) have data with many variables, sometimes with low sample sizes, (2) have highly correlated predictors, (3) have unclear theory or empirical evidence related to the research questions, and/or (4) have difficulty selecting the proper covariates in observational studies. Modeling in these situations is difficult—and at times impossible—with conventional methods. Fortunately, regularized regression—a machine learning technique—can aid in exploring datasets that are otherwise difficult to analyze, allowing researchers to draw insights from these data. Although many of these methods have existed for several decades, prevention researchers rarely use them. As a gentle introduction, we discuss the utility of regularized regression to the field of prevention science and apply the technique to a real dataset. The data (n = 7979) for the demonstration consisted of 76 variables (151 including the modeled interactions) from the Youth Risk Behavior Surveillance System (YRBSS) from 2015. Overall, it is clear that regularized regression can be an important tool in analyzing and gaining insight from data in the prevention sciences.
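Ridge regression (Hoerl & Kennard, 1970) is one of the penalties underlying the elastic net used in the article. As a rough flavor of how a penalty shrinks correlated or numerous coefficients, here is a minimal closed-form sketch on simulated data—not the YRBSS data or the authors' actual analysis:

```python
import numpy as np

# Minimal sketch of ridge regression on simulated data (illustration only;
# these are not the YRBSS data). With penalty lam > 0, the closed-form
# estimate (X'X + lam*I)^{-1} X'y shrinks coefficients toward zero,
# stabilizing estimation when predictors are many or highly correlated.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]          # only a few truly active predictors
y = X @ beta_true + rng.normal(size=n)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)      # lam = 0 recovers ordinary least squares
beta_rdg = ridge(X, y, 10.0)     # a positive penalty shrinks every coefficient

print(np.linalg.norm(beta_rdg) < np.linalg.norm(beta_ols))  # prints True
```

In practice one would use cross-validation (e.g., via the glmnet package cited below) to choose the penalty rather than fixing it by hand.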
Notes
Only reporting and discussing significant relationships when many were tested incurs a high type-I error rate.
It is critical to note that the term “important” in this context refers to importance for the model’s predictive accuracy. In other words, a variable is important if, in conjunction with the other variables in the model, it is useful in predicting the outcome accurately. This is distinct from the usual discussion of significance and effect size in conventional statistics.
All estimates from the elastic net and unbiased models are included in the Supplementary Table.
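To make the notion of “importance” in the notes above concrete: under an orthonormal design, the lasso estimate is simply the soft-thresholded least-squares estimate, which is why unimportant predictors are dropped (set exactly to zero) rather than merely shrunk. A minimal sketch with made-up coefficient values:

```python
import numpy as np

# Illustration of why the lasso performs variable selection: with an
# orthonormal design, the lasso solution is the soft-thresholded OLS
# estimate (Tibshirani, cited below). Coefficient values are made up
# purely for illustration.
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0), applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

beta_ols = np.array([2.0, -0.3, 0.05, 1.1, -0.02])
beta_lasso = soft_threshold(beta_ols, lam=0.1)

print(beta_lasso)  # → [ 1.9 -0.2  0.   1.  -0. ]
```

Coefficients smaller in magnitude than the penalty are zeroed out entirely—these are the “unimportant” predictors—while the rest are shrunk toward zero.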
References
2015 YRBS Data User’s Guide. (2016). https://doi.org/10.1016/j.jadohealth.2016.03.017.
Bécu, J.-M., Grandvalet, Y., Ambroise, C., & Dalmasso, C. (2015). Beyond support in two-stage variable selection, 1–25. Retrieved from http://arxiv.org/abs/1505.07281. Accessed May 2017
Belloni, A., Chernozhukov, V., & Hansen, C. (2013). Inference on treatment effects after selection among high-dimensional controls. Review of Economic Studies, 81, 608–650. https://doi.org/10.1093/restud/rdt044.
Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25, 7–29. https://doi.org/10.1177/0956797613504966.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. Retrieved from http://arxiv.org/abs/1001.0736
Friedman, J., Hastie, T., Simon, N., & Tibshirani, R. (2016). Package “glmnet”: Lasso and elastic-net regularized generalized linear models. R Package Version, 23. Retrieved from https://www.jstatsoft.org/article/view/v033i01. Accessed May 2017
Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206–213. https://doi.org/10.1007/s11121-007-0070-9.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning. New York: Springer. https://doi.org/10.1007/b94608
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55–67. https://doi.org/10.1080/00401706.1970.10488634.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. New York: Springer.
Kessler, R., Warner, C., Ivany, C., Petukhova, M., Rose, S., Bromet, E. J., et al. (2015). Predicting suicides after psychiatric hospitalization in US Army soldiers: The Army Study To Assess Risk and rEsilience in Servicemembers (Army STARRS). JAMA Psychiatry, 72, 49–57. https://doi.org/10.1001/jamapsychiatry.2014.1754.
Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), 1–26.
Lockhart, G., MacKinnon, D. P., & Ohlrich, V. (2011). Mediation analysis in psychosomatic medicine research. Psychosomatic Medicine, 73, 29–43. https://doi.org/10.1097/PSY.0b013e318200a54b.
McNeish, D. M. (2015). Using lasso for predictor selection and to assuage overfitting: A method long overlooked in behavioral sciences. Multivariate Behavioral Research, 50, 471–484. https://doi.org/10.1080/00273171.2015.1036965.
Pinquart, M., & Shen, Y. (2011). Behavior problems in children and adolescents with chronic physical illness: A meta-analysis. Journal of Pediatric Psychology, 36, 375–384. https://doi.org/10.1093/jpepsy/jsq104.
Sauer, B., Brookhart, M. A., Roy, J. A., & VanderWeele, T. J. (2013). Covariate selection. In P. Velentgas, N. A. Dreyer, P. Nourjah, S. R. Smith, & M. M. Torchia (Eds.), Developing a protocol for observational comparative effectiveness research: A user’s guide (pp. 93–108). Rockville, MD: Agency for Healthcare Research and Quality.
Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2013). A sparse-group lasso. Journal of Computational and Graphical Statistics, 22, 231–245. https://doi.org/10.1080/10618600.2012.681250.
Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2015). Fit a GLM (or Cox model) with a combination of lasso and group lasso regularization.
Tibshirani, R. (2011). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, 73, 267–288. https://doi.org/10.1111/j.1467-9868.2011.00771.x.
Urminsky, O., Hansen, C., & Chernozhukov, V. (2016). Using double-lasso regression for principled variable selection. Available at SSRN 2733374, 1–70.
VanderWeele, T. J. (2012). Invited commentary: Structural equation models and epidemiologic analysis. American Journal of Epidemiology, 176, 608–612. https://doi.org/10.1093/aje/kws213.
Wooldridge, J. M. (2013). Introductory econometrics (4th ed.). Mason, OH: South-Western Cengage Learning.
Zhao, Y., & Luo, X. (2016). Pathway lasso: Estimate and select sparse mediation pathways with high dimensional mediators. Retrieved from http://arxiv.org/abs/1603.07749
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429. https://doi.org/10.1198/016214506000000735.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, 67, 301–320.
Zou, H., & Zhang, H. H. (2009). On the adaptive elastic-net with a diverging number of parameters. Annals of Statistics, 37, 1733–1751. https://doi.org/10.1214/08-AOS625.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Ethical Approvals
The present study uses data collected under the direction of the Centers for Disease Control and Prevention. As such, the proper ethical approvals were obtained under its supervision.
Informed Consent
As with the ethical approvals above, the Centers for Disease Control and Prevention oversaw the informed consent process for each of the subjects.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic Supplementary Material
ESM 1
(DOCX 26 kb)
Cite this article
Barrett, T.S., Lockhart, G. Efficient Exploration of Many Variables and Interactions Using Regularized Regression. Prev Sci 20, 575–584 (2019). https://doi.org/10.1007/s11121-018-0963-9