Efficient Exploration of Many Variables and Interactions Using Regularized Regression

Prevention Science

Abstract

The prevention sciences often face situations that can compromise the statistical power and validity of a study. A study may (1) have many variables, sometimes with a small sample size, (2) have highly correlated predictors, (3) lack clear theory or empirical evidence bearing on the research questions, and/or (4) offer no clear basis for selecting covariates in observational data. Modeling in these situations is difficult, and at times impossible, with conventional methods. Fortunately, regularized regression, a machine learning technique, can aid in exploring datasets that are otherwise difficult to analyze, allowing researchers to draw insights from these data. Although many of these methods have existed for several decades, prevention researchers rarely use them. As a gentle introduction, we discuss the utility of regularized regression to the field of prevention science and apply the technique to a real dataset. The data (n = 7979) for the demonstration consisted of 76 variables (151 including the modeled interactions) from the 2015 Youth Risk Behavior Surveillance System (YRBSS). Overall, regularized regression can be an important tool for analyzing and gaining insight from data in the prevention sciences.
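To make the idea concrete, here is a minimal sketch of regularized regression screening main effects and pairwise interactions. It is an illustration under stated assumptions, not the authors' analysis: the data are synthetic rather than YRBSS, the variable names (x0, x1, ...) and the penalty settings (lam, alpha) are hypothetical, and the fit uses a plain coordinate-descent elastic net written with the Python standard library only.

```python
import random
from itertools import combinations

def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]

def soft_threshold(z, g):
    """Shrink z toward zero by g; values within [-g, g] become exactly 0."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def elastic_net(columns, y, lam, alpha, n_iter=100):
    """Coordinate descent for
    (1/2n)*||y - Xb||^2 + lam*(alpha*|b|_1 + (1 - alpha)/2*||b||^2),
    assuming every column is standardized and y is centered."""
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(n_iter):
        for j, x in enumerate(columns):
            # Covariance of column j with the partial residual.
            rho = sum(xi * ri for xi, ri in zip(x, resid)) / n + beta[j]
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            if new != beta[j]:
                delta = new - beta[j]
                resid = [ri - delta * xi for ri, xi in zip(resid, x)]
                beta[j] = new
    return beta

# Synthetic data (an assumption for illustration): 6 predictors, 400 rows.
random.seed(0)
n, p = 400, 6
raw = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]

# Candidate terms: every main effect plus every pairwise interaction.
names = [f"x{j}" for j in range(p)]
cols = list(raw)
for a, b in combinations(range(p), 2):
    names.append(f"x{a}:x{b}")
    cols.append([u * v for u, v in zip(raw[a], raw[b])])

# Only x0, x1 and their interaction truly affect the outcome.
y = [2.0 * raw[0][i] - 1.5 * raw[1][i] + 1.0 * raw[0][i] * raw[1][i]
     + random.gauss(0, 1) for i in range(n)]
y_mean = sum(y) / n
y_centered = [v - y_mean for v in y]

X = [standardize(c) for c in cols]
beta = elastic_net(X, y_centered, lam=0.3, alpha=0.5)

# Terms with a nonzero coefficient are the ones the penalty retained.
selected = {name: round(b, 3) for name, b in zip(names, beta) if b != 0.0}
print(selected)
```

The lasso part of the penalty sets many coefficients exactly to zero, which is what lets a single fit screen a large set of candidate terms, while the ridge part stabilizes estimates when predictors are correlated. In practice lam would usually be chosen by cross-validation rather than fixed by hand as it is here.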

Notes

  1. Reporting and discussing only the significant relationships when many were tested incurs a high type-I error rate.

  2. In this context, the term “important” refers to a variable’s importance to the model’s predictive accuracy. In other words, a variable is important if, in conjunction with the other variables in the model, it is useful in predicting the outcome accurately. This is distinct from the usual discussion of significance and effect size in conventional statistics.

  3. All estimates from the elastic net and unbiased models are included in the Supplementary Table.

Author information

Corresponding author

Correspondence to Tyson S. Barrett.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approvals

The present study uses data collected under the direction of the Centers for Disease Control and Prevention. As such, the proper ethical approvals were obtained under its supervision.

Informed Consent

As with the ethical approvals above, the Centers for Disease Control and Prevention supervised the informed consent of each subject.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic Supplementary Material

ESM 1 (DOCX 26 kb)

About this article

Cite this article

Barrett, T.S., Lockhart, G. Efficient Exploration of Many Variables and Interactions Using Regularized Regression. Prev Sci 20, 575–584 (2019). https://doi.org/10.1007/s11121-018-0963-9

  • DOI: https://doi.org/10.1007/s11121-018-0963-9