The seminal conceptual framework for prevention science by Kellam, Koretz, and Moscicki (Kellam et al. 1999; known as the developmental epidemiological preventive science framework) integrates three disciplines: epidemiology, life course development, and intervention trials technology. The third core element adds the systematic study of the causal functions of risk and protective factors on targeted outcomes. Prevention research is thus primarily concerned with establishing evidence-based statements about the causal nature of interventions of any form (i.e., policies, programs, or practices) and their hypothesized capacity to improve health or reduce disease-related problems. Over the last decades, tremendous progress has been made in developing statistical tools for causal inference (see, e.g., Pearl 2009; Peters et al. 2017; VanderWeele 2015; Wiedermann and von Eye 2016), which enable prevention scientists to empirically evaluate causal hypotheses and estimate causal effects. Although many causal inference methods, such as propensity score techniques (e.g., Harder et al. 2010; Lippold et al. 2014), causal mediation approaches (Pearl 2012), and sensitivity analyses (Liu et al. 2013), are already part of the standard methodological toolbox of prevention researchers, the Society for Prevention Research (SPR) recently provided an update of its standards of evidence (Gottfredson et al. 2015) with a special emphasis on the importance of testing the complex mechanisms by which interventions causally affect health-related outcomes. The SPR Board further concluded that existing methods may still be too rudimentary to answer complex questions of causation, in particular research questions that go beyond simple main effects of experimental interventions. Examples of such methodological shortcomings include

  (a) the lack of statistical methods to rigorously evaluate complex mediational chains of causation (Imai et al. 2013),

  (b) the lack of quantitative methods to adequately address the iterative and dynamic processes of both temporal change in individual health behavior (Beltz et al. 2016) and change in community factors that affect the sustainability of evidence-based interventions (Chambers et al. 2013), and

  (c) the lack of rigorous statistical tools to provide evidence of causation in the latent variable domain (for first attempts, see Butera et al. 2014; Muthén and Asparouhov 2015; von Eye and Wiedermann 2014).

The Special Section, Advances in Statistical Methods for Causal Inference in Prevention Science, brings together leading scholars in the field of causal inference research to present and discuss recent methodological developments which help to overcome many of these shortcomings. The first article, contributed by Bray et al. (2018), proposes a novel methodological framework that links propensity score techniques and Latent Class Analysis (LCA). While previous work (Butera et al. 2014; Lanza et al. 2016) discusses methods for estimating causal effects of manifest exposures (e.g., depressive symptoms) on complex patterns of latent outcomes (e.g., latent substance use profiles), the present article proposes statistical techniques to quantify the causal effect of latent class exposures (i.e., complex patterns of multiple causes) on manifest distal outcomes. Nationally representative data on adolescent drinking motives and adult drinking disorder are used to demonstrate this novel approach.

The second article, contributed by Kelcey et al. (2018), is devoted to the study of causal mediation effects in cluster-randomized interventions. While previous methodological research on multilevel mediation models focused on, for example, modeling issues (e.g., Preacher et al. 2010; Pituch and Stapleton 2012) and methods for statistical inference (e.g., Pituch et al. 2006), the issue of sample size planning for the detection of multilevel mediation effects has received considerably less attention. The present article discusses the conceptual and statistical framework for studying mediation effects in cluster-randomized interventions and introduces novel power analysis tools (cf. Dong and Maynard 2013) to design multilevel mediation studies. The article thus closes an important gap in planning cluster-randomized interventions, enabling prevention scientists to design multilevel mediation studies with adequate power to address both the complexity of the intervention setting under study and the potential complexity of the underlying causal mechanisms of the intervention.

While methods to estimate mediation effects with randomized exposures are well understood and readily available (e.g., MacKinnon 2008), several authors have noted the limitations that concern the mediator-outcome component of an intervention theory (known as the “conceptual theory”; cf. Gottfredson et al. 2015; Herting 2002). Here, it is well known that, even under randomized treatment, neither the direction (cf. Wiedermann and von Eye 2015) nor the magnitude of the causal “conceptual” effect can be identified uniquely when mediators are measured at the same time as the outcomes, unless additional assumptions are imposed on the data (e.g., Keele 2015; MacKinnon and Pirlott 2015; Pirlott and MacKinnon 2016). The third article, contributed by Wiedermann et al. (2018), introduces a line of research known as Direction Dependence Analysis (DDA; Wiedermann and Li 2018; Wiedermann and Sebastian 2018) which can be used to detect potential confounders and infer the causal direction of effects (i.e., whether the causal model x → y or the reversed model y → x reflects the underlying causal mechanism) in observational data. The authors demonstrate how DDA can be used to empirically test the causal direction and magnitude of the “conceptual” (mediator-outcome) part of an intervention theory.
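The core intuition of direction dependence can be sketched with simulated data: when the true cause is non-normally distributed and the error term is normal, residuals of the correctly specified model are approximately normal, whereas residuals of the mis-specified reverse model inherit the cause's non-normality. The following minimal Python sketch (the simulation setup, coefficient, and sample size are purely illustrative assumptions, not taken from the article) compares residual skewness under both candidate directions:

```python
import numpy as np

rng = np.random.default_rng(7)

def skew(v):
    # sample skewness (third standardized moment)
    z = (v - v.mean()) / v.std()
    return np.mean(z**3)

def residuals(pred, out):
    # OLS residuals of out regressed on pred (with intercept)
    b, a = np.polyfit(pred, out, 1)
    return out - (a + b * pred)

# simulate a known causal direction x -> y with a skewed cause
# and a normally distributed error (the setting DDA exploits)
n = 20000
x = rng.exponential(1.0, n) - 1.0   # skewed "cause"
e = rng.normal(0.0, 1.0, n)         # normal error
y = 0.5 * x + e

# under the correctly specified model y ~ x, residuals are the
# normal error term; under the mis-specified reverse model x ~ y,
# residuals inherit non-normality from the skewed cause
s_xy = abs(skew(residuals(x, y)))   # small: consistent with x -> y
s_yx = abs(skew(residuals(y, x)))   # clearly larger
print(s_xy < s_yx)  # -> True
```

In practice, DDA complements such moment comparisons with formal significance tests and additional higher-moment and independence criteria; this toy example only conveys the asymmetry the method exploits.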

In the fourth article, Shimizu (2018) then introduces a more general approach to discern causal structures in non-experimental data. The presented algorithms, which are based on the linear non-Gaussian acyclic model (LiNGAM) and were developed mainly in the field of machine learning research (cf. Shimizu et al. 2006; Hyvärinen and Smith 2013), are capable of identifying causal structures among (non-experimentally observed) variables even in the presence of unobserved confounders (cf. Shimizu and Bollen 2014). This approach is thus ideally suited to generate new hypotheses about how intervention effects are altered by factors that are not under experimental control. To the best of our knowledge, this is the first article that introduces modern machine learning algorithms for causal structure learning to the audience of prevention scientists.
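A toy illustration of the residual-independence principle that underlies pairwise LiNGAM (not the ICA-based estimation algorithm itself; the simulation setup and dependence score below are illustrative assumptions): in a linear model with non-Gaussian disturbances, only the correct causal direction yields regression residuals that are statistically independent of the predictor.

```python
import numpy as np

rng = np.random.default_rng(11)

def ols_resid(pred, out):
    # OLS residuals of out regressed on pred (with intercept)
    b, a = np.polyfit(pred, out, 1)
    return out - (a + b * pred)

def dep(pred, resid):
    # crude dependence score: OLS removes the linear correlation,
    # so correlating the predictor with a nonlinear transform of
    # the residual flags remaining higher-order dependence
    return abs(np.corrcoef(pred, resid**3)[0, 1])

# two-variable linear model with non-Gaussian disturbances,
# true direction x -> y (coefficients are illustrative)
n = 50000
x = rng.exponential(1.0, n) - 1.0              # non-Gaussian cause
y = 0.5 * x + rng.uniform(-3**0.5, 3**0.5, n)  # non-Gaussian error

# pairwise LiNGAM logic: prefer the direction whose regression
# residual is (approximately) independent of its predictor
d_xy = dep(x, ols_resid(x, y))  # near zero: x -> y is consistent
d_yx = dep(y, ols_resid(y, x))  # clearly larger: y -> x rejected
print(d_xy < d_yx)  # -> True
```

Full LiNGAM estimation orders more than two variables at once and uses proper independence measures; the cubed-residual correlation here is only a crude stand-in for those measures.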

The fifth article, authored by Molenaar (2018), focuses on recent developments of methods for causal inference in intensive longitudinal studies (ILSs), i.e., studies with rapid in situ assessment protocols (cf. Bamberger 2016; Bolger and Laurenceau 2013). With technological advances, ILSs provide new opportunities for prevention research to evaluate short- and long-term efficacy of prevention programs and to study the dynamics of temporal changes in health-related behavior (Ridenour et al. 2013). The author introduces Granger causality testing (Granger 1969), which enables prevention researchers to identify causal relations among time-dependent variables. In essence, Granger causality testing relies on a prediction error approach: a variable x “Granger-causes” a variable y if the prediction error of y_t, given the universal set of information up to time point t, is smaller than the prediction error of y_t when the information contributed by x is excluded. The author then highlights the methodological issue that two equivalent representations of so-called vector autoregressive (VAR) models exist which, however, can lead to different Granger causal conclusions. The present contribution introduces a data-driven way to find the optimal representation of VAR models for deriving Granger causality statements. Application of this novel approach is illustrated using time series of electrodermal activity (EDA) data of a child with sensory processing disorder and his therapist during interaction in occupational therapy.
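The prediction error logic described above can be made concrete with a minimal bivariate sketch (simulated series; the single-lag specification and all coefficients are illustrative assumptions, not taken from the article). x Granger-causes y when adding x's past significantly reduces the error of predicting y_t from y's own past:

```python
import numpy as np

rng = np.random.default_rng(3)

# simulate two series where x Granger-causes y but not vice versa
T = 5000
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + rng.normal()

def ssr(target, predictors):
    # sum of squared residuals of an OLS regression (with intercept)
    X = np.column_stack([np.ones(len(target))] + predictors)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    r = target - X @ beta
    return r @ r

n = T - 1
ssr_r = ssr(y[1:], [y[:-1]])           # restricted: y's own past only
ssr_f = ssr(y[1:], [y[:-1], x[:-1]])   # full: plus the past of x

# F-test for the added lag of x (1 restriction, n - 3 residual df);
# a large F means x's past improves the prediction of y_t
F = (ssr_r - ssr_f) / (ssr_f / (n - 3))
print(F > 3.84)  # -> True (well above the 5% critical value)
```

An in-sample error comparison alone would be trivial (adding a regressor never increases it), which is why the F-statistic on the restricted versus full model is the usual test; software such as statsmodels implements this for general lag orders.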

The Special Section closes with a commentary by Musci and Stuart (2018). After summarizing the fundamental challenges of causal inference, the authors provide a careful reminder that every statistical approach to causal inference, including the approaches presented in this Special Section, builds on (sometimes strong and untestable) assumptions. Prevention scientists therefore need to be aware of these assumptions and be able to assess their plausibility when applying the methods to their own data. Further, the authors emphasize the importance of sensitivity analyses for evaluating the robustness of causal conclusions against assumption violations.

The articles included in this Special Section present theoretical and empirical research and make important contributions to extend the methodological repertoire of prevention scientists. While causal inference will continue to be a dynamic field of research, the recent advances constitute important steps in the development of statistical methods to equip the next generation of quantitative prevention scientists.