Introduction

Severe, problematic risk behaviors involving adolescents are a significant, international concern. Around 10% of adolescents, predominantly females, report having self-harmed (Hawton et al. 2002; Madge et al. 2008), and suicide, primarily by males, is the second most frequent cause of death in young people (Patton et al. 2009). Perpetration of violence peaks in late adolescence: the proportion of arrests for violent offences (homicide, rape, robbery, aggravated assault) in the US in 2015 who were under 18 years of age was 8.7%; 33.4% were under 25 years of age (United States Department of Justice 2017). The etiology of both self- and externally-directed aggression is complex. Nevertheless, psychiatric disorders, notably depression, anxiety, substance misuse, and Attention Deficit Hyperactivity Disorder, are common in young people presenting to general hospitals after self-harm; estimates of prevalence range from 48 to 87% (Al Ansari et al. 2001; Manor et al. 2010). In terms of youth violence, externalizing disorders including ADHD and oppositional defiant disorder (ODD) are known antecedents (Farrington 2005; Langbehn et al. 1998). In forensic and criminal/deviant youth populations, mental illness is especially common: two-thirds of males and three quarters of females in detention meet criteria for a clinical diagnosis (Teplin et al. 2006). A metaregression analysis of 25 surveys of youth aged 10–19 years in juvenile detention and correctional facilities found prevalence of psychotic illness to be 3.3 and 2.7% in boys and girls, respectively; 10.6 and 29.2% with major depression; and 11.7 and 18.5% with ADHD (Fazel et al. 2008). Higher rates of violent offending in people with psychotic disorders relative to siblings with no such disorder, and to matched general population controls, suggest an association between psychosis and violence. 
Further, higher rates of violent offending among those with psychosis who spent the least time in hospital treatment imply a causal link. Against this backdrop, risk assessment for severe problematic behaviors is central to the work of professional mental health and criminal justice practitioners working with high-risk populations.

Risk assessment for the purpose of the prediction of adverse outcomes in forensic populations has its origins in the work of Ernest Burgess (1928). Burgess advocated an actuarial method of risk calculation in which individuals (adult parolees) were assigned a score of one for each of 21 characteristics (risk factors) that he deemed to be linked to parole violation; an overall risk level was assigned dependent on summed risk factors exceeding a predetermined threshold or “cut off” score. In a sample of 3000 parolees, 76% of those deemed high risk went on to re-offend in the subsequent 5-year period. In an era where individual clinical viewpoint (“unstructured clinical judgment”) was the norm, Burgess’ approach represented a step change inasmuch as it had the potential to inform decisions about parole. Glueck and Glueck (1950) used a similar method to examine risk factors in 500 delinquent boys and controls matched on important variables: age, ethnicity, intelligence, and income. Rather than simply assigning an equal score for the presence of each risk factor, weightings were derived from the strength of the risk factor as determined by the magnitude of difference between delinquents and controls. Thus, evidence of serious misbehavior in school (96% of delinquents versus 17% of non-delinquents, or 5.6:1) would be assigned more weight in the overall risk calculation than evidence of coming from a “broken home” (60 versus 34%; 1.8:1). The Gluecks viewed delinquency as essentially multifactorial in origin and approached the subject from a multidisciplinary perspective (Laub and Sampson 1991). Again, there are considerable advantages in terms of transparency and in the potential to improve decision-making.
However, as with other actuarial approaches, the risk factors selected tend to be static in nature (i.e., demographic features, prior convictions, and personality characteristics). Such approaches do not prioritize clinically relevant modifiable variables (sometimes termed dynamic factors) that could be targeted for treatment, and they reduce the role of professional judgment (Hart 1998a, b). Further, some view the assignment of risk levels or, more specifically, the subsequent restrictions on freedom that can result from them, based on the superficial similarity of an individual to others, as unjust and therefore inherently unethical. If the ultimate aim of risk assessment is both to protect the public and to promote rehabilitation by targeting modifiable risk factors for treatment, then a reliance solely on actuarial methods might not be most effective (Silver and Miller 2002).
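As a rough illustration of the two actuarial schemes described above, the sketch below contrasts Burgess-style unit weighting with a ratio-based weighting in the spirit of the Gluecks. The factor names, the cut-off value, and the use of a simple prevalence ratio as a weight are assumptions made for illustration; they are not taken from either original instrument.

```python
def burgess_score(factors):
    """Unit-weighted score: one point for each risk factor that is present."""
    return sum(1 for present in factors.values() if present)


def exceeds_cutoff(score, cutoff):
    """True when the summed score exceeds the predetermined cut-off."""
    return score > cutoff


def prevalence_ratio(pct_delinquents, pct_controls):
    """Prevalence of a factor in delinquents relative to matched controls."""
    return pct_delinquents / pct_controls


# Hypothetical parolee with three of four (made-up) factors present.
parolee = {"factor_a": True, "factor_b": True, "factor_c": False, "factor_d": True}
score = burgess_score(parolee)
high_risk = exceeds_cutoff(score, cutoff=2)  # the cut-off of 2 is hypothetical

# Percentages quoted in the text above.
school_ratio = prevalence_ratio(96, 17)  # serious misbehavior in school
home_ratio = prevalence_ratio(60, 34)    # "broken home"
```

Under unit weighting every factor counts equally, whereas the ratios show why school misbehavior would carry roughly three times the weight of a broken home in a difference-based scheme.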

In the last 20 years, risk assessment has evolved to address these criticisms in the form of the structured professional judgment approach (Dolan and Doyle 2000). The Historical Clinical Risk-20 (HCR-20; Webster et al. 1997), the first of these tools, comprises a schedule of empirically validated risk factors for violence in mentally disordered populations. Clinicians rate the presence, partial presence, or absence of each factor using their clinical judgment, and make an overall judgment as to the nature and likelihood of violence for a specified prospective period. Structured professional judgment tools have been developed for different populations and for a range of risk behaviors; tools for use with adolescents include the Estimate of Risk of Adolescent Sexual Offence Recidivism Version 2.0 (ERASOR; Worling and Curwen 2001), the Structured Assessment of Violence Risk in Youth (SAVRY; Borum 2006), and the Youth Level of Service/Case Management Inventory (YLS/CMI; Hoge and Andrews 2002). They are widely used to assist decisions about civil and criminal commitment involving hospitalization, treatment, and management, and release decision-making (Heilbrun 2012). In terms of predictive performance, the SAVRY has been judged equivalent to actuarial measures (Olver et al. 2009; Welsh et al. 2008).

One further development in risk assessment instrumentation for use with high risk populations has been the identification of the importance, and the subsequent inclusion, of protective factors. In contrast to the adult field, where the inclusion of protective factors in risk assessment for mental health and criminal justice populations is relatively new (de Vogel et al. 2011; Webster et al. 2009), their importance in adolescent risk assessment is longer standing (Arthur et al. 2003; Borowsky et al. 1997; Deković 1999; Loeber and Farrington 1998; Rolf 1985; Rosenberg 1987). The drivers of this development include the vast literature on adolescent resilience, the concerns of mental health and criminal justice youth-professionals about the lifelong consequences of early stigma, and the belief that young people may be more amenable to treatment and change than adults (Viljoen et al. 2012).

Conceptually, protective factors have been described in two main ways. First, defined simply as an absence of risk, they are viewed as individual and social characteristics that might be preventative by halting the occurrence of the risk factor(s) responsible for problem behavior. For example, excellent performance at school may be viewed as a protective factor against delinquency because it is the opposite of poor performance which is a risk factor (Shader 2001). Alternatively, protective factors have been viewed as characteristics that may reduce dysfunction directly or act as a buffer to mediate the negative effects of risk factors (Dignam and West 1988; Wheaton 1986). Further, protective factors may directly promote pro-social behaviors that, in themselves, may be protective (Jessor and Turbin 2014). The most comprehensive account thus far in the literature is that of Jessor and Turbin (2014; see also Jessor et al. 1991; Jessor 2014) who describe the role of protective factors in the context of problem behavior theory (PBT) in which they comprise characteristics related to presence of pro-social role models, informal social and personal controls, social support for pro-social behavior, and engagement in pro-social behavior. Risk factors, however, are related to presence of role models for problem behavior, opportunities to engage in problem behavior, vulnerabilities for engagement in problem behavior, and actual engagement in problem behavior. Both risk and protective factors might emanate from family, peer, civic, and school environments. Protective factors are viewed not as simply absence of risk, but as distinct entities which can both influence problem- and prosocial-behaviors, and mitigate risk factors. 
Application of the theory to study of problem and prosocial behaviors in large US and Chinese samples of adolescents (Jessor and Turbin 2014) suggests that presence of informal social and personal controls is the best predictor of problem behavior; the presence of prosocial models and of social support predict prosocial behavior but not problem behavior. Thus, it seems probable that protective and risk factors may play somewhat different roles in the causation of both prosocial and problem behaviors.

Current Study

Despite the use of protective factors in theory and research, Jessor and Turbin (2014) have identified considerable variation and ambiguity in how protective factors have been conceptualized and operationalized in risk assessment research studies. While protective factors have been studied in relation to a wide range of undesirable behaviors in adolescence, including tobacco use, sedentariness, truancy, and unhealthy eating (Jessor 2014), the current study is focused on their use in studies of the most severe and acutely risky behaviors such as self-harm and violence. The use of tools or schemes, i.e., schedules comprising empirically-derived risk factors with ratings guidance designed for use by mental health and criminal justice professionals conducting risk assessment of juveniles, has grown significantly in recent years: for example, more than 90% of US states now employ such tools in violence assessment (National Center for Juvenile Justice 2012). We are aware of a number of proprietary risk assessment schemes that are intended to guide in-depth assessment of adolescent mental health and criminal justice populations in regard to these behaviors, and that claim to incorporate protective factors in that assessment. As a first step in an attempt to consolidate current progress in the field, and to identify future research and development priorities, we have conducted a study to determine the extent to which these tools improve the predictive efficacy of assessment relative to that based on risk factors alone.

Method

Review Protocol

The review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Moher et al. 2009) in order to facilitate transparent reporting. Included studies were selected as part of a larger literature search regarding the incorporation of protective factors in risk assessments in both adult and adolescent populations; a meta-analysis of the role of protective factors in adults has been reported previously (O’Shea and Dickens 2016).

Tool Selection

We conducted extensive literature database and internet searching to identify tools that have been developed to assist with assessment of risk, and which explicitly include protective factors assessment. We identified 17 instruments that aim to assist mental health professionals in the assessment of protective factors. For the purpose of the current review, the identified tools designed specifically for use with adolescent populations (n = 9) were the Short-Term Assessment of Risk and Treatability: Adolescent Version (START:AV; Viljoen et al. 2012), the Structured Assessment of Violence Risk in Youth (SAVRY; Borum 2006), the San Diego Regional Resiliency Check-Up (SDRRC; Turner and Fain 2006), the Multiplex Empirically Guided Inventory of Ecological Aggregates for Assessing Sexually Abusive Children and Adolescents (MEGA♪; Miccio-Fonseca 2013), the Columbia Suicide Severity Rating Scale (CSSRS; Posner et al. 2011), the Reasons for Living Inventory-Adolescents (RFL-A; Gutierrez et al. 2000), the Brief RFL-A (BRFL-A; Osman et al. 1996), the RFL-Young Adults (RFL-YA; Gutierrez et al. 2002), and the RFL-College Students (RFL-CS; Westefeld et al. 1992).

Search Strategy

Multiple electronic databases (PsycINFO, Scopus, Web of Knowledge, Cochrane Library, CINAHL, and NCJRS) were searched for articles published before June 25 2014 as part of the wider search strategy. Search terms relating to the selected assessment tools were combined with those pertaining to multiple adverse outcomes (see example in online Appendix A). Wild card search terms (those ending with “*”) were used to return all permutations of each search term. All articles, including “gray” literature (e.g., conference presentations, technical reports, theses and dissertations), were eligible for inclusion and additional studies were located through hand searching reference lists of papers identified by the previous step.

Inclusion and Exclusion Criteria

Eligibility of articles was assessed by the second author as part of the previous review (O’Shea and Dickens 2016); the first author independently reviewed 25% of the studies to establish inter-rater reliability (κ = 0.91). Articles must have documented an original empirical investigation of the predictive validity of one or more of the identified tools for any of its intended outcomes in an adolescent population, using a prospective or pseudo-prospective design. Area under the curve (AUC) values and their associated 95% confidence intervals (CI) must have been included, or there must have been sufficient statistical information to allow for their calculation. Studies were excluded if they were not written in English, or if the assessment instruments had been amended or adapted from the published version. In cases where samples overlapped, only the study with the largest sample size was retained to avoid including the same participants twice; the exceptions were cases where different tools or outcomes were examined, in which case both studies were retained.

Operationalization of Protective Factors at Tool Level

For each tool that was used in at least one study of predictive ability we identified how protective factors were operationalized. We assigned tools to one of two categories used in a previous study (O’Shea and Dickens 2016): (A) Tool comprises factors that define a protective factor as lying at the opposite end of a continuum to a risk factor (e.g., history of kindness as opposed to history of violence measured on a single continuum); (B) Tool comprises protective factors which are defined as conceptually distinct from risk factors (e.g., history of kindness irrespective of history of violence). A third possibility, i.e., that protective factors are defined simply as the absence of a risk factor, was considered but rejected since this would encompass all formal risk assessment tools whose scoring is predicated on absence or presence of risk factors irrespective of any formal acknowledgement of protective factors. In addition, we identified how raters are guided to integrate protective factors into the overall risk prediction (e.g., through structured professional judgment into a risk estimate; or actuarial determination through cut-off scores).

Data Extraction for Meta-Analysis

The following information was extracted from included studies: number of participants, country of data collection, setting, length of follow-up period, assessment tool(s) used, adverse outcome(s) measured, sample characteristics, the AUC value and 95% CI for each protective scale-outcome combination. The AUC values and 95% CIs were also extracted for summary judgments and risk scales of tools that assess protective factors, in order to provide contextual information regarding the magnitude of effect sizes for protective factors.

Risk of Bias

The quality of included studies was independently assessed by both authors (linear weighted kappa = 0.95) using criteria developed by Haney et al. (2012). Each domain is rated as “yes”, “unclear”, or “no”, and overall risk of bias (low, unclear, or high) reflects raters’ opinions that identified biases reduce confidence of results (see Table 1).
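For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of a linear weighted kappa for two raters on an ordinal scale. The ratings are hypothetical and the implementation is a generic textbook version, not the software used in the review.

```python
def linear_weighted_kappa(ratings_a, ratings_b, categories):
    """Cohen's kappa with linear weights for two raters on an ordinal scale."""
    k = len(categories)
    n = len(ratings_a)
    index = {c: i for i, c in enumerate(categories)}
    a = [index[r] for r in ratings_a]
    b = [index[r] for r in ratings_b]

    def weight(i, j):
        # Full credit for exact agreement, partial credit for adjacent ratings.
        return 1.0 - abs(i - j) / (k - 1)

    # Observed weighted agreement across all rated studies.
    p_obs = sum(weight(i, j) for i, j in zip(a, b)) / n
    # Weighted agreement expected from each rater's marginal distribution.
    marg_a = [a.count(i) / n for i in range(k)]
    marg_b = [b.count(i) / n for i in range(k)]
    p_exp = sum(weight(i, j) * marg_a[i] * marg_b[j]
                for i in range(k) for j in range(k))
    if p_exp == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_obs - p_exp) / (1.0 - p_exp)


LEVELS = ["low", "unclear", "high"]  # overall risk-of-bias ratings, in order
rater_1 = ["low", "low", "unclear", "unclear", "high", "high"]      # hypothetical
rater_2 = ["low", "unclear", "unclear", "unclear", "high", "unclear"]  # hypothetical
kappa = linear_weighted_kappa(rater_1, rater_2, LEVELS)
```

Linear weights penalize a low-versus-high disagreement twice as heavily as a disagreement between adjacent categories, which suits an ordered rating such as low, unclear, high.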

Table 1 Quality assessment of included studies

Data Synthesis

Most studies either inverted protective scale scores or used these scales to predict desistance from risk behavior; for the remaining studies that did not use either of these procedures, we inverted AUC values to facilitate synthesis of results and comparisons with effect sizes from risk scales and, if relevant, summary judgments. Effect size was extracted for each scale-outcome combination. Given that outcomes were defined by the authors of each individual study, there may be some variation in definitions used. In order to minimize the number of reported outcomes the following effect sizes were combined: (1) inpatient physical aggression against others, inpatient physical aggression against objects, and inpatient physical aggression as “any inpatient aggression”; (2) inpatient verbal abuse and inpatient verbal threats as “inpatient verbal aggression”; and (3) non-sexual reoffending, non-violent reoffending, and general reoffending as “general reoffending”. Where a study reported on more than one of the outcomes in any group, a mean effect size was calculated.
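The two harmonization steps described above can be sketched as follows. The sketch assumes that "inverting" an AUC means reflecting it around 0.5 (that is, 1 − AUC); the outcome groupings are those listed in the text, and the study values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Groupings stated in the text; outcomes not listed keep their own label.
COMBINED_LABEL = {
    "inpatient physical aggression against others": "any inpatient aggression",
    "inpatient physical aggression against objects": "any inpatient aggression",
    "inpatient physical aggression": "any inpatient aggression",
    "inpatient verbal abuse": "inpatient verbal aggression",
    "inpatient verbal threats": "inpatient verbal aggression",
    "non-sexual reoffending": "general reoffending",
    "non-violent reoffending": "general reoffending",
    "general reoffending": "general reoffending",
}


def invert_auc(auc):
    """Reflect an AUC around 0.5 so a protective scale reads in the risk direction."""
    return 1.0 - auc


def combine_outcomes(effects):
    """Average one study's effect sizes within each combined outcome group."""
    grouped = defaultdict(list)
    for outcome, value in effects.items():
        grouped[COMBINED_LABEL.get(outcome, outcome)].append(value)
    return {label: mean(values) for label, values in grouped.items()}


# Hypothetical effect sizes reported by a single study.
study_effects = {
    "inpatient verbal abuse": 0.40,
    "inpatient verbal threats": 0.60,
    "sexual reoffending": 0.70,
}
combined = combine_outcomes(study_effects)
```

Averaging within a group leaves each study contributing a single effect size per combined outcome, which keeps the number of reported outcomes manageable.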

Meta-analysis was conducted using R (R Core Team 2015). AUC values were converted to Cohen’s d values, a commonly used measure of effect size appropriate for random effects models (Yang et al. 2010), using a table provided by Rice and Harris (2005). However, as Cohen’s d values can be biased and overestimate the true effect, especially for small sample sizes (Lakens 2013), they were converted to Hedges’ g values, which correct for this bias, using formulas provided in the compute.es package (Del Re 2013). Hedges’ g values, weighted by the inverse variance weight, were pooled using the rma.mv function from the metafor package (Viechtbauer 2010). This function fits a multilevel meta-analysis that can take into account non-independence of effect sizes resulting from the nested data structure in the current review. Random effects were included for outcome and scale nested within study since estimates derived from the same scale, or predicting the same outcome, will likely be correlated. Random effects were also added for study nested within author, and for author, since effect sizes within studies and authors are likely to be more similar than those between studies or authors. Effect sizes were estimated for each outcome relative to that for any inpatient aggression because the ability of structured professional judgment schemes to predict this outcome is well established (e.g., Singh et al. 2011). Effect sizes for protective scales and summary judgments were estimated with risk scales as the reference category because their relative performance is of interest. The effect of gender, coded as the percentage of the sample that is female, was included as a moderator as previous research has found that some risk assessment schemes perform more accurately in women (O’Shea et al. 2013; O’Shea and Dickens 2015). The risk of bias as determined by quality assessment was also included as a moderator to investigate the effect of bias on estimated effect size.
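The conversion chain described above can be sketched as follows. The closed-form AUC-to-d relation assumes equal-variance normal score distributions in the two outcome groups and is used here in place of the published look-up table, so results may differ slightly from tabled values. The group sizes and AUC values are hypothetical, and the pooling step is a plain fixed-effect inverse-variance mean, a deliberately simplified stand-in for the multilevel random-effects model fitted with rma.mv.

```python
from math import sqrt
from statistics import NormalDist


def auc_to_d(auc):
    """Cohen's d implied by an AUC under an equal-variance binormal model."""
    return sqrt(2.0) * NormalDist().inv_cdf(auc)


def correction_factor(n1, n2):
    """Small-sample correction J applied to d to obtain Hedges' g."""
    return 1.0 - 3.0 / (4.0 * (n1 + n2 - 2) - 1.0)


def hedges_g(d, n1, n2):
    """Bias-corrected standardized mean difference."""
    return correction_factor(n1, n2) * d


def variance_g(d, n1, n2):
    """Approximate sampling variance of Hedges' g."""
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2.0 * (n1 + n2))
    return correction_factor(n1, n2) ** 2 * var_d


def pool_inverse_variance(effects, variances):
    """Fixed-effect pooled estimate: mean weighted by 1 / variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * g for w, g in zip(weights, effects)) / sum(weights)


# Hypothetical inputs: (AUC, n with the outcome, n without the outcome).
studies = [(0.70, 30, 50), (0.80, 20, 40)]
ds = [auc_to_d(auc) for auc, _, _ in studies]
gs = [hedges_g(d, n1, n2) for d, (_, n1, n2) in zip(ds, studies)]
vs = [variance_g(d, n1, n2) for d, (_, n1, n2) in zip(ds, studies)]
pooled_g = pool_inverse_variance(gs, vs)
```

Because the correction factor is always below one, g is slightly smaller than d, and the difference shrinks as the sample grows.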
Effect sizes were classified as small (0.2), moderate (0.5), and large (0.8) according to Cohen’s criteria (Cohen 1992). Equivalent AUC values were presented for estimated effect sizes to facilitate comparison with previous research, and I² values were calculated to quantify the extent of any observed heterogeneity (Huedo-Medina et al. 2006).
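A short sketch of these three reporting steps is given below. It assumes that Cohen's benchmarks act as lower bounds for each label, that the AUC equivalent is obtained from an equal-variance binormal model, and that I² follows the usual (Q − df)/Q form; the label "below small" and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def classify_effect(g):
    """Label an effect size using Cohen's benchmarks as lower bounds."""
    size = abs(g)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"  # label chosen here for illustration


def g_to_auc(g):
    """AUC equivalent of a standardized mean difference (binormal model)."""
    return NormalDist().cdf(g / sqrt(2.0))


def i_squared(q, df):
    """Percentage of variability attributed to heterogeneity, floored at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical pooled estimate and heterogeneity test.
label = classify_effect(0.65)
auc_equivalent = g_to_auc(0.65)
heterogeneity = i_squared(q=30.0, df=20)
```

Flooring I² at zero reflects the convention that a Q statistic smaller than its degrees of freedom is read as no detectable heterogeneity.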

Results

Study Characteristics

The literature search conducted for the previous review (O’Shea and Dickens 2016) identified 107 articles for which the full-text was reviewed; 67 records were excluded resulting in 17 studies for inclusion in the meta-analysis of the contribution of protective factors to risk assessment in adults (O’Shea and Dickens 2016), and 23 studies, comprising 30 independent samples, for inclusion in the current review (see Fig. 1 for exclusion reasons).

Fig. 1 Flow diagram of literature search: Modified from the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement flow diagram (Moher et al. 2009). AUC, area under the receiver operating characteristic curve

The total sample size was 3280 (mean N = 143); four articles were masters or doctoral theses and 19 were journal articles published between 2008 and 2014. The most researched tool was the SAVRY (k = 21); the START:AV and MEGA♪ were both examined by only one study. None of the included studies examined the predictive validity of the SDRRC, CSSRS or any of the RFL variants. Studies were conducted in Canada (k = 9), the United Kingdom (k = 4), the United States of America (k = 4), Australia (k = 2), Finland (k = 1), and Spain (k = 1). Samples were drawn from juvenile justice facilities (k = 10), psychiatric and forensic psychiatric inpatient units (k = 7), probation services (k = 5), custody (k = 4), assessment centres (k = 3), correctional or specialist schools (k = 2), and an outpatient psychology clinic (k = 1). The most commonly studied outcomes were “any violent offending” (k = 14 studies) and “general reoffending” (k = 10). Eleven other outcomes related to inpatient behavior, sexual recidivism, technical recidivism, self-harm, victimization, substance/street drug use were examined in between one and three studies each. See Table 2 for full study characteristics.

Table 2 Characteristics of studies included in meta-analysis

Characteristics of Included Tools

All three tools operationalize protective factors as distinct from risk factors (definition B). SAVRY and START:AV recommend incorporation of identified protective factors into a risk formulation through a process of structured professional judgment. In the MEGA♪, protective factors are integrated into a risk formulation through actuarial methods.

The SAVRY (Borum 2006)

The SAVRY comprises 30 items: 10 historical (e.g., history of violence, childhood history of maltreatment), six social (e.g., peer delinquency, poor parental management), eight individual/clinical (e.g., negative attitudes, anger management problems), and six protective (prosocial involvement, strong social support, strong attachments and bonds, positive attitude toward intervention and authority, strong commitment to school, resilient personality traits). It is recommended for use in adolescents aged between 12 and 18 years to assess risk of violence. Each risk item is rated as low (0), moderate (1), or high (2); the protective items are scored as absent (0) or present (1). For research purposes, scale scores can be calculated by summing individual item scores; a total risk score can also be calculated by summing the historical, social, and individual/clinical scales. Two studies (Lodewijks et al. 2010; Vincent et al. 2012) also calculated a dynamic items score by summing the social and individual scale items. Finally, assessors are required to make an overall rating of low, moderate, or high risk for future violence based on presence and relevance of items and a range of additional information; one study (Hilterman et al. 2014) also reported a risk estimate of low, moderate, or high risk of general offending.
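The research scoring described above amounts to simple sums over item ratings, which can be sketched as below. The data layout, function name, and example ratings are hypothetical; the overall summary risk rating is a professional judgment and is deliberately not computed.

```python
# Number of items per scale, as described in the text.
SCALE_SIZES = {"historical": 10, "social": 6, "individual": 8, "protective": 6}


def scale_scores(items):
    """Sum item ratings per scale and derive total risk and dynamic scores.

    `items` maps each scale name to its list of item ratings.
    """
    for scale, size in SCALE_SIZES.items():
        ratings = items[scale]
        # Risk items are rated 0, 1, or 2; protective items 0 or 1.
        allowed = (0, 1) if scale == "protective" else (0, 1, 2)
        if len(ratings) != size:
            raise ValueError(f"{scale}: expected {size} items, got {len(ratings)}")
        if any(r not in allowed for r in ratings):
            raise ValueError(f"{scale}: ratings must be in {allowed}")

    scores = {scale: sum(items[scale]) for scale in SCALE_SIZES}
    # Total risk excludes the protective scale.
    scores["total_risk"] = (
        scores["historical"] + scores["social"] + scores["individual"]
    )
    # Dynamic items score as used in two of the included studies.
    scores["dynamic"] = scores["social"] + scores["individual"]
    return scores


# Hypothetical ratings for one adolescent.
example = {
    "historical": [1] * 10,
    "social": [2] * 6,
    "individual": [0] * 8,
    "protective": [1, 1, 0, 0, 0, 0],
}
result = scale_scores(example)
```

Keeping the protective sum separate from the total risk score mirrors the way the tool treats protective items as their own scale.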

The START:AV (Viljoen et al. 2012)

The START:AV is a structured professional judgment tool comprising 23 dynamic items (sample items: School and work, Recreation, Substance use, Rule adherence) that pertain to adolescents and their social contexts; it is adapted from the adult START (Webster et al. 2009). Each item is scored on two 3-point scales, once in terms of protective factors (strengths) and once in terms of risk (vulnerabilities). Specific risk estimates (SREs; low, moderate, or high) are then formed from consideration of the presence and relevance of the 23 items plus a range of additional information regarding the likelihood of eight adverse outcomes occurring over a maximum of three months (violence, self-harm, suicide, self-neglect, substance abuse, unauthorized leave, victimization, and general offending). Scoring criteria were adapted from the START to reflect adolescents’ developmental context, and outcomes were adjusted to include risk behaviors relevant to adolescents, such as running away from home rather than the START item “unauthorized leave”.

MEGA♪ (Miccio-Fonseca 2013, 2016; Miccio-Fonseca and Rasmussen 2013, 2015)

MEGA♪ is a 75-item tool intended to aid assessment of risk for sexually inappropriate or abusive behaviors in children and adolescents aged 4–19 years. It comprises four scales: (1) the risk scale, containing 45 historical and dynamic items assessing generalized risk for “coarse sexual improprieties and/or sexually abusive behaviors” (p. 627); (2) a protective scale, comprising ten historical and dynamic items (e.g., youth is rule bound, getting along better with others) that mitigate risk; (3) an estrangement scale containing 14 items related to family relationships, assessing whether the youth is a victim of any type of abuse (e.g., parental separation, exposure to domestic violence); and (4) the 6-item persistent sexual deviancy scale, which captures the frequency and progression of sexually abusive behaviors (e.g., offender-victim age disparity). It is recommended that MEGA♪ be completed every 6 months after a case file review and interview where possible.

Risk of Bias

Study quality assessment indicated that the vast majority (k = 19) of studies were rated as unclear risk of bias; three studies were rated as low risk of bias and one was rated as high risk (see Appendix A). The most common source of potential bias was failure to provide evidence that participants had been randomly selected or were consecutive admissions.

Individual Study Effect Sizes

A total of 278 individual AUC values were contributed from the 23 studies. The number of AUC values contributed ranged from 1 to 48. The magnitude of AUC values ranged from 0.44 to 0.91; 171 (62%) of these were significantly greater than chance.

Mean Weighted Effect Sizes

Results of the meta-analysis are presented in Table 3. Moderators accounted for a significant proportion of the heterogeneity of effect sizes (Q[54] = 79.17, p = 0.014; I² = 33%) but there was still a significant amount of unexplained heterogeneity (Q[223] = 276.25, p = 0.009; I² = 20%). The estimated Hedges’ g value when values were equivalent to the reference categories (i.e., outcome = any inpatient aggression; scale type = risk; scale = MEGA♪ total; bias = low) was 1.38 (AUC = 0.84). There was no significant effect of scale type on estimated effect size (Q[2] = 1.57, p = 0.457; I² = 0%); however, effect sizes were smaller for protective scales and summary judgments compared with risk scales. There was a significant effect of individual scale (Q[12] = 22.20, p = 0.035; I² = 50%) with the SAVRY historical (−0.53, p = 0.017) and individual (−0.46, p = 0.038) scales performing significantly worse than the MEGA♪ total score. Overall, estimates of effect size were not moderated by outcome (Q[12] = 19.77, p = 0.072; I² = 44%); however, estimated effect sizes were significantly smaller for a number of outcomes relative to any inpatient aggression: any reoffending (−0.44, p = 0.004), violent reoffending (−0.45, p = 0.002), general reoffending (−0.41, p = 0.007), sexual reoffending (−0.65, p < 0.001), inpatient physical aggression (−0.49, p = 0.043), inpatient sexual aggression (−0.54, p = 0.006), and inpatient verbal aggression (−1.00, p = 0.04). Estimates of effect size were not moderated by gender (Q[6] = 6.16, p = 0.405; I² = 19%), bias (Q[1] = 0.19, p = 0.664), or the interaction between scale type and outcome (Q[20] = 12.89, p = 0.882; I² = 0%).

Table 3 Estimated mean weighted effect sizes relative to reference category

Estimated mean weighted Hedges’ g values for each outcome are presented in Fig. 2. The largest effect size was for any inpatient aggression (1.32, AUC = 0.83) and the smallest was for inpatient verbal aggression (0.38, AUC = 0.61). All of the effect sizes were large, with the exception of those for inpatient verbal aggression, sexual reoffending, and substance use in the community; further, all estimates were significantly greater than 0, based on inspection of 95% confidence intervals, apart from inpatient verbal aggression and substance use in the community.

Fig. 2 Forest plot of estimated effect sizes by outcome

Discussion

Structured tools to assist with the prediction of severe outcomes including violence and self-harm are increasingly used by mental health and criminal justice professionals working with adolescents. Some of these schemes have explicitly integrated theories and concepts about protective factors into their construction and guidance documentation. The evidence about whether, and the extent to which, these tools improve predictive accuracy has not thus far been collated. We therefore aimed to synthesize the evidence for the predictive efficacy of structured risk assessment schemes, selected on the basis of their explicit intention of facilitating assessment of protective factors, for use in adolescent mental health and criminal justice populations. Our systematic search strategy revealed that there is currently no empirical evidence for the predictive efficacy of the SDRRC, CSSRS, RFL-A, Brief RFL-A, RFL-YA, or RFL-CS. The first conclusion we can draw therefore is that clinical decisions based on the use of these tools are not evidence-based; the case for interventions or resource allocation predicated on them should be treated with extreme caution.

Each of the three tools subjected to at least one test of predictive efficacy (SAVRY, START:AV, MEGA♪) conceptualizes protective factors as distinct and separate from risk factors inasmuch as protective factors comprise a separate scale from each tool’s risk factor scale(s). This is congruent with problem behavior theory (Jessor 2014), which describes risk and protective factors as direct, independent predictors of involvement in problem behavior and pro-social behavior, and protective factors as also moderating the influence of risk factors on involvement in these behaviors. However, the integration of the protective factors identified as relevant to the individual under assessment is handled differently across tools. Both the SAVRY and the START:AV are structured professional judgment tools whose guidance indicates that practitioners should consider all contributing elements in the formulation of a risk level. The MEGA♪ is based on actuarial principles with risk level assignment dictated by gender- and age-specific cut-off scores; the one included study of the MEGA♪ lacked some transparency about the contribution of the protective scale to the risk level assignment. For the START:AV and SAVRY, our results revealed that protective factor scales, and summary judgments based on a combined consideration of risk and protective factors, performed no better than the risk scales for the “any inpatient aggression” reference category.

Given that we know a considerable amount about the role of protective factors for a range of problem behaviors or conditions including depression (Cairns and Yap 2014), substance abuse (Beyers et al. 2004), and internet addiction (Koo and Kwon 2014) for the wider adolescent population, it is disappointing that their integration into assessment schemes for high risk populations, and for some very severe and acute outcomes, has not demonstrably led to improved prediction. This is consistent with findings from our previous review of protective factor-oriented instruments for adult mental health and criminal justice populations (O’Shea and Dickens 2016). This might suggest that risk assessment schemes for both adolescents and adults in high-risk populations need to better integrate protective factor-related theory with risk assessment as a first step in an attempt to facilitate more accurate assessment. Extra urgency can be inferred from findings that risk assessment tools with no explicit consideration of protective factors, while reasonably good at identifying low risk individuals, are limited in their ability to accurately identify those at high risk of, for example, violence (Fazel et al. 2012); better integration of protective factors might be one way of improving this situation. We propose that this requires new research in a number of key areas, and, if appropriate, translation into new protective factor-oriented assessment schemes. Notable questions for investigation are: (1) are the most relevant protective factors for the most relevant outcomes in high risk youth populations identified; (2) are they operationalized in a manner congruent with relevant theory; (3) is current theory sufficient to explain the most severe/acute outcomes in these high risk groups; (4) is the operationalization and identification communicated to assessors such that their practice is faithful to the conceptualization; and (5) do new tools improve prediction?

The current study design did not allow us to investigate whether protective scales contribute uniquely to the formulation of a more accurate SRE, although this seems unlikely given the failure of SREs to outperform risk scales. While we have previously found evidence (O’Shea et al. 2013) for statistically significant incremental validity of the Strength scale of the original adult Short-Term Assessment of Risk and Treatability (START; Webster et al. 2009) for the prediction of violence, this was marginal, improving the classification of violent and non-violent forensic inpatients by only 1.6%. The tools examined in the current review appear to be conceptually premised on an empirically supported model of risk and protective factors (Jessor 2014). However, what appears to be in question is the manner in which, in these specific tools, those risk and protective factors are considered in the formulation of a risk estimate, given that this is conducted using structured professional judgment, a somewhat opaquely described method in research studies, and one which actively promotes rater discretion and use of clinical expertise (e.g., Guy et al. 2012). One of the major consequences of our study, therefore, is a fresh spotlight on the need to examine and articulate how decisions are made in structured professional judgment approaches, including how risk and protective factors are weighed and synthesized into an overall risk estimate.

Some potential moderators (i.e., gender, study bias) were found not to contribute significantly to study heterogeneity; however, that significant heterogeneity remained suggests that other moderators may play a part. Candidates include age, ethnicity, intellectual functioning, diagnosis, independent authorship, study setting and relation to the investigators. Miccio-Fonseca (2009, 2016) and Miccio-Fonseca and Rasmussen (2013) have conducted extensive research into measurable differences between groups on the MEGA♪ including females and those with low intellectual functioning. It is advisable that violence risk predictive tools which incorporate protective factors are also validated in all populations subject to their assessment. Risk assessment tools for use with adults in inpatient settings have been demonstrated to be more predictive for females than males (O’Shea et al. 2013; O’Shea and Dickens 2015) and thus it should not be assumed that modifications should only aim to improve performance among women.

The studied tools examined a range of outcomes, and predictive efficacy was significantly poorer for outcomes that differed from the reference category “inpatient physical aggression” (i.e., inpatient verbal aggression, sexual reoffending, and substance use in the community). It is unlikely that items included in the risk assessment schemes are equally predictive of all the outcomes they are intended to cover. For the adult version of the START it has been demonstrated that subsets of items are significantly predictive of different outcomes (Braithwaite et al. 2010). Tool developers may wish to consider whether tools require modification and further validation for such outcomes; alternatively, tools should be very specific about which outcomes they have predictive validity for.

There were a number of records that we could not obtain, despite attempted communication with the authors; however, we did review a number of articles that would be considered grey literature and ultimately included four theses, reducing the likelihood of publication bias. The overall quality of included studies was disappointing, with 20 of the 23 studies being rated as unclear or high risk of bias. Further, for a number of scale-outcome combinations there was only one available study, which limited our ability to conduct more detailed analysis examining the interaction between outcome and scale. However, these criticisms are directed more at the state of the existing literature than at the current review per se, and highlight the need for more high-quality studies investigating the role of protective factors for adverse outcomes in vulnerable adolescents.

Conclusion

The current study demonstrates that research on integrating the assessment of protective factors into risk assessment among adolescents under the care of mental health and criminal justice services is seriously underdeveloped compared with research on the predictive value of violence and sexual recidivism risk assessment tools (e.g., Olver et al. 2009). However, the importance of protective factors is better established in adolescent populations than in adult ones, and there is good empirical support for at least one theoretical model (Jessor and Turbin 2014). In summary, protective factors as currently operationalized in tools for high risk mental health populations are of extremely limited predictive value. There is considerable work to do to better integrate the wider protective factors research into relevant risk assessment tools in order to ensure resources are used appropriately and risk management is not disproportionately restrictive.