Introduction

‘Evidence-based policymaking’ (EBPM) is a political slogan. It is used by some to support a greater role for scientific evidence in policy, and by others to give legitimacy to existing policies (Cairney 2016a; Boswell 2009, 2017). Its comparator is ‘policy-based evidence’ (PBE), invoked when critics of government policies argue that a policymaker decided what they wanted to do and then ‘cherry-picked’ information to back it up. Critics too readily allege PBE when the use of evidence in party political arenas does not meet an unrealistic standard developed in scientific discourse (Oliver et al. 2014a, b; Parkhurst 2017). It is therefore difficult to distinguish between egregious PBE and more defensible practices, such as responding to the evidence of a problem and combining governance principles with economic and scientific factors to choose a solution.

The aim of this article is to go beyond these political slogans, to identify categories that help us generate more thoughtful evaluations of policymakers when they need or wish to maintain electoral popularity, make decisions quickly, act despite the limited availability of scientific evidence, make value judgements to prioritise problems and target populations, negotiate with other actors to make choices and ensure delivery, and/or use many forms of knowledge and voices to make policy. First, I use insights from policy theories to describe EBPM in the real world. I discuss the extent to which policymakers have to use major shortcuts to gather evidence and make decisions quickly in a complex policymaking system, the dilemmas involved in combining evidence-based policies with principles-based governance, and the use of different criteria and methods to combine evidence with judgement. This discussion helps us distinguish between the many examples of alleged PBE, to show that most cases do not fit into neat descriptive boxes.

Second, I examine many ways in which the UK Government has used evidence to inform ‘families policies’. I focus initially on ‘troubled families’ policy as an egregious example of PBE, based on projecting rather than measuring policy effectiveness. When treated as part of a range of ‘families policies’, it seems like an extreme case. Other cases provide different examples of problematic evidence use, regarding how to interpret scientific knowledge on the nature of policy problems and the effectiveness of solutions. In some examples, the problem lies primarily with the UK Government’s disproportionate response to limited evidence. In others, the government faces dilemmas about which approach to evidence use to pursue, including learning from domestic or international experience, and using randomised control trials or qualitative evidence. The choice is ‘political’, but not in the sense that we associate with PBE and the conduct of politicians. Rather, the dynamics of political systems encourage politicians to make choices based on limited evidence, address problems that cannot be resolved by ‘the evidence’, and adjudicate between competing ideas about what counts as ‘good’ evidence.

Finally, I compare stories of policymaking from the contrasting perspectives of critical scholars and policymakers: one identifies egregious PBE when the government declared success in turning around ‘troubled families’; another highlights the limited availability of good evidence and a perception in government that the ends (good policy) can justify the means (using limited or bad evidence). Generating more insight into the latter perspective is crucial. It is easy to conclude that the UK Government has broken the rules of EBPM, but more difficult to (a) identify and adapt to the ‘rules of the game’ in Westminster politics, and (b) engage with policymakers to produce a better, politically feasible, alternative. We need to do more than declare PBE if we seek to influence the relationship between evidence and policymaking.

We will always identify PBE if comparing the real world to an ideal-type

It is difficult to categorise evidence use if we compare policymaking to an under-defined or too-high standard, in which we fail to demonstrate:

  • what counts as good evidence, and how strongly we should insist that scientific evidence takes precedence over other forms of knowledge, governance principles, and other values (Cairney and Oliver 2017);

  • the difference between an allegedly pathological process in which ideology trumps evidence, and a complex policymaking system in which there are ever-present ‘barriers’ to the use of evidence to inform policy (Cairney 2016a).

These issues are not new. Epistemological debates have raged for centuries. Early post-war studies regarding the extent to which policymaking can be based on ‘rational’ decisions remain key reference points for modern discussions of ‘bounded rationality’ (Simon 1976; Cairney and Heikkila 2014). There is a well-established literature on the relationship between evidence, values, and policy in Science and Technology Studies (e.g. Jasanoff 1986), journals such as Evidence and Policy, and studies on research in policy (Weiss 1979) and ‘evidence-informed’ policy (Nutley et al. 2007). However, the idea of greater ‘rationality’ was given a new lease of life by exponents of ‘evidence-based policymaking’, many of whom have little knowledge of policy studies (Boaz et al. 2008, p. 242; Botterill and Hindmoor 2012, p. 367; Cairney 2016a, p. 19; Embrett and Randall 2014). Consequently, for example, it is common to hear scientists complain that EBPM is not like ‘evidence-based medicine’ (EBM) without explaining the difference or considering the idea that EBPM should not be like EBM (Cairney and Oliver 2017).

EBM is a useful reference point because its ideal is influential across several disciplines—including health, environmental, and management sciences—and, if represented too simplistically, it produces naïve expectations for EBPM (Oliver et al. 2014b; Cairney 2016a, p. 65; Cairney et al. 2016a). Despite more modest and thoughtful intentions to combine evidence and experience (Sackett et al. 1996; Oliver and Pearce 2017), EBM is often associated with (a) gathering the best evidence on interventions (policy solutions), based on a hierarchy of methods, in which randomised control trials (RCTs) and their systematic review are at the top; and, (b) ensuring its direct impact on policy and practice (Oliver et al. 2014b; Cairney et al. 2016a).

Criticisms of such a narrow view can be found in many disciplines, including social policy and social work. One criticism is normative: the pursuit of scientifically informed policy should not come at the expense of alternative sources of legitimate knowledge, including different research methods and knowledge from practitioner and service user experience (Glasby et al. 2007; Williams and Glasby 2010; Morrell 2008; Learmonth and Harding 2006; Beresford 2007; Beresford and Croft 2001). Another is empirical, describing ways in which policymakers and practitioners use evidence:

  • there is no equivalent commitment to a hierarchy of evidence;

  • the policymaking environment involves a larger, more heterogeneous, set of actors; and

  • the implications of scientific evidence for policy and practice are generally open to debate (Lomas and Brown 2009, p. 906; Elliott and Popay 2000, p. 467; Stoker 2010, p. 53; Bédard and Ouimet 2012, p. 625; Fox 2003; Hope 2004, p. 291).

Drawing on such insights, policy theory-driven studies show that narrow evaluations of EBPM do not appreciate three factors which—from a skewed standpoint—produce the appearance of PBE in all policymaking.

Policymakers have to take major shortcuts to gather information quickly

What happens when policymakers cannot consider all of the information relevant to their decisions? They use two shortcuts to gather information quickly: ‘rational’ ways to establish the best evidence and sources of evidence, and ‘irrational’ ways to understand policy problems, drawing on emotions, habits, and deeply held beliefs (Cairney and Kwiatkowski 2017; Kahneman 2012, p. 20; Haidt 2001, p. 818; Alter and Oppenheimer 2009, p. 220). The accumulation of scientific knowledge, and the large capacity of government, do not solve this problem (Simon 1976; Botterill and Hindmoor 2012; Cairney and Kwiatkowski 2017; Cairney and Weible 2017). Rather, policymakers have too many problems to which to pay attention, too many solutions to consider, and too many choices to make, based on more information than they can process. So, they combine judgements based on beliefs with shortcuts based on familiarity with information, even if they are committed to evidence-informed processes.

Consequently, ‘the evidence’ is secondary to the ways in which policymakers understand it. They are receptive to particular kinds of information, to address problems to which they pay attention, and provide solutions consistent with their beliefs (Dearing and Rogers 1996, p. 1; Baumgartner and Jones 1993, pp. 11–12; Kingdon 1984, pp. 3–4; Cairney 2012a, p. 183). So, policy theories identify the links between evidence and persuasion when actors combine facts with emotional appeals (True et al. 2007, p. 161); produce feasible policy solutions and exploit times when policymakers have the motive to adopt them (Kingdon 1984); tell stories which manipulate biases, apportion praise and blame, and highlight the moral value of solutions (Jones et al. 2014); and interpret evidence through the lens of established beliefs (Weible et al. 2012).

They use evidence in an environment over which they have limited control

Policy theory is built on a rejection of ‘linearity’ and recognition of complexity in policymaking. A direct link between evidence and action requires a singular moment of authoritative choice—the policy—made and implemented in a ‘policy cycle’ with key ‘stages’ (Hogwood and Gunn 1984). Yet, there are two key problems with a stage-based understanding (Cairney 2012a, b, p. 18, 2016a, p. 34; John 2012; Sabatier 2007; Colebatch 2006). First, it implies a core group of policymakers making policy from the top down, without recognising the diffusion of policy responsibilities across multiple venues. Second, it downplays the tendency of policy to be made continuously, as many decisions to solve problems intersect. Instead, most policy theories identify multi-level policymaking environments with five key characteristics:

  • A wide range of actors make or influence policy in multiple venues at many levels of government.

  • A proliferation of formal and informal rules (‘institutions’) influence behaviour in each venue.

  • Policymakers and influential actors develop networks built partly on trust and the exchange of information.

  • Certain ways of thinking dominate discussion, and are asserted when actors use ideas in ‘good currency’ to bolster arguments, or taken for granted when actors operate within dominant ‘paradigms’.

  • Shifting policy conditions and events—only some of which are predictable—can prompt major shifts of policymaker attention at short notice (Cairney and Heikkila 2014; Cairney 2015; Hall 1993; Ostrom 2007; Weible et al. 2012).

Some theories identify complex policymaking systems in which the same inputs of evidence can receive no, or disproportionate, attention, and in which policy outcomes often ‘emerge’ in the absence of central government control (Geyer and Cairney 2015; Cairney 2012b). A focus on this bigger picture shifts our attention from the use of evidence by an elite group at the ‘top’ to many influential actors in a multi-level process.

Policymakers combine several principles to produce ‘good policymaking’

Policymakers may refer to several principles of ‘good’ policymaking, only one of which is EBPM (Cairney 2016b). They may seek policy consensus, to reflect the general value of pragmatism and cooperation in politics (Lindblom 1979); combine ‘expert scientific advice with a responsiveness to public values’ (Jasanoff 1986, p. 5; Weale 2001, p. 414); and improve policy delivery by generating ‘ownership’ of policy among stakeholders (Jordan and Maloney 1997; Cairney 2012a, p. 90). Central governments also share responsibility with local policymakers, recognising more than one electoral mandate, the importance of partnerships between local public bodies and stakeholders, and the benefits of tailoring policy to communities.

Consequently, there is no single model of EBPM which satisfies all policymaking principles. Instead, national governments make reference to potentially contradictory aims, such as to ensure uniform delivery standards to avoid a ‘postcode lottery’, and encourage local autonomy and policy flexibility (Cairney et al. 2016b). Or they make a clear political choice to select one model, to reflect their beliefs about the best ways to generate evidence and make policy. In some fields, such as public health, there is a culture built on EBM principles and a hierarchy of evidence feeding directly into relatively uniform local practice. In others, scholars challenge such principles (Pawson 2006, pp. 52–54), or combine arguments on evidence and decentralised governance to support knowledge from local practitioners, service users, interest groups, and public ‘deliberation’ (Williams and Glasby 2010, p. 97). For some scholars there is a large gap between scientific evidence and policy when national policymakers do not adhere to principles of EBM, but not all scholars—and few policymakers—refer to this standard.

Categories of evidence use in policymaking

This discussion helps us separate egregious PBE from decisions based on principles, judgement, and many forms of evidence. As Table 1 suggests, egregious PBE refers to only three of twelve scenarios, compared with three examples of EBM-style EBPM. It leaves us with many cases which are not easily labelled PBE but might be criticised because policymakers draw on judgement and principles, economic considerations, and/or evidence from methods low on an evidential hierarchy.

Table 1 EBPM, PBE, and less-defined examples of evidence-informed policy

These examples highlight the inevitable role of judgement in identifying a good-enough explanation from the available evidence in a limited time, and in weighing it against experience and governance principles to produce a feasible strategy. This process opens up policy to a wide range of contributors, including sources—such as quantitative and qualitative evidence, expert advice, and/or counterfactuals—valued more highly by professions outside of EBM.

UK Government ‘troubled families’ policy: examples of egregious PBE

The UK Government’s ‘troubled families’ programme seems like the archetypal PBE, in which ministers asserted a problem, based a decision on minimal evidence, and generated ‘evidence’ to demonstrate success (Fletcher et al. 2012). Ministers seemed determined to project certainty about the cause of, and solution to, a major social problem. The programme involves a massive ‘roll out’ of initiatives to intervene in the lives of particular families to (a) address anti-social behaviour, criminality, child truancy, and parental worklessness, and (b) prevent outcomes such as family eviction, by (c) providing support and threatening sanctions for non-engagement. Labour governments piloted such programmes from the late 1990s and promised their expansion before Conservative-led governments announced a massive expansion—to almost 120,000 families from 2012 to 2015 and to 400,000 from 2015—as the solution to the 2011 riots in England (Hayden and Jenkins 2014; the third phase began in 2017—see Department for Work and Pensions 2017).

Within one week of the riots, then Prime Minister David Cameron (2011a) linked behaviour directly to ‘thugs’ and immorality—‘people showing indifference to right and wrong…people with a twisted moral code…people with a complete absence of self-restraint’—before identifying a breakdown in family life as a major factor. Cameron (2011b) stressed the need for people to take moral responsibility for their actions, and for the state to intervene earlier in their lives:

We’ve known for years that a relatively small number of families are the source of a large proportion of the problems in society. Drug addiction. Alcohol abuse. Crime. A culture of disruption and irresponsibility that cascades through generations. We’ve always known that these families cost an extraordinary amount of money…but now we’ve come up with the actual figures. Last year the state spent an estimated £9 billion on just 120,000 families…that is around £75,000 per family.
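
The headline figure follows from dividing the estimated total spend by the estimated number of families. A quick restatement of the quote’s arithmetic, noting that both terms are estimates rather than measurements:

    \[ \frac{\pounds 9\ \text{billion}}{120{,}000\ \text{families}} = \pounds 75{,}000\ \text{per family per year} \]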

At the heart of the programme is the assertion that we know who the ‘troubled families’ are, what causes their behaviour, and how to stop it. Yet, much is built on value judgements about feckless parents and tipping the balance from support to sanctions, and on anecdotes about ‘worklessness’ or ‘welfare dependency’ passing down generations (Crossley 2015, 2017; MacMillan 2014a, b). The government also conflates criteria to identify families in need of support (such as the mental health of the mother) with criteria to identify families in need of sanction (such as criminality or anti-social behaviour) (Gregg 2010, p. 14; Garrett 2007).

The government’s initial target of almost 120,000 families was based speculatively on previous Cabinet Office estimates, in 2006, that about ‘2% of families in England experience multiple and complex difficulties’ (Kendall et al. 2010, p. 1; Social Exclusion Task Force 2007, p. 4; NAO 2013, p. 5; Hayden and Jenkins 2014, p. 635). This estimate was based on limited survey data and modelling to identify families who met five of seven criteria relating to unemployment, poor housing, parental education, the mental health of the mother, the chronic illness or disability of either parent, an income below 60% of the median, and an inability to buy certain items of food or clothing (Levitas 2012, p. 4; Hayden and Jenkins 2014, p. 635). The government gave estimates to each local authority and asked it to find that number of families, identifying households with:

  • A child who has committed an offence in the last year, or is subject to an anti-social behaviour order (ASBO).

  • A child excluded from school permanently, or suspended on three consecutive terms, in a Pupil Referral Unit, off the school roll, or with over 15% unauthorised absences over three consecutive terms.

  • An adult receiving out of work benefits.

If the household met all three criteria, it would be included automatically (Crossley 2015, p. 3; NAO 2013, p. 5). Otherwise, local authorities had the discretion to identify more families meeting two of the criteria and other indicators of concern about the ‘high costs’ of late intervention, such as ‘a child who is on a Child Protection Plan’, ‘Families subject to frequent police call-outs or arrests’, and ‘Families with health problems’ linked to mental health, addiction, chronic conditions, domestic abuse, and teenage pregnancy (DCLG 2012, p. 5). Finally, the government offered local authorities up to £4,000 per family (some up front, some after indicating success) if they invested a further £6,000 in effective interventions.
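
Read together, the national criteria and local discretion amount to a two-stage decision rule. The sketch below (in Python) is an illustrative paraphrase of that rule, not the DCLG’s actual tooling; the criterion names are hypothetical simplifications of the categories above.

    # Illustrative paraphrase of the DCLG (2012) identification rule.
    # Criterion names are hypothetical simplifications of the bullet points above.
    CORE_CRITERIA = (
        "youth_crime_or_asbo",           # offence in the last year, or ASBO
        "school_exclusion_or_truancy",   # exclusion, PRU, off roll, or >15% absence
        "adult_on_out_of_work_benefits",
    )

    def classify(household: dict, local_concern_indicators: int) -> str:
        """Return how a household enters the programme, if at all."""
        met = sum(bool(household.get(c)) for c in CORE_CRITERIA)
        if met == 3:
            return "automatic inclusion"          # all three national criteria
        if met == 2 and local_concern_indicators > 0:
            return "discretionary inclusion"      # two criteria plus local judgement
        return "not included"

    # Example: two national criteria plus one local 'high cost' indicator.
    print(classify({"youth_crime_or_asbo": True,
                    "adult_on_out_of_work_benefits": True}, 1))

Writing the rule out makes the discretionary route visible: with only two national criteria met, inclusion turns entirely on local judgement about ‘high cost’ indicators.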

The measure of success was even more problematic. The government declared almost-complete success without meeting the standard of official statistics (DCLG 2015). Success was measured by (1) the child no longer having three exclusions in a row, a reduction in the child offending rate of 33% or in the ASB rate of 60%, and/or the adult entering a relevant ‘progress to work’ programme; or (2) at least one adult moving from out of work benefits to continuous employment (Casey 2014, p. 61). So, the success of a policy to address multiple challenges is measured according to change in one (NAO 2013, p. 6). Success was self-declared by local authorities—albeit subject to DCLG ‘spot checks’—and both parties had a strong incentive to declare it: local authorities received per-family payments, and the government had a way to declare progress.
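
The ambiguity of this test is easier to see when written as a decision rule, since the ‘and/or’ in the published criteria leaves the exact conjunctions open. The sketch below adopts one possible reading of Casey (2014); the field names are hypothetical.

    def turned_around(family: dict) -> bool:
        """One reading of the 'turned around' test described above."""
        # Route 1: education plus crime/ASB progress, and/or an adult
        # entering a 'progress to work' programme (one reading of 'and/or').
        route_1 = (
            not family["three_consecutive_exclusions"]
            and (family["offending_reduction"] >= 0.33
                 or family["asb_reduction"] >= 0.60)
        ) or family["adult_on_progress_to_work"]
        # Route 2: employment alone is sufficient.
        route_2 = family["adult_in_continuous_employment"]
        return route_1 or route_2

    # A family with no progress on education, crime, or ASB still counts
    # as 'turned around' if one adult moves into continuous employment.
    print(turned_around({
        "three_consecutive_exclusions": True,
        "offending_reduction": 0.0,
        "asb_reduction": 0.0,
        "adult_on_progress_to_work": False,
        "adult_in_continuous_employment": True,
    }))  # True

The example restates the NAO’s point: progress on a single indicator can count a family with multiple problems as ‘turned around’.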

This declaration contrasts with an unpublished report stating that the programme had ‘no discernible effect on unemployment, truancy or criminality’ (Cook 2016), although when finally published the report was slightly less negative (DCLG 2016). The picture of limited progress was partly confirmed by evidence that many families received no intervention but showed improvement anyway (Bawden 2015), that local authorities could only identify families by departing from the DCLG’s criteria (Levitas 2014; Crossley 2015, p. 6, 2016; Hayden and Jenkins 2014, p. 641), that only a tiny minority of local authorities invested more than £4,000 per family, and that there was huge local variation in performance (NAO 2013, pp. 7–9). There is no evidence to support the government’s claim that spending £10,000 per family saves £65,000.
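
The £65,000 claim appears to combine Cameron’s £75,000 per-family estimate with the programme’s nominal £10,000 cost. This is a reconstruction of the implied arithmetic, not a derivation published by the government:

    \[ \pounds 75{,}000 - (\pounds 4{,}000 + \pounds 6{,}000) = \pounds 65{,}000 \]

Every term in the subtraction is an estimate or a funding target rather than a measured saving, which is why the claim attracts the PBE label.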

UK Government ‘families policies’: a more complicated mix of EBPM and PBE

A wider discussion of families policies highlights a more complicated mix of government responses, from (1) misrepresenting limited evidence, to (2) making sincere attempts to resolve ever-present limits to knowledge. An exemplar of misrepresenting evidence (and of PBE bullet points 2 and 3 in Table 1) is the government’s use of neuroscience to produce a politically feasible strategy, driven by a belief in the benefits of early intervention in the lives of children aged 0–3 years. Policymakers emphasise the profound effect of stress and neglect on early brain development. This argument informs the ‘now or never’ exhortation found in the Munro review (2011, pp. 69–70) and made most vividly by the Allen reviews’ (2011a, p. 1; 2011b, p. 1) use of images of the brains of ‘normal’ and ‘extremely neglected’ three-year-old children. It is critiqued heavily in social science and neuroscience (Rose and Rose 2016; Featherstone et al. 2013, p. 5). Wastell and White (2012) find no good quality scientific evidence behind it, suggesting that the images are used for shock value. The UK followed the example of the US, in which neuroscience ‘was chosen as the scientific vehicle for the public relations campaign to promote early childhood programmes more for rhetorical, than scientific reasons’ (Bruer 2011, p. 2; 1999; see also Gillies 2014).

It is more difficult to identify an exemplar of more sincere attempts to use evidence. Instead, there are many examples corresponding to at least one bullet point under ‘difficult to categorise’ in Table 1. So, I set out the available evidence below and invite you to assess it from the perspective of a policymaker seeking action. The choice highlights two main approaches available at the time, and only the second meets the narrowest requirements for EBPM:

  • To generate largely qualitative evaluations of nascent domestic projects. One approach to policy learning is to draw on UK experiences by piloting and evaluating projects. ‘Family intervention projects’ (FIPs) were evaluated primarily with qualitative methods such as interviews with service users and practitioners and with reference to case files.

  • To import and test programmes whose reputations were bolstered by multiple randomised control trials of interventions in other countries. Another approach is to learn from the success of interventions from other countries (Rose 2005; Dolowitz and Marsh 2000; Cairney 2012a, p. 244). The Family Nurse Partnership, Triple P, and Incredible Years were evaluated primarily via RCTs, initially outside the UK.

Developing an evidence base: 1. Learning from UK experience and FIP pilots

The UK government’s expansion of FIPs began under the Labour government (1997–2010). Family intervention is exemplified by the Dundee Families Project (DFP), established in 1996. The DFP focused on low income, often lone parent, families “who are homeless or at severe risk of homelessness as a result of ‘antisocial behaviour’”. It provided 24/7 support, including after school clubs for children, parenting skills classes, and treatment for addiction or depression, in dedicated core accommodation with strict rules on access and behaviour, or via ‘dispersed tenancies’ or an outreach model (Dillane et al. 2001, p. 5). The UK government supported, then ‘rolled out’, the model from 2006, built on independent evaluations which presented qualified but positive accounts of early progress (Nixon et al. 2010, p. 306; Parr 2009, p. 1257; DCLG 2006; Social Exclusion Task Force 2008, p. 9).

Before 2006, expansion was driven by partnerships between individual local authorities and third sector bodies such as NCH Action for Children (which delivered the DFP), funded largely by central government (DCLG 2006, p. 3). Although driven by the perceived success of residential projects, most FIPs in England have offered outreach services (Nixon et al. 2010, p. 310; DCLG 2006, pp. 2–3). After funding 53 ‘Pathfinder’ pilots up to 2008 (Social Exclusion Task Force 2008), Labour proposed to reach 50,000 families by 2009 (Gregg 2010, p. 1). FIPs largely symbolised a combination of values and evidence, to prompt a shift in the anti-social behaviour agenda from enforcement to a ‘twin track’ approach including greater support and a reduction in the use of ASBOs (Parr 2009, p. 1262).

The evidence of FIP success

The DFP was evaluated qualitatively, using a small number of in-depth interviews with a sample of residents and staff (Dillane et al. 2001, p. 6), supplemented by the project management’s self-reporting of success (33, or 59%, of cases) and by estimated savings based on counterfactuals: what would residential childcare have cost if the project had not intervened early? The authors described qualified success built on good management and inter-organisational commitment, and the use of ‘specific intervention types that are tailored to individual families’ needs’ (2001, p. 9).

Subsequently, Pawson et al. (2009, p. 1; see also Nixon et al. 2010, p. 307) evaluated the DFP as part of expanded provision in Scotland. They report a comprehensive analysis of case study backgrounds and interventions, supplemented by 78 in-depth interviews with members of 51 families, and discuss success in terms of 70% of cases closed after completing the agreed programme, ‘reduced complaints of anti-social behaviour’ (‘94% of cases’), and staff assessments of the reduced likelihood of poor outcomes such as homelessness (81%) and continued drug addiction (2009, p. 5). Pawson et al. (2009, p. 6) use a similar ‘cost-consequences’ calculation to estimate short-term savings, but without making a definitive judgement.

The review of initial FIP expansion in England—supporting 256 families (370 adults, 743 children)—makes bolder claims about programme success, arguing that ‘the projects had helped them achieve remarkable changes’, including at least 80% of cases in which families were no longer vulnerable to eviction, coupled with ‘significant improvements in children’s health, well-being and educational attainment’ (DCLG 2006, p. 7). In each case, unlike in key programmes evaluated with RCTs testing the effect of a specific intervention, the evaluations of FIP pilots do not provide a ‘blueprint’ or model for emulation (2006, p. 7). Nor do they describe anything but the potential to save money by reducing demand for acute services (2006, p. 8). Instead, they identify principles of good practice, including the need for many agencies to form partnerships; a long-term programme of support; and an ethos of challenging individual and family behaviour ‘based on the professional values of listening, being non-judgemental, promoting well being, and establishing relationships of trust’ (2006, p. 7). Further, proponents value the potential for local variation, and the discretion afforded to support workers to deliver services without imposing the government’s ASB and problem family rhetoric (Parr 2009, pp. 1269–1270).

The next roll out in England produced 53 new or modified FIPs supporting 690 families in 2007. The authors of its evaluation made qualified claims of success based on interviews with FIP staff, and expanded on elements of good practice: ‘recruitment and retention of high quality staff, small caseloads, having a dedicated key worker who manages a family and works intensively with them, a whole-family approach, staying involved with a family for as long as necessary, scope to use resources creatively, using sanctions with support, and effective multi-agency relationships’ (White et al. 2008, p. 2). They also refer to the limitations of their study, including the absence of RCTs to determine effectiveness (2008, p. 7; Parr 2009, p. 1268).

Similarly, in their evaluation of ‘Family Pathfinder’ pilots in England, Kendall et al. (2010, p. 3) use interviews with, and online surveys of, practitioners in local initiatives to identify the potential cost savings of intervention, with an illustrative programme for 53 families suggesting that ‘One million pounds of family intervention costs is estimated to generate savings of £2.5 m by avoiding adverse outcomes for family members; a net benefit saving of £1.5 m’. In this case, there is a specific estimate of costs, using the Social Return on Investment approach (Nicholls et al. 2009). The authors stress the illustrative and preliminary nature of such estimates, and note that the savings ‘cannot necessarily be cashed by local authorities’ (2010, p. 3). Rather, ‘the benefit–cost–savings need to be viewed at a society, rather than a local authority level’. This approach gained some traction in the Treasury, which adapted its Payment By Results system somewhat to encourage local authorities to invest (interview, HM Treasury, 2015).
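
The reported figures imply a benefit–cost ratio of 2.5. A minimal restatement of the arithmetic in the Kendall et al. (2010) illustration:

    \[ \text{net benefit} = \pounds 2.5\text{m (estimated savings)} - \pounds 1.0\text{m (costs)} = \pounds 1.5\text{m}, \qquad \text{ratio} = \tfrac{2.5}{1.0} = 2.5 \]

As the authors stress, the savings side is a modelled estimate of avoided outcomes, not cash that local authorities can recover.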

Overall, such evaluations provide qualified indicators of success. However, they are critiqued by Gregg (2010), who rejects the implication that FIPs turned around the lives of over 80% of the ‘worst’ families when the figures (a) often related to a small sample of the population, and (b) described people relatively willing to engage with FIPs because they were at risk of eviction. Further, their risk of eviction was often linked to issues of mental illness, debt, and unemployment, ‘hardly the image of ASB fed to the public’, and not solved during the evaluation period (Gregg 2010, pp. 3–5, 15).

Developing an evidence base: 2. Learning from international experience and RCTs

The UK Government supports a collection of interventions whose reputation for success has been generated with reference to EBM: evidence of success from multiple RCTs of interventions—initially outside the UK—requiring ‘fidelity’ to make sure that the ‘dosage’ and its effect can be measured (Oliver et al. 2014a, b; Cairney 2016a, b). The Family Nurse Partnership (FNP) began in the US: nurses engaged with first-time mothers (at relatively high risk of poor life chances) once per month from pregnancy until the child was two. The criteria for inclusion relate to age (the UK focus is teenage pregnancies), income (low), and partnership status (unmarried). Nurses give advice on how mothers can look after their health, care for their child, prevent further pregnancy, and access education or employment. The FNP combines an intervention to address the immediate problems faced by mothers with ‘early intervention’ to influence the longer-term impact on children (Barnes 2010, p. 9).

FNP gained its reputation from RCTs which demonstrated high effectiveness and low cost. The Coalition for Evidence-Based Policy (2012) gave it ‘top tier’ status, which describes ‘Interventions shown in well-designed and implemented randomised controlled trials, preferably conducted in typical community settings, to produce sizable, sustained benefits to participants and/or society’. It describes common outcomes in at least two US RCTs, including reductions in pre-natal smoking, child abuse/neglect, and second pregnancies, and improvements in children’s cognitive function and educational attainment.

The Department of Health gauged its ability to replicate the US programme with a pilot, initially of 10 sites, followed by semi-structured interviews with 77 FNP staff and structured interviews with an 8–10% sample of service users (Barnes 2010, p. 16). Most respondents were positive about the programme, ‘not perceiving that they had been identified as parents likely to fail but as parents who would benefit from much needed support’ (Barnes 2010, p. 16). After piloting, it was rolled out in England to 9,000 expectant mothers, with reference to its high cost-effectiveness and ‘strong evidence base’, which would be enhanced by an RCT to evaluate its effect in a new country (Family Nurse Partnership National Unit 2014; Barnes 2010, p. 10).

Crucially, the FNP requires fidelity to the US programme (you can only access the programme if you agree to the licensing conditions), based on evaluation results which showed that the programme was most effective when provided by nurses/midwives, and using a licence ‘setting out core model elements covering clinical delivery, staff competencies and organisational standards to ensure it is delivered well’ (Department of Health 2012, p. 6). Fidelity is a requirement because, ‘If evidence-based programmes are diluted or compromised when implemented, research shows that they are unlikely to replicate the benefits’ (2012, p. 6; Barnes 2010, p. 9), and the FNP website outlines ‘fidelity goals’ which resemble those for medicines. In practice, this produces a ‘fair degree of fidelity’ (Barnes 2010, p. 27).

Triple P (Positive Parenting Program) began in Australia as a parenting programme with five levels to reflect severity of need, from ‘community information provision to intensive one-to-one work’ (Lindsay et al. 2011, p. 3). It is ‘designed to prevent—as well as treat—behavioural and emotional problems in children and teenagers’ (Triple P 2016). It offers a standard programme and specialist courses. Its website emphasises high flexibility based on levels of intensity of intervention, evidence-based effectiveness, and low cost, to describe Triple P as an intervention that can be delivered as a whole population or a targeted programme (by trained practitioners using the same manual).

Its claim to be one of the most evidence-based interventions in the world has proved contentious. Wilson et al.’s (2012) review of 33 studies finds limited effectiveness at often high cost, and there has been some debate between the authors of studies promoting Triple P and those who question its cost-effectiveness (Tellegen and Sofronoff 2015; Reijneveld et al. 2015; Coyne 2015; Sanders et al. 2012; Coyne and Kwakkenbos 2013). In Scotland, evaluators did not recommend the continuation of Triple P without an RCT to demonstrate its value (Marryat et al. 2014, p. 8).

There are similar concerns in England, partly because evaluations are patchy and provide highly qualified results. Burney and Gelsthorpe (2008, p. 478) described it as a ‘programme much favoured by the government’ before being ‘independently evaluated’. Lindsay et al. (2011, pp. 3, 12) identify ‘positive changes in the small to medium range for child problem behaviour, parent well-being and parenting skills’ but note the absence of an RCT and some difficulties in gathering data. Therefore, non-imported programmes ‘might be equally effective’ (Churchill and Clarke 2010, p. 49). Lewis (2011, p. 107) notes that courses are generally provided or commissioned by local authorities (as opposed to primary care settings in Australia) and are expensive to provide. The Early Intervention Foundation (2016, pp. 106–109) provides qualified support for specific versions, while noting uncertainty about cost.

Incredible Years began in the US (developed by Professor Carolyn Webster-Stratton) as a training programme ‘for families with severely behaviourally disordered children’ (0–8 years). It uses a written curriculum, media, and short workshops to ‘teach parents how to manage difficult behavior’ (Waldfogel and Washbrook 2011, p. 7), train teachers to develop effective classroom techniques, and/or treat ‘clinic-referred’ children or whole classrooms (Bywater and Sharples 2012, p. 397). Overall, it has a ‘strong evidence base’ (2011, p. 7; Lindsay et al. 2011, p. 3). Incredible Years is the only relevant intervention to have received favourable evidence from UK RCTs (three in England and Wales), and is supported by NICE as part of the Improving Access to Psychological Therapies programme in NHS England (EIF 2016, pp. 103–104).

Notably, one specific application—Incredible Years Preschool BASIC (target-indicated)—has the highest (4+) rating (EIF 2016, p. 97), ‘meaning that it has evidence from over three RCTs demonstrating short-term improvements in children’s behaviour’ (one study highlights benefits over 10 years). It also receives a low-medium cost rating. Designed for ages 3–6, it involves one two-hour session per week for 20 weeks, in which two trained practitioners use media and group work to encourage parents to ‘learn strategies for interacting and communicating positively with their child, promoting optimal social and emotional development and discouraging unwanted child behaviour’ (2016, p. 103). There is scope for additional phone, email, or home visit follow-ups (combined with at-home exercises) in individual cases.

Most notably, the programme with the strongest evidence of short- and long-term success is target-indicated: administered to parents of children already deemed in need of treatment. There is limited evidence of success in target-selective interventions based on identifying high risk from factors such as socio-economic conditions, partly because ‘parents may not have felt so motivated to continue to work on their child’s behavior, since it was less problematic at the outset and since they were not seeking treatment when recruited’ (Scott et al. 2014, p. 655; see also EIF 2016, pp. 105–106; Statham and Smith 2010, p. 5; these problems seem to be magnified in whole population initiatives—Boffley 2016). This experience reinforces a wider dilemma in early intervention: there are few examples of taking effective projects ‘to scale’, and there are major issues around ‘fidelity’ when you scale up, including the need to oversee a major expansion in well-trained practitioners (Dodge 2009).

Discussion: what can governments do with this evidence?

This focus on the limited availability of evidence, and lack of clarity on its implications, helps us identify three stories of UK government EBPM, from most unsympathetic to most sympathetic about the government’s motives. The first suggests that the government is responsible for egregious PBE, inventing statistics to declare the success of a programme built on arbitrary and stigmatising measures of a problem, and providing no neuroscientific or evaluation evidence to justify a massive expansion of a failed programme. This conclusion often dominates academic discussions, combining normative criticisms of the stigmatising effects of an expanded ‘troubled’ or ‘problem’ families policy with empirical concerns about its effectiveness (Garrett 2007; Parr 2009, p. 1260). Gregg (2010) describes the expansion as a ‘classic case’ of PBE based on limited evidence of FIP success from a small sample of people from a small number of pilots. Fletcher et al. (2012), referring to a government-commissioned systematic review (Newman et al. 2007), describe the evidence for FIP effectiveness as ‘weak’. The impact of international interventions imported to the UK is also limited: the Family Nurse Partnership (FNP), for example, has so far produced ‘no additional short-term benefit’ (Robling et al. 2015). On that basis, Crossley and Lambert (2016) suggest that “the weight of evidence surrounding ‘family intervention’ and similar approaches, over the longue durée, actually suggests that the approach doesn’t work”.

A more sympathetic account comes from viewing developments chronologically, to focus on the initially available evidence, and consider the political context in which Westminster governments have to make and defend policy. Policymakers have to act despite uncertainty: they recognise the limits to existing data, but have to choose quickly, especially if there is never any prospect of receiving completely supportive evidence. In Westminster systems in particular, they perceive the need to make choices unequivocally and demonstrate success to protect themselves and their investment in policy. Central governments have to project an image of control and governing competence because they know that other actors will try to hold them to account in elections and debate. Programmes such as ‘troubled families’ generate intense debate on government policy and performance. So, they contain elements which emphasise ‘muscular, effective government’ (Davies 2015, p. 17), including sustained ministerial commitment and a determination to demonstrate early success.

This account suggests that the UK central government found a way to turn limited but broadly supportive evidence into a high profile commitment to major policy change built on preventive spending and early intervention. Early intervention is a heuristic: policymakers think that it accords with their values; they generally receive good feedback on this approach; there is some evidence to support a causal link between childhood neglect or trauma and poor life chances; and the use of vivid neuroscience may represent the best way to sell policy change. These ends (good preventive policy) justify the means (defending policy with problematic evidence), using the rhetorical value of neuroscience and political crises to encourage rapid policy change.

This national agenda provides ‘cover’ for local action. Local authorities retain the discretion to commission the most ‘evidence-based’ interventions, and practitioners modify FIPs according to professional values, under the cover of a high priority programme in which artificial short-term evaluation aids long-term support. The main effect of the policy is to invite local authorities and their partners to fund or deliver programmes designed to intervene as early as possible in people’s lives to improve their life chances. It is possible for this process—of setting national direction but encouraging local use of well-regarded projects—to become more ‘evidence-based’ than it first appears, since public bodies may choose programmes with reference to promising evaluations.

Viewed in this way, local governments have a suite of options which demonstrate their effectiveness in different ways. FIP evidence is built on principles of ‘good practice’ to solve individual problems such as risk to housing tenure: having a specific worker assigned to specific families, providing ‘hands on’ support to intervene directly in their choices, challenging their views about their problems, considering the effect of each family member on the others, and coordinating multi-agency partnership working (DCLG 2012, pp. 6–8). FIPs are evaluated using case reports, surveys, and interviews with support workers and parents, aided by counterfactuals (this programme is expensive but crucial, and the consequences of non-intervention would have been more expensive).

Programmes such as the FNP, Triple P, and Incredible Years are justified and evaluated in different ways—with reference to RCTs—but with similar levels of uncertainty about their effectiveness in local areas, or in ‘scaled up’ populations, across the UK. In fact, the EIF (2016, p. 11) states that it is ‘inappropriate to draw strong conclusions about which programmes will work or will not work when each programme only has a small number of evaluations and few have very rigorous or long-term evaluation across multiple sites’. Further, the intervention with the most impressive evidence (Incredible Years) seems only to be effective during a tertiary (‘last resort’) stage of prevention, rather than in primary/universal interventions or a secondary process of identifying high risk.

Therefore, in each case, the evidence does not act as a substitute for choice. Central government appears content to set a national framework and give some choice to local public bodies, backed increasingly by advice from the (partly government funded) EIF which maintains a database of evidence-based programmes and a star-rating system measuring effectiveness and cost. In practice, this approach limits central direction and there appears to be no central record of local choices (EIF 2016, p. 20). Short-term top-down policymaking becomes local choice aided by the EIF.

Consequently, there is also a third story which combines elements of the other two. Initially, scholars raise major concerns about the nature and tone of government policy: it is an agenda designed to punish vulnerable populations, not to provide cover for supportive policies. Subsequently, they describe a tendency for policy to change as it is implemented, such as when mediated by local authority choices and by social workers maintaining a commitment to their professional values when delivering policy (Featherstone et al. 2013, p. 7; Morris and Featherstone 2010; Hayden and Jenkins 2013, p. 468; Butler 2014, p. 420).

Conclusion

PBE may be a dramatic political slogan, but it does not provide a useful empirical description of the many choices that combine to produce ‘policy’. As an umbrella term, it conflates a range of practices, from a cynical process to decide first and justify later, to the routine desire to make choices based on principles, values, and judgement despite high levels of scientific uncertainty. The alternative is to use policy studies to interpret the process partly from the perspective of policymakers facing their own ‘bounded rationality’, major limits to their influence in complex policymaking systems, and their need to justify policies only partly with reference to evidence.

This perspective allows us to categorise more effectively the use of evidence in policy, and consider the motives of policymakers. In ‘families policy’, we can see three main examples: the use of highly qualified positive evidence to justify the massive expansion of family intervention projects; the problematic development of indicators of ‘troubled families’ and policy success; and the misleading use of neuroscientific evidence to justify early intervention. From a critical outsider’s perspective, ministers decided what they wanted to do without supportive evidence to back up their policy, then developed ridiculously misleading, and stigmatising, measures of the problem and success of the solution. This strategy produced policy failure partly because it was not supported by good evidence. From an elected policymaker’s perspective, they began with evidence that was promising but difficult to understand, and never likely to provide a ‘magic bullet’. Then, they made a judgement on policy expansion, and used the tools of government necessary to sell and defend policy in high stakes Westminster systems. This strategy produced the potential for long-term success.

In such debates, it is difficult to separate ideological from empirical evaluations. Two actors can agree completely on the evidence base—on the size of the problem and the effectiveness of existing solutions—but disagree completely on its implications for future policy, which involves deciding how we should describe and treat people, how much money we should spend on particular programmes, and the likely impact and value for money of those programmes. In such cases, focusing solely on the extent to which policy is ‘evidence-based’, or identifying PBE too broadly, can downplay the importance of the politics of evidence-based policymaking. No policy can, or should, be based entirely on evidence, and we need categories and standards that are less binary to reflect that point.

These categories help us identify the many ways in which policymakers think about the role of evidence in political systems, in which EBPM is only one of many possible measures of good policymaking. Policymakers try to combine evidence and values to create a narrative of policy change that they can sell to the public. To promote the greater use of evidence, we need to understand how governments create policy narratives, and show policymakers how evidence helps build their case. Otherwise, scholars will remain outsiders, able to convince a like-minded academic audience but peripheral to policy debate. This outcome may suit critical scholars, who often see their role as one of fundamental opposition to government policy, and who can use PBE primarily as a political slogan. It does not help advocates of the use of evidence to improve policy. For such advocates, to call naively for EBPM and decry PBE is to miss the chance to influence policy debate.