There is general agreement in the specialized literature on the need to design and conduct multi-strategy evaluation in health promotion and in the social sciences. “Many community-based health interventions include a complex mixture of many disciplines, varying degrees of measurement difficulty and dynamically changing settings … understanding multivariate fields of action may require a mixture of complex methodologies and considerable time to unravel any causal relationship” (McQueen & Anderson, 2001, p. 77). The meaning of the term multi-strategy, however, varies greatly. For some, multi-strategy corresponds to the use of multiple methods and data sources that allow for the participative evaluation of multiple dimensions, such as outcome, process, and social and political context (Carvalho, Bodstein, Hartz, & Matida, 2004; Pan American Health Organisation, 2003). For others, the support for using multiple methods and strategies is rooted in the desire to deploy multi-paradigm designs (Goodstadt et al., 2001). More generally, however, the term refers to studies mixing qualitative and quantitative methods of enquiry (Gendron, 2001; Greene & Caracelli, 1997). Exceptionally, in the evaluation literature, multi-strategy also refers to the possibility of mixing all kinds of evaluation approaches or models from diverse categories, such as advocacy, responsive, and theory-driven evaluation (Yin, 1994; Datta, 1997a,b; Stufflebeam, 2001). In all these references, the use of multi-strategy evaluation is justified as the best approach to minimize validity problems in dealing with the complexity of multi-strategy interventions and in multi-center evaluation research.

Unfortunately, in examining research synthesis studies it is often impossible to estimate the real utilization and the effective contribution of multi-strategy evaluation, even though such evaluation is widely recommended as a means of improving the knowledge resulting from health promotion intervention evaluations. Meta-analysis and other research synthesis methods rely on a very limited classification system for evaluation study designs, one that essentially consists of whether a Randomized Controlled Trial (RCT) has been used (Hulscher, Wensing, Grol, Weijden, & Weel, 1999; International Union for Health Promotion and Education, 1999). This impedes the capacity to judge the appropriateness of evaluation approaches, in particular for the multi-strategy interventions that characterize complex community-based actions.

Considering the additional difficulties associated with conceptual definitions of health promotion in community settings (Boutilier, Rajkumar, Poland, Tobin, & Badgley, 2001; Potvin & Richard, 2001), and the absence of a standardized typology of multi-strategy evaluations and of their implications for research validity and practical utility, this chapter explores the approaches and multi-strategy models implemented by evaluators in health promotion. To do so, we carried out a systematic review of scientific articles reporting on community health promotion evaluations conducted in countries of the three Americas between 2000 and 2005 and available through electronic databases up to May 2005. We were further interested in assessing the quality of these evaluation studies using quality indicators derived from international standards of meta-evaluation adequacy and from health promotion principles and values.

Two questions guided our work: (1) What are the characteristics of health promotion intervention evaluation studies? and (2) To what extent do these studies conform to common and specific evaluation standards? The need for using specific standards comes from the fact that, in order to convincingly demonstrate both expected and unintended effects, evaluation must use methodological approaches that are congruent with the principles and values of complex community health promotion interventions.

Methods

The Meta-Evaluation Approach

Meta-evaluation, in an informal sense, has been around for as long as it has been recognized that evaluators are professionals and that, as in other professional practices, the quality of their products must be assessed. Cooksy and Caracelli (2005) have underlined that meta-evaluations conducted on a set of studies are useful for identifying strengths and weaknesses in evaluation practice. They serve the general goal of capacity development in the field of evaluation.

In short, meta-evaluation is the systematic evaluation of an evaluation study. It is based mainly on four categories of evaluation standards that have achieved consensus within the American Evaluation Association (AEA) for the evaluation of social programs (Stufflebeam, 2001, 2004; Yarbrough, Shulha, & Caruthers, 2004), of public health interventions (Centers for Disease Control, 1999; Hartz, 2003; Moreira & Natal, 2006), and of community programs (Baker, Davis, Gallerani, Sanchez, & Viadro, 2000). These four categories are defined as follows; the complete list of standards used in this study for each category is provided in Appendix 1.

The first category is labelled utility standards. It is composed of criteria concerned with whether the evaluation is useful; together, these criteria answer questions directly relevant to users. Three standards from this category were selected for this study. The second category is composed of feasibility standards, which assess whether the evaluation is realistic and practicable. The single criterion selected from this category assesses whether the interests of the various relevant groups were taken into account in the evaluation design. The third category is made up of propriety standards, which concern the ethics of evaluation. The three criteria selected assess whether the evaluation was conducted with respect for the rights and interests of those involved in the intervention. The fourth category is composed of accuracy standards. The ten criteria selected relate to whether the evaluation conveyed technically adequate information regarding the features that determine the merit of the evaluated program.

In addition to these four categories of standards, and in response to concerns regarding international applications, the notion of open standards is now being developed to address the difficulties associated with transferring standard categories across cultures and contexts (Love & Russon, 2004). According to Stufflebeam (2001), the main challenge in a meta-evaluation is one of balancing merit and worth: how do the evaluation studies analyzed meet the requirements of a quality evaluation (merit) while fulfilling the audience’s needs for evaluative information (worth)? Although these standards are recognized by evaluators’ professional associations, these associations also acknowledge that standards are not recipes. They are useful starting points from which to develop trade-offs and adaptations for the specific situations faced by meta-evaluators (Worthen, Sanders, & Fitzpatrick, 1997).

Another category of open standards was defined for this meta-evaluation study. This category, called specificity standards, assesses whether the evaluation was theorized in accordance with community-based health promotion principles. Indeed, the complex nature of community health promotion interventions requires innovative and complex evaluative approaches, using a variety of methods that are coherent and consistent with initiatives targeting changes at various levels. In addition, evaluation studies should be valid and should allow the identification of the theories and mechanisms by which actions and programs lead to changes in specific social contexts (Fawcett et al., 2001; Goodstadt et al., 2001; Goldberg, 2005). For this exploratory meta-evaluation, we adopted specific standards and criteria of a quality evaluation that follow three fundamental community-based health promotion principles: community capacity-building and accountability; disclosed theory or mechanisms of change; and multi-strategy evaluation. Multi-strategy evaluation was defined as an evaluation that combines quantitative and qualitative analyses and makes appropriate links between theory and methods, and between process and outcome measures.

Based on these criteria, we assessed and scored the selected articles reporting on evaluations of community-based interventions. This scoring was performed in an anonymous meta-evaluation format, in the same spirit as that of professional evaluators’ societies, i.e., to enhance the quality and credibility of the knowledge resulting from evaluation studies (Stufflebeam, 2001).

Data Collection and Analysis

The first step was to select the articles to be included in the meta-evaluation. A systematic review of community-based health promotion program evaluations available in major databases, such as CINAHL (Cumulative Index to Nursing & Allied Health Literature) and the Virtual Health Library of the Pan-American Health Organization registry, was undertaken. This registry was chosen for its ability to house English, French, Spanish, and Portuguese studies conducted throughout the Americas by tapping into prominent scientific databases in the field of health promotion. These databases include Lilacs (Latin American and Caribbean Health Sciences), SCIELO (Scientific Electronic Library Online), and Medline (International Database for Medical Literature).

Three search terms were used to identify eligible references, namely: health promotion, program evaluation, and community. Our initial search led to the identification of 58 references from Lilacs-SCIELO (L&S) and 120 references from Medline and CINAHL (M&C), which moved on to the second round of analysis, where abstracts were reviewed for their adherence to the specified definition of community interventions in health promotion. Differences in Medline’s default search settings led to slight modifications of our search specification, while its restriction options led to fairly large differences in search results. Medline’s default settings required that the residence characteristics attributed to community be specified as a search term, and allowed both full-text documents and evaluation studies to be used as search restrictions. The former restriction led to the identification of 53 studies, excluding systematic reviews of the literature, commentaries, books, and editorials, while the latter led to the identification of 23 studies that were rated as evaluation studies by their authors.

In a second step, 29% (17/58) of the abstracts referenced in L&S and 23% (28/120) of those in M&C were selected according to a broad definition of “community health promotion interventions”. The definition we used was based on Potvin and Richard (2001) and on Hills, Carroll, and O’Neill (2004), who restrict the term community interventions to interventions that use complex multiple strategies, focus on various targets of change (individual and environmental changes), and engage communities with at least a minimum level of participation. Such interventions are generally characterized as community development, community mobilization, community-based intervention, and community-driven initiatives (Boutilier et al., 2001). The third and final step of article selection was based on the agreement of two reviewers who read the complete texts; in case of disagreement, a third reviewer was consulted. In this final stage, we selected articles designed to answer at least one evaluative question regarding the program under study, based on the five-category classification of evaluation questions of Potvin, Haddad, and Frohlich (2001, p. 51). These are: (1) Relevance questions: How relevant are the program’s objectives to the target of change? (2) Coherence questions: How coherent with the theory of the problem is the theory of treatment linking the program’s activities? (3) Responsiveness questions: How responsive is the program to changes in implementation and environmental conditions? (4) Achievement questions: What do the program’s activities and services achieve? (5) Results questions: With which changes are the program’s activities and services associated?
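As a quick illustration of the selection funnel described above, the following minimal sketch (in Python, which was not part of the original study’s tooling; the variable names are ours and the counts are those reported in the text) reproduces the abstract-screening proportions.

```python
# Illustrative sketch of the selection funnel, using only the counts reported
# in the text; variable names are ours, not from the original study.
initial_references = {"Lilacs-SCIELO (L&S)": 58, "Medline & CINAHL (M&C)": 120}
abstracts_selected = {"Lilacs-SCIELO (L&S)": 17, "Medline & CINAHL (M&C)": 28}
articles_retained_after_full_text_review = 27

for source, total in initial_references.items():
    kept = abstracts_selected[source]
    print(f"{source}: {kept}/{total} abstracts selected ({kept / total:.0%})")
# Lilacs-SCIELO (L&S): 17/58 abstracts selected (29%)
# Medline & CINAHL (M&C): 28/120 abstracts selected (23%)
print(f"Final selection: {articles_retained_after_full_text_review} articles")
```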

All 27 selected articles, listed in Appendix 2 (19 of which are from North America), were read by two independent coders. Four dimensions adapted from Goodstadt et al. (2001, p. 530) were used to describe the program that was evaluated. These were: (1) the intervention goals (improve health and well-being, reduce mortality and morbidity, or both); (2) the level of the targeted changes as stated in the intervention objectives (enhance individual capacity, enhance community capacity, or develop supportive institutional and social environments); (3) the health promotion strategies used (health education, health communication, organizational development, policy development, intersectoral collaboration, or research); and (4) the main reported results. According to the Goodstadt et al. (2001) model, health promotion actions should have goals that extend beyond reducing and preventing ill health to include improving health and well-being, focusing on different levels and determinants of health, and adopting strategic and operational activities to reach objectives in the areas given priority by the Ottawa Charter.

Three dimensions were coded to characterize the evaluation approaches used in the evaluation studies. The first dimension relates to the question that guided the evaluation study (relevance, coherence, responsiveness, achievement, or results). The second assesses the main focus of the evaluation (process, outcome, or both). The third concerns the methods used (qualitative, quantitative, or mixed).

Finally, each evaluation study was rated using the four American Evaluation Association standard categories listed in Appendix 1 and the five criteria of the specificity standards designed for this study. Because much of the information required to assess the individual criteria of the American Evaluation Association standards was only available in original reports or in evaluability assessment studies, each standard category was assessed globally. Each standard category and each of the five specificity criteria were given a score ranging from 0 to 10 by two independent reviewers, following Stufflebeam’s (1999) classification: Poor 0–2; Fair 3–4; Good 5–6; Very Good 7–8; and Excellent 9–10. A correlation coefficient of 0.86 between the reviewers’ scores was estimated using three randomly selected articles. All statistical analyses were performed using Epi Info 3.3.2.
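To make the scoring procedure concrete, here is a minimal sketch, in Python rather than Epi Info (which was used for the actual analyses), of how a 0–10 score maps onto the rating bands above and how a correlation between two reviewers’ scores could be computed. The scores shown are hypothetical, not the study’s data.

```python
# Illustrative sketch only: hypothetical scores, not the study's data.
# The original statistical analyses were performed in Epi Info 3.3.2.

def rating_label(score: float) -> str:
    """Map a 0-10 score onto the rating bands used in the study."""
    if score <= 2:
        return "Poor"       # 0-2
    if score <= 4:
        return "Fair"       # 3-4
    if score <= 6:
        return "Good"       # 5-6
    if score <= 8:
        return "Very Good"  # 7-8
    return "Excellent"      # 9-10

def pearson_r(x, y):
    """Pearson correlation between two lists of reviewer scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical ratings by two reviewers on three randomly selected articles
reviewer_1 = [7, 4, 9]
reviewer_2 = [8, 3, 9]
print([rating_label(s) for s in reviewer_1])          # ['Very Good', 'Fair', 'Excellent']
print(round(pearson_r(reviewer_1, reviewer_2), 2))    # inter-rater correlation
```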

Results

Table 14.1 presents the characteristics of the programs evaluated in the selected articles. Two characteristics are in line with health promotion principles. As shown in Table 14.1, only a minority of the programs targeted the reduction of mortality and morbidity as their sole objective. Another positive result is that, in addition to individual-level change objectives, the great majority of programs also targeted middle- and macro-level change objectives, in 70% and 48% of cases respectively. Concerning the health promotion strategies adopted, or the activities carried out to ensure that the objectives could be achieved, health education and communication were the two most often implemented, and they appear to always be associated in local practices. Interestingly as well, all programs were composed of at least two types of action, meeting the minimal requirement for being labeled multi-strategy interventions. More interestingly, 20 of the 27 programs were made up of three or more components. The presence of research activities in 13 of the 27 interventions also seems to indicate an integration of knowledge development as an intervention strategy. Less encouraging, however, is the fact that only a minority of programs addressed issues of public policy. As for the evaluation results, not surprisingly, the majority reported improved awareness, skills, and behaviors. Only a few reported positive effects on public policies and increased equity.

Table 14.1 Main characteristics of evaluated interventions (n = 27)

Table 14.2 describes the main characteristics of the evaluation approaches implemented in the selected articles. It is interesting to note that the evaluation studies seem to cover a broad range of evaluation questions, overcoming the traditional dichotomies between process and results evaluation, or between formative and summative evaluation. Indeed, our results clearly illustrate the richness of using a typology of questions to characterize the evaluation focus, compared with categorizations based on the traditional dichotomy. Our results also show that the use of multi-strategy approaches to evaluation is still somewhat limited: only 40% of the reported studies focused on a mixture of process and outcome, and 36% used a mix of quantitative and qualitative analyses. We will come back to the relevance of this dimension as a quality indicator of health promotion evaluative research in the discussion.

Table 14.2 Main characteristics of evaluation designs (n = 27)

The second issue addressed in this chapter has to do with the extent to which the evaluations meet common and specific evaluation standards. Figure 14.1 presents the ratings given to the 27 selected evaluation studies on the five meta-evaluation standard categories and on the five criteria that form the specificity standard category. In general, the published evaluation studies are of very high quality. Not surprisingly, accuracy standards are the most commonly met, with almost 80% of studies (21/27) classified as very good or excellent. Conversely, the specificity standards, which relate to whether the evaluation was theorized in accordance with community-based health promotion principles, are the least often met in our sample: only 52% (14/27) obtained a very good or excellent rating. An examination of the various dimensions of the specificity standards shows that 30% (8/27) of the reported evaluations had scores lower than 5.0 (i.e., fair or poor) on the criteria related to the appropriate use of theory (S1) and to the use of multi-strategy evaluation (S3).

Fig. 14.1 Histogram of ratings on quality standards (n = 27)

It is also worth noting that there seems to be greater variation in quality when evaluations are assessed with standards specific to health promotion, rather than with common standards. Figure 14.2 shows that although the medians of the rating distributions are similar across standards, the range of ratings is broader for standards specific to health promotion evaluation.

Fig. 14.2 Median and 25th and 75th percentiles of ratings on quality standards (n = 27)

Discussion and Conclusion

Overall, these results show that, unfortunately, there is not yet an appropriate match between the level of complexity of interventions and the approaches used to evaluate them. We share McKinlay’s (1996) regret about the deficiency of process evaluation: “Most of disappointing large-scale and costly community interventions reported in recent years had no process evaluation, so it is impossible to know why they failed or whether perhaps they succeeded on some other level” (p. 240). There are, however, examples of studies in which some reconciliation of process and outcome evaluation, supported by an appropriate theory of change for complex community initiatives, has been implemented. The study by Hughes and Traynor (2000), for example, illustrates how such an approach can enable accurate reporting on a program’s results when it is implemented in different contexts.

Overall, evaluation practice needs to be better aligned with the principles of health promotion when community health promotion interventions are evaluated. The interventions’ high degree of complexity is very seldom matched by multi-method approaches to evaluation. With all the limitations associated with an exploratory meta-evaluation of a limited number of evaluation research reports, we think that three main messages can be drawn from this work.

The first message is that a relatively simple way to improve the usefulness and relevance of evaluation research for health promotion is to examine the quality of health promotion evaluations using both common and specific meta-evaluation standards. A systematic meta-evaluation using a broad range of common criteria and of criteria specific to health promotion allows a much better assessment of the field than categorizations based on traditional dichotomies such as process versus outcome evaluation, or experimental versus non-experimental designs. Our broad, inclusive strategy may have biased our sample of studies toward positive results (19/25), in apparent contradiction with Merzel and D’Afflitti’s (2003) comments on the modest impact of community-based programs over the past 20 years. But it is also possible that Merzel and D’Afflitti’s (2003) results were an artefact of inclusion criteria that limited their analysis to experimental-control study designs, thus restricting the expression of intervention effectiveness “precisely because … the phenomena under study do not lend themselves to an application of that methodology” (De Leeuw & Skovgaard, 2005, p. 1338).

The second message is a plea for a better alignment between health promotion and its evaluation. If we are really serious about the principle that health promotion interventions are multi-strategy, then we should require multi-strategy evaluations; this is the condition for being able to demonstrate both beneficial and detrimental effects. The development and use of health promotion-specific quality criteria for the meta-evaluation of health promotion evaluations has to be encouraged. Our exploratory meta-evaluation shows that there are quality deficiencies on these specific criteria and that the performance of health promotion studies is much less consistent on the specific criteria than on the common ones.

The third message is a reiteration of the hypothesis that an intervention’s demonstrated effectiveness is not independent of the evaluation model implemented to study it. Given that, among the six evaluation studies that showed negative results, five were multi-strategy interventions evaluated with a single data-analysis strategy, it would be interesting to conduct a larger meta-evaluation to test the relationship between the use of multi-strategy evaluation and the conclusions of the evaluation. Conversely, it would be critical to analyze the real meaning of positive evaluation results for studies with a low rating on the health promotion specificity criteria. A meta-evaluation based on a “realistic synthesis” review (Pawson, 2003), grouping different programs and contexts under a common theoretical framework and mechanisms, could also increase the ability to highlight the role of multi-strategy evaluation in constructing the case for an effective intervention. As noted, there is “a tendency to underrate and invalidate knowledge derived from a deductive process applied to theoretical knowledge and to overrate the accumulation of empirical observations even if the empirical basis is not sufficient” (Potvin, 2005, p. S97).

These messages, however, must be considered in light of the inherent limitations of our meta-evaluation study, which restrict the generalization of our observations. The first has to do with the content validity of the ratings on the four common standard categories (utility, feasibility, propriety, and accuracy), which relied on an overall impression rather than on a series of detailed criteria. The only source of information regarding the evaluated programs was the published evaluation papers, which substantially limited our ability to nuance the quality assessment. Another problem, particularly important for evaluations carried out in Latin America, was that, given time and resource restrictions, we were not able to include the grey literature, and half of the selected studies were drawn from graduate theses. The small number of published articles available also limited our ability to contrast patterns of evaluation between North and South America.

We would like to conclude this chapter with some empirical remarks. At this point, we do not have meta-evaluation criteria for the evaluation of complex multi-strategy health promotion interventions. It is therefore quite difficult to assess whether multi-strategy evaluations are better able to provide valid results when evaluating such interventions.