Theory of Programme Evaluation, China
Keywords: Evaluation design; Effective evaluation; Program theory; Citizen participation; Mixed method design
Evaluation: The research, assessment, and appraisal of a program in order to identify the degree and efficiency of its implementation and outcomes. The findings can usually help people to draw lessons from the program and to make better decisions and judgments about future programming.
A type of policy research, designed to help people make wise choices about future programming. Evaluation does not aim to replace decision makers’ experience and judgment, but rather offers systematic evidence that informs experience and judgment. Evaluation strives for impartiality and fairness. At its best, it strives to represent the range of perspectives of those who have a stake in the program. (Cited in Clarke 1999: 2–3)
An effective evaluation can be identified as the product of four main dimensions of activity: methodology, evaluation design, evaluation in the field, and the evaluation report. As methodology is considered the most influential factor in evaluation, an overview of the three approaches (the scientific approach, the narration approach, and the participatory approach) will be provided in the first part of this chapter. However, none of these approaches can be used independently to conduct an effective evaluation. It will be argued that mixed method evaluations are the best way to conduct an effective evaluation.
This chapter will be organized into four parts: the first part will begin with comparisons of the three main methods of evaluation and move on to explain why a mixed method design is the best approach to evaluation; the second part will focus on the other key ingredients of effective evaluation, namely the evaluation design, implementation, and report; the third part will concentrate on the application of evaluation in China, especially public governance and public management evaluation; and finally, it will draw a conclusion on the key ingredients of effective evaluation.
An Overview of Competing Methods
The Scientific Approach
This approach can yield representative and broadly generalizable information from the data collected by using standardized methods. By applying systematic statistical analyses, it invests the interpretation and findings of evaluations with a higher level of credibility and confidence (Denscombe 1998: 204–205). Furthermore, it is replicable and can be analyzed using sophisticated statistical techniques, especially with the aid of computer software. The findings are therefore more reliable and relevant to decision makers and stakeholders because they rest on objective evidence based on numbers, percentages, and graphs (Denscombe 1998: 205).
However, quantitative analysis is not always as scientifically objective as it might seem in theory (Denscombe 1998: 205). It seeks to control the context by using random assignment and multivariable analyses, which tend to ignore small deviations. Techniques such as postal questionnaire surveys and laboratory experiments do not even require any contact with people (Bryman 1996: 101). Quantitative researchers may recognize that some of the data collected are not accurate or valid but choose to ignore the impact on the evaluation. Besides, the credibility of findings from preferential evaluations designed for specific stakeholders is questionable. Hence, quantitative researchers tend to emphasize the quantity of data rather than the quality of an evaluation (Bryman 1996: 94).
The Narration Approach
In contrast to the scientific approach, this approach provides meaningful and in-depth information to evaluators, who use it to interpret findings from the clues obtained (Evans 2005: 10). Another merit of this approach is that it reflects social reality by classifying and analyzing descriptive data (Denscombe 1998: 220). Besides, qualitative researchers emphasize the importance of understanding the context and argue that studying deviant cases provides important insight into the subject (Bryman 1996: 94–96). Furthermore, they often reject the idea of applying theory as a precursor to an investigation, which may not reflect subjects’ views, and argue that there is no objective social reality. As a consequence, they are more concerned with the discovery of theory than the verification of theory (Bryman 1996: 97).
Nevertheless, qualitative analysis is often criticized as unrepresentative and atypical due to its subjective findings, which are constructed in line with the beliefs of the researcher (Marsh and Stoker 2002: 202). The insider standpoint can result in a situation in which “the researcher loses the awareness of being a researcher and is seduced by the participants’ perspective” (Bryman 1996: 96). Qualitative research can produce results that are neither replicable nor comparable (Marsh and Stoker 2002: 204). Furthermore, the meaning of the data may be lost or transformed through coding and categorizing during the analysis process (Denscombe 1998: 222). Whichever qualitative methods are applied, good staff skills and considerable supervision are required to yield trustworthy data. Hence, qualitative evaluation, and any further exploration it prompts, can be even more time consuming, not to mention the challenge of acknowledging social complexity (Denscombe 1998: 222).
The Participatory Approach
An evaluation needs to express not only the opinions of the decision-makers or funding body but also those of the other parties involved in the program. The emergence of the participatory approach has met the need to find out the views of stakeholders or expert knowledge to help decision-making (Gregory 2000: 180). The most commonly used models are stakeholder evaluation and participatory evaluation (Robson 2000: 18–22).
In the stakeholder evaluation model, the evaluators are the principal investigators, while the practitioners act only as consultants. Large numbers of stakeholders have an active role in “shaping and focusing the evaluation so that it links in with their interests and concerns” (Robson 2000: 18). However, critics point out that this model is idealistic, because it is impossible for everyone with a stake to take part in every stage of the evaluation process. If only key stakeholders can participate in some stages, this undermines the participants’ role in the planning and execution of the evaluation (Gregory 2000: 183).
In participatory evaluation, while the evaluator still plays the major evaluative role, a small number of practitioners engage in the evaluation process as assistants to the evaluators. The connection between evaluators and practitioners is a joint responsibility that can help to maintain the rigor of the evaluation after program management or staff take part in it. However, the first difficulty of this model is how to avoid the impact of power relations on participation in the evaluation process. The second is that the practitioners may feel greater “inhibitions” toward their colleagues if the data and findings are negative (Robson 2000: 19–20).
Mixed Method Evaluation
Why Is Mixed Method Evaluation the Best Methodology?
By using different sources and methods at various points in the evaluation process, the evaluation team can build on the strength of each type of data collection and minimize the weaknesses of any single approach. A multi-method approach to evaluation can increase both the validity and reliability of evaluation data.
In short, a mixed method design can not only address “all aspects of the research question” but also increase the validity of research by covering the drawbacks of each other (Marsh and Stoker 2002: 237).
How to Combine Methods
First of all, methodological triangulation is considered an advantageous approach to mixed method evaluation. It “combines two or more different research methods in the study of the same empirical issue” (Marsh and Stoker 2002: 237). The aim of this approach is to acquire reliable and accurate information by using multiple sources of information (Pierce 2004: 69).
Creswell also identified three basic models for integrating different methods. The first is a two-phase design. This approach separates the research into a qualitative phase and a quantitative phase in which the evaluators can “operate within the appropriate epistemological paradigm” (Marsh and Stoker 2002: 239). If the quantitative phase comes first, the findings of the survey can inform the context of the qualitative study. Conversely, if the study starts with a qualitative analysis, the following quantitative phase will enhance “the validity of the concepts, hypotheses and questions” (Marsh and Stoker 2002: 239–240).
The second is a dominant/less dominant model in which one method will dominate the overall study, while the other one just provides a supplement to the study. “The advantage of this method is that it retains a single consistent paradigm, but allows other data to be collected from a smaller or larger population, depending on which methodology is dominant” (Marsh and Stoker 2002: 240).
The third one is the mixed-methodology model that can be used at any stage of the evaluation process. This is the most popular approach because it “allows the researcher to take advantage of each research methodology” (Marsh and Stoker 2002: 240). Moreover, it uses both inductive and deductive approaches to offer a better reflection of research practice because there are no constraints of the sequence of the methods (Marsh and Stoker 2002: 240).
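To make methodological triangulation concrete, the sketch below compares the same construct – participant satisfaction – as measured by two independent methods, a quantitative survey and qualitatively coded interviews, and checks whether the sources converge. All data, scales, and the agreement threshold here are hypothetical, chosen purely for illustration.

```python
# Illustrative sketch of methodological triangulation: the same construct
# (participant satisfaction) measured by two independent methods.
# All data and the agreement threshold are hypothetical.

survey_scores = [4, 5, 3, 4, 4, 5, 2, 4]             # quantitative: 1-5 Likert ratings
interview_codes = ["positive", "positive", "mixed",  # qualitative: coded interview themes
                   "positive", "negative", "positive"]

# Reduce each source to a comparable 0-1 "satisfaction" signal.
survey_signal = sum(s >= 4 for s in survey_scores) / len(survey_scores)
interview_signal = interview_codes.count("positive") / len(interview_codes)

# Triangulation check: do the two methods broadly agree?
converges = abs(survey_signal - interview_signal) <= 0.2
print(f"survey={survey_signal:.2f} interviews={interview_signal:.2f} converge={converges}")
```

In practice the comparison would be far more nuanced, but the sketch captures the core idea: each method’s weakness is offset by independent evidence from the other.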
Problems of the Mixed Method Approach
Although there is increasing use of the mixed method approach in evaluation, some problems need to be considered in practice. Firstly, the mixed-methodology design cannot avoid the problem that the evaluators’ ontological and epistemological positions have an impact on methodology. Secondly, this approach has been questioned because different evaluators are likely to analyze the data in their own ways and to “make different claims about their results” (Marsh and Stoker 2002: 241). Thirdly, evaluators need varied skills to handle these approaches, and thought must be given to how limited resources are allocated across different methods (Marsh and Stoker 2002: 242).
Evaluation Design
There are three key components in evaluation design: the evaluability assessment, the evaluation questions, and the methods and sampling strategy. At the design stage, the goals and program theory will determine the design of the evaluation questions. After that, the evaluators will select data-collection approaches and develop a sampling strategy according to the questions (Robson 2000: 79–80).
Evaluability Assessment
Evaluability assessment is “a pre-evaluation appraisal of whether a program’s performance can be evaluated” (Rossi and Freeman 1999: 187). The purpose of this assessment is to find out whether the program theory is well defined or, if not, what changes can be made to ensure the evaluation will attain its goals. It involves reassessing the goals and objectives of the program, performance criteria, “stakeholders,” and side effects (Rossi and Freeman 1999: 187; Shaw 1999: 124).
Clarifying the Goals and Objectives and Program Theory
As Sabatier observes, ambiguous and inconsistent objectives will lead to the failure of policies or programs (Marsh and Rhodes 1992: 9). It is important to clarify the objectives of the program at the outset to provide guidance for implementation and evaluation. Moreover, the assessment of the program theory is also important, because it is difficult for a program to attain its goals if the program theory is weak or faulty (Rossi and Freeman 1999: 187).
Identifying Stakeholder Interest in Evaluation Findings
Stakeholders can be individuals or groups of people that are “typically involved in, affected by, or interested in the evaluation” (Rossi and Freeman 1999: 55). Nine categories of stakeholders are identified by P. Rossi and H. Freeman: the policy-maker and decision-makers, program sponsors, evaluation sponsors, target participants, program managers, program staff, program competitors, contextual stakeholders, and evaluation and research community (Rossi and Freeman 1999: 55). Each group of people has their own interests in the program. The purpose of identifying stakeholders is to design different questions and select various evaluation research methods for a particular target (Hanberger 2001: 52).
Formulating Evaluation Questions
The criterion for good evaluation questions is whether they “identify clear, observable dimensions of program performance that are relevant to the program’s goals and represent domains in which the program can realistically be expected to have accomplishments” (Rossi and Freeman 1999: 116). Evaluation questions usually cover the following dimensions: program design, implementation, management and enhancement, staff training and support, the impact of policy output, etc. (Evans 2005: 9). In practice, the evaluators will set priorities among a set of candidate evaluation questions according to the concerns of various stakeholders (Rossi and Freeman 1999: 117).
Selecting Data-Collecting Methods
As mentioned above, mixed method evaluations combine the advantages of both quantitative and qualitative methods and can “yield richer, more valid and more reliable findings” (Sharp and Frechtling 1997: 3–7). There is no fixed formula for a mixed method design; however, several considerations should inform the selection of methods. Firstly, it is important to develop sampling strategies which ensure that the sample is representative and the data collected are relevant to the evaluation questions. Accuracy can be increased by enlarging the sample size; however, this takes time and increases costs. Secondly, costs are “the various inputs required to set up and run a program,” which determine the selection of methods, the scale of the evaluation, and the quality of data (Robson 2000: 136). Thirdly, the general rule of timing is to collect data before “an innovation is first introduced and after it has been in operation for a sizable period of time” (Sharp and Frechtling 1997: 3–8).
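The trade-off between sample size and accuracy noted above can be made concrete with the standard margin-of-error formula for a proportion estimated from a simple random sample; the sample sizes below are illustrative, not drawn from the source.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes: quadrupling n only halves the margin of error.
for n in (100, 400, 1600):
    print(f"n={n}: +/- {margin_of_error(n):.3f}")
```

Because the error shrinks only with the square root of the sample size, each additional gain in accuracy costs proportionally more time and money, which is exactly the trade-off evaluators must weigh when designing a sampling strategy.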
Evaluation in the Field
Evaluation in the field is a process of collecting data and information. In order to get trustworthy answers to the evaluation questions, more attention should be paid to the quality of data. As the policies or programs may evolve in the implementation stage, many new aspects of the evaluation design will emerge during the fieldwork. So a flexible design will allow more room for modification of the questions during the evaluation (Robson 2000: 102–103).
Scriven has identified two categories of evaluation: formative evaluation and summative evaluation (Robson 2000: 50). Formative evaluations are usually conducted during the implementation of a program, whereas summative evaluations are held after the program has been completed. In the evaluation framework designed by A. Hanberger (2001: 49), different guidelines have been developed for these two types of evaluation.
What line of action is followed in practice?
How does the implementing organization work in practice?
Is enough competence integrated?
Are resources used effectively and in the right way?
Do unexpected problems occur?
(Hanberger 2001: 49)
It can be seen from the above questions that a process evaluation generally provides the stakeholders with information about whether the program is “delivered as intended to the targeted recipients” (Rossi and Freeman 1999: 231). This information covers program operations, service delivery, resource allocation, and unintended problems (Rossi and Freeman 1999: 231). Furthermore, a process evaluation is a precondition for conducting an impact evaluation: it provides information about the ways programs are implemented, which can be integrated with the findings of an impact evaluation (Rossi and Freeman 1999: 199).
To what extent are the intended goals reached?
Are there any unexpected results?
What are the effects?
Who benefits from the policy?
(Hanberger 2001: 49)
An outcome evaluation focuses on whether the program achieves its intended goals and on the implications of the policy (Hanberger 2001: 50). The implications of the policy can address either the positive or the negative side of the outcome: the positive side covers the benefits and effects, whereas the negative side involves the unexpected results.
Finally, two issues are considered as essential at this stage. One is that good communication skills are required for every evaluator to obtain good answers for the evaluation questions. The other one is that the evaluators should be rational in analyzing data and generating the findings.
Reporting the Results of Evaluation
The final stage of an evaluation is to report and communicate the findings. An evaluation report will draw conclusions about what the evaluators have done during the whole process, what experience or lessons they have learned, and “how others might benefit from this project’s experience” (Sharp and Frechtling 1997: 7-1).
In order to write a comprehensive report, the following issues need to be considered. Before writing the report, it is important to identify the interests and needs of different audiences. The audiences can be the sponsors, stakeholders, program management and staff, etc. The evaluators can prepare various forms of report (executive summary, detailed report, or briefings tailored to special stakeholders) to satisfy different interests and needs (Sharp and Frechtling 1997: 7-2).
Additionally, the organization of an evaluation report usually consists of five parts. It will provide the background stating the goals and objectives of the program from the outset. This is followed by a description of the evaluation design with the evaluation questions and the methodology used to collect and analyze data. The findings section is the most important part which “should provide a concise context for understanding the conditions in which results were obtained and identifying specific factors that affected the results” (Sharp and Frechtling 1997: 7-3). The final part will draw a conclusion of the findings, strengths, and weaknesses of the program and give recommendations for improvements (Sharp and Frechtling 1997: 7-4).
The Application of the Theory of Evaluation in China
Policy learning on public governance evaluation by the Chinese Government and public organizations from the experience of developed countries and international organizations, such as the United Nations, the World Bank, and the WTO, started at the end of the twentieth century (Bao and Zhou 2009: 12). After nearly 20 years of development, government performance evaluation has been widely adopted by all levels of government, central and local, as one of the most important tools to assess public management performance and efficiency (Wang and Lan 2012: 43).
Systematic studies of public governance evaluation theory and criteria can be grouped into four main streams. Firstly, an evaluation system of public governance was developed by K. Yu with 15 criteria, such as legislation, citizen participation in politics, political transparency, civil rights, party and government supervision, intra-party democracy and multiparty cooperation, etc. Secondly, ten evaluation criteria were suggested by Z. He from the perspective of good governance, including equity, legitimacy, legislation, responsiveness, etc. (Bao and Zhou 2009: 12). Thirdly, Chu (2008: 37) identifies three core indicators by which public governance assessments are examined: deployment of power, citizen participation, and degree of civil satisfaction. It is argued that “the deployment of power involves arrangement and balance of powers, citizen participation covers citizens’ engagement in social affairs and public policies, while the degree of civil satisfaction includes the public’s attitude to both the behaviors and the results of government” (Chu 2008: 37). Finally, Bao and Zhou (2009: 12) provide an evaluation system to guide the practice of public governance evaluation with seven criteria, each with detailed assessment guidelines: equity, legislation, sustainability, participation, transparency, accountability, and efficiency (Bao and Zhou 2009: 12).
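An evaluation system such as Bao and Zhou's seven criteria is typically operationalized as a weighted composite score. The weights and criterion scores in the sketch below are invented for illustration and do not come from the source.

```python
# Hypothetical weighted scoring over Bao and Zhou's seven criteria.
# Weights and scores are illustrative assumptions, not from the source.
criteria_weights = {
    "equity": 0.15, "legislation": 0.15, "sustainability": 0.10,
    "participation": 0.15, "transparency": 0.15,
    "accountability": 0.15, "efficiency": 0.15,
}
assert abs(sum(criteria_weights.values()) - 1.0) < 1e-9  # weights must sum to 1

scores = {  # each criterion assessed on a 0-100 scale
    "equity": 72, "legislation": 80, "sustainability": 65,
    "participation": 58, "transparency": 70,
    "accountability": 75, "efficiency": 68,
}

composite = sum(criteria_weights[c] * scores[c] for c in criteria_weights)
print(f"composite governance score: {composite:.1f}")
```

In a real assessment, each criterion score would itself be derived from the detailed guidelines attached to that criterion, and the choice of weights would be a substantive, contested decision rather than an arithmetic convenience.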
As mentioned above, evaluators are required to be rational and to have good communication skills in the fieldwork. In China, the CNKI (China National Knowledge Infrastructure) database for 2005–2011 shows that 96.8% of governance evaluations were conducted by evaluators from higher educational institutions, such as scholars in universities and colleges, whereas research departments within governments at all levels accounted for only 0.45% of government performance assessment studies (Wang and Lan 2012: 41). Third-party evaluation institutions, non-profit organizations such as the University of Lanzhou and commercial research institutes such as the Horizon Research Consultancy Group, have also been introduced as leading participants in public governance evaluations (Bao and Zhou 2009: 13).
The study of the CNKI database for 2005–2011 also identified the sponsors and research funding of public governance evaluation. The statistics show that almost 85% of evaluation studies within the public sector are voluntary research, and only 10% of the research funding comes from the national level, for example, the National Social Science Fund, the NSFC (Natural Science Foundation of China), and the China Postdoctoral Science Foundation (Wang and Lan 2012: 41).
To sum up, awareness of the importance of public governance evaluation has led to a gradual increase in research in these areas. It is argued that the theoretical study, local research, and research funding of governance evaluation in China are comparatively inadequate (Wang and Lan 2012: 43). It is also suggested that future studies pay close attention to three issues in the implementation of public governance evaluation: the cultural differences between China and the West, the differences in the foundations of governance, and regional development differences (Bao and Zhou 2009: 11).
Conclusion
This chapter has examined four phases of the evaluation process – methodology, evaluation design, fieldwork, and communicating the report – to identify the key ingredients of an effective evaluation at each stage.
As methodology is considered the essential component of a successful evaluation, the first section of this chapter focused on the selection of evaluation approaches. By comparing the strengths and weaknesses of the scientific, narration, and participatory approaches, it concluded that a mixed method evaluation, which combines the strengths of each single approach, is the best approach to evaluation. There are three forms of the mixed method approach: the two-phase design, the dominant/less dominant model, and the mixed-methodology model. However, this kind of evaluation requires that the evaluators have varied skills and avoid the impact of personal ontological and epistemological positions on the evaluation results.
Apart from the methodology, three other main issues have also been identified at evaluation design stage. The evaluability assessment includes clarifying the goals and objectives of the program, reassessing program theory and performance criteria, and identifying stakeholders’ interest and side effects. The evaluation questions need to be clear and relevant to the program goals, cover as many aspects as possible, and be flexible to allow the policy room to evolve. Mixed data-collection methods will be selected according to the types of question and the considerations of sampling strategies, costs, and timing.
Implementation evaluation and outcome evaluation have different goals and emphases in the fieldwork. The former focuses on program operations, service delivery, resource allocation, and unintended problems, whereas the latter concentrates on whether the program achieves its intended goals and whether there are any unintended results. The evaluators are required to be rational and to have good communication skills. After that, the final task of evaluation is reporting the results. The report will summarize the findings, implications, and lessons of the program. Apart from considering the preferences of various audiences, an evaluation report is best presented in a standard format.
Finally, the application of the theory of evaluation in Chinese public governance has been explored through the four streams of evaluation criteria, the evaluators from higher educational institutions and third-party evaluation organizations, and the sources of research funding.
In conclusion, an effective evaluation is determined by four dimensions of activity: methodology, evaluation design, fieldwork, and reporting the findings. The selection of evaluation methodologies is the core of the evaluation process, with a significant impact on the validity and reliability of the data collected and of the findings.
- Bao, Zhou (2009) Several problems of public governance evaluation in China. Chin Publ Adm 284(2):11–13
- Bryman A (1996) Quantity and quality in social research. Routledge, London
- Chu (2008) Core elements for the assessment of public governance in China. Chin Publ Adm 279(9):37
- Denscombe M (1998) The good research guide for small-scale social research projects. Open University Press, Buckingham
- Evans M (2005) Lecture notes of evaluation. University of York, York, UK
- Marsh D, Rhodes RAW (1992) Implementing Thatcherite policies: audit of an era. Open University Press, Buckingham
- Marsh D, Stoker G (2002) Theory and methods in political science, 2nd edn. Palgrave Macmillan, New York
- Pierce R (2004) Handout of graduate research methods. University of York, York, UK
- Rossi P, Freeman H (1999) Evaluation: a systematic approach, 6th edn. Sage, Thousand Oaks, CA
- Sharp L, Frechtling J (1997) User-friendly handbook for mixed method evaluations. Available online at: http://www.ehr.nsf.gov/EHR/REC/pubs/NSF97-153/start.htm
- Wang, Lan (2012) Status and prospect of government performance evaluation research of China – an analysis based on CNKI 2005–2011 academic document database. J Inn Mong Normal Univ (Philos Soc Sci) 41(4):41–43