Academic Evaluation in Higher Education
Keywords: Analytical Perspective; Disciplinary Difference; Pragmatist Perspective; Fair Judgment; Academic Evaluation
Academic evaluation is a social process taking place in different arenas in which values, worths, virtues, or meanings are produced, diffused, assessed, legitimated, or institutionalized with respect to academic products and their producers.
The world of academia is permeated with evaluations. Academic processes of evaluation play a central role in both the production and reception of scholarly work as well as for the status of academic entities like scholars, departments, or universities. Some of these evaluations are largely informal, taking place, for example, in small-group interactions. But there is also a wide array of evaluations in academia that are fairly formalized, such as letters of recommendation and peer reviews of journal manuscripts. Rankings of universities according to research performance are among the most standardized forms of evaluation.
Evaluation has a central place in academia because of the crucial role recognition plays in academic fields. Modern academic disciplines are fundamentally status economies. They revolve around the construction and stabilization of recognition via symbolic capital (Bourdieu 1988). Scholars produce knowledge in the pursuit of recognition from their peers, and recognition, in turn, is the basis for the construction of academic careers. Thus, as a process that ascribes worth, evaluation is also a boundary practice that negotiates, for example, disciplinary turfs, and signals which scholars and ideas are integrated into or excluded from a field (Gieryn 1983; Lamont and Molnár 2002). While the study of evaluation processes in academia has traditionally been the purview of the sociology of science (cf. Merton 1973), it is increasingly studied using analytical tools from the nascent field of the sociology of evaluation and valuation (Lamont 2012; Zuckerman 2012).
In this article, we first map out the diversity of academic evaluations, before discussing different analytical perspectives that scholars have drawn on to study evaluation processes in academia. In a fourth section, we discuss scholarship that has pointed to variation in scholarly evaluations across disciplines. Lastly, we point to changes in the social organization of academic evaluation that result from recent changes in the governance of academic work as well as from technological changes.
Academic Evaluation: A Variety of Practices and Arenas
Academic evaluation aims at a variety of objects, is accomplished through a multitude of practices, and is performed by different actors. Considering this diversity, it is striking that we can identify a number of forms and arenas of evaluation that exist across communities and disciplines.
All academic communities and disciplines are affected by higher education governance regimes that try to assess and audit the output of departments and universities in terms of research performance and societal impact (Martin 2011). Although this kind of evaluation is becoming increasingly influential in many countries, it is not typically the center of attention of scholarship on academic evaluation. We will discuss some of the effects of this systematic, policy-oriented evaluation toward the end of our article. But then of course, scholars are not only evaluated from the outside. Positioning discourses across all disciplines locate and anchor scholars in knowledge-based communities as well as in bureaucratic positions in institutions (Angermuller 2013), thereby straddling different logics of academic worlds. The ascription of values and worth in academia largely operates through peer review. This is evident across a number of different institutionalized arenas of evaluation.
Among these arenas are, for example, funding panels. Not only do funding panels exist in all disciplines, often enough several disciplines are congregated in one panel. In order to evaluate proposals for fellowships and research grants (Lamont 2009), they rank submissions according to criteria of excellence, thus facing the challenge of agreeing on what criteria like “clarity,” “originality,” or “impact” actually mean (cf. Derrick and Samuel 2016). Furthermore, there are different arenas in which publications are evaluated. Before publication, editors assess manuscripts for their journals (cf. the overview by Meruane et al. 2016). Editorial judgments can be understood as a result of the intellectual milieus the editors are situated in, the impressions the editors gained by reading a manuscript, and the discussions in which they rationalize their judgments toward the editorial committee (Hirschauer 2010). After publication, editors judge articles in case of minor and major errors that need to be met with errata or retractions (Hesselmann et al. 2016), while book reviews provide a critical assessment of newly published books (Riley and Spreitzer 1970). They examine whether books contribute new knowledge to the field, thus providing an important source of orientation in the face of an ever-increasing stock of academic publications (Nicolaisen 2002).
Although funding and publications are vital resources in all communities and disciplines, evaluative practices and arenas go far beyond that. Appointments of professors, for example, are a consequential arena of academic evaluation where national traditions (Musselin 2009) influence how different academic criteria like networks and publications (Combes et al. 2008) intertwine with various non-academic criteria like gender (van den Brink and Benschop 2012). Academic obituaries are another example of a widely neglected arena of evaluation that consecrates deceased colleagues and demonstrates the customary rules according to which academic lifetime achievements are narrated and assessed (Hamann 2016a; Macfarlane and Chan 2014).
Last but not least, processes of evaluation also play a crucial role in the very production of scholarly knowledge. While philosophers of science have developed varying accounts of how scientific knowledge is produced and evolves – whether describing an incremental progression toward objective knowledge (Popper 1972), a conservative authority that prevents change (Feyerabend 1975), or a mediator for interchanging stages of revolutionary and normal science (Kuhn 1962) – their theories all acknowledge that scientific inquiry is centrally dependent on the evaluation of epistemic claims. This notion of an intimate connection between evaluation and epistemology has also been confirmed and highlighted by science studies of actual scientific practices (Knorr Cetina 1999; Latour 1988).
Analytical Perspectives on Academic Evaluation
Existing scholarship has examined academic evaluation from a number of analytical perspectives. These perspectives are far from distinct and mutually exclusive, but we want to suggest five tentative strands. First, academic evaluations can be examined from a functionalist perspective, focusing on how well evaluative procedures serve their purposes. Research using this perspective examines, among other things, the validity, reliability, and fairness of judgments (Armstrong 1997; Bornmann and Daniel 2005; Reinhart 2009) and studies possible biases (Cole et al. 1981; Roumbanis 2016). Power-analytical approaches complement functionalist approaches with a second perspective on academic evaluation. This perspective focuses on dysfunctional effects in terms of structural inequalities like, for example, nepotism in peer review (Sandström and Hällsten 2008) or unequal opportunities of resource accumulation that follow from it (Hamann 2016b). The critical intention of this literature is shared by a third perspective that is concerned with the performativity of evaluations and evaluative devices. Scholarship using this analytical perspective has drawn attention to how journal peer review exerts discipline over scholarship (Siler and Strang 2016; Strang and Siler 2015; Teplitskiy 2016), how rankings trigger organizational change (Sauder and Espeland 2009), or how indicators incite strategic behavior or lead to goal displacement (see the overview in de Rijcke et al. 2015). Fourth, academic evaluations have been studied from a social-constructivist perspective, emphasizing that ideas and personas can be positioned and evaluated differently in various social and historical contexts (Angermuller 2015; Baert 2012). This has been illustrated for conceptions of merit and originality (Guetzkow et al. 2004; Tsay et al. 2003), for philosophical ideas (Collins 2000), or for thinkers like Jacques Derrida (Lamont 1987) and Richard Rorty (Gross 2008). 
Related to this, and fifth, there is a pragmatist perspective on academic evaluation that focuses on the practices reviewers perform to actually reach a consensus on, for example, “quality” (Hirschauer 2010; Lamont 2009). Pragmatist perspectives emphasize the situatedness of evaluative practices, highlighting that evaluations are accomplished in concrete contexts and interactions. However, academic communities and disciplines are also important explanatory factors for evaluative practices. This brings us to the next section.
Disciplinarity and Academic Evaluation
While above we have discussed how most forms and arenas of academic evaluation are institutionalized across all disciplines, we want to emphasize in this section that the criteria of evaluation can differ substantially between and within scholarly communities. We will discuss, first, intradisciplinary aspects of evaluation criteria within disciplines; second, interdisciplinary aspects of evaluation criteria between disciplines; and third, transdisciplinary aspects of evaluation criteria across disciplines.
To begin with, academic communities and disciplines vary on an intradisciplinary spectrum with respect to their internal diversity of evaluation criteria. Members of a discipline can widely agree on the core questions, methods, and theories, or they can be characterized by a plurality of notions of what is relevant, “good” research. Usually, this continuum spans from the natural sciences, where scholars share most evaluation criteria, over the less paradigmatic social sciences to the even less consensual humanities (Cole 1983; Evans et al. 2016). The degree to which disciplines share evaluation criteria has become a marker for their value. From Kuhn (1962), who remarkably equates paradigmatic closure with a discipline’s maturity, a powerful symbolic boundary has evolved that distinguishes “hard,” paradigmatic, and thus more “valuable” sciences from “soft,” pre-paradigmatic, and thus less “valuable” sciences (Peterson 2015; Smith et al. 2000). The degree of paradigmatic closure and of scholarly consensus on evaluation criteria can, in turn, influence journal rejection rates (Hargens 1988).
The diversity of interdisciplinary evaluation criteria, and especially their contestation between different communities and disciplines, has been studied for the social sciences and humanities. One important difference between the two disciplinary clusters is the value of subjectivity in the pursuit of knowledge. Humanists and those social scientists who are influenced by the cultural turn find subjectivity and interpretative skills to be vital for research that is “good” in terms of being, for example, “fascinating.” Many social scientists, especially in the quantitative strands, prefer validity and reliability in order to produce research that is “good” in terms of being “true” (Lamont 2009). This finding applies not only to the funding panels studied by Lamont but also, for example, to book reviews. Reviews in most humanities and social science disciplines have been found to be not only longer and more discursive than in the natural sciences but also critical of both content and style of argument, for example by valuing the quality and detail of exposition over demonstration and proof (East 2011; Hyland 2004). Furthermore, interdisciplinary differences also become apparent in graduate school admission committees, where economists believe that excellence inheres in what is being evaluated, while philosophers see it as an ideal that reviewers socially construct (Posselt 2015).
Transdisciplinary differences are illustrated, for example, by varying definitions of the evaluative criterion of “originality” between social sciences and humanities. Both disciplinary clusters employ a broad definition of “originality” that includes new perspectives, methods, questions, and arguments. But there are significant differences between the disciplinary clusters. In the humanities and history, the most important aspect of originality is an innovative approach, while humanists also value original data. In comparison, social scientists privilege originality with respect to methods and also theories and research topics (Guetzkow et al. 2004).
Arguably, intra-, inter-, and transdisciplinary differences are linked to distinct epistemological cultures (Knorr Cetina 1981), tribal affiliations and belongings (Becher and Trowler 2001), and disciplinary rhetoric (Bazerman 1981). Arenas of evaluation that have to deal with this pluralism illustrate not only the challenges that come with this but also strategies to overcome them. For instance, interdisciplinary panels do not merely draw on a combination of disciplinary criteria. Rather, hybrid criteria and standards emerge from practices and deliberations between evaluators (Lamont 2009). Transdisciplinary evaluation is characterized by respect for disciplinary sovereignty and deference to expertise. The respective arenas rely on trust between reviewers of different expertise that their respective judgments are unbiased and disinterested (Lamont et al. 2006). Procedures that are supposed to facilitate outcomes perceived as fair include either the application of the same set of general evaluation criteria to different, say, proposals (cf. Collins and Evans 2002) or the application of criteria that seem appropriate to each proposal in terms of being most relevant to the discipline from which the proposal emanates (Mallard et al. 2009). In turn, an obstacle for fair judgments could be that evaluations also have a boundary function. Evaluators use their judgments to reproduce or redefine the boundaries of their respective fields (Posselt 2016). Evaluative practices that establish fair judgments are not only influenced by the disciplinary composition of panels. Other questions that have an influence include, for example, whether panelists rate or rank proposals or whether they have an advisory or a decisional role (Lamont and Huutoniemi 2011).
Apart from the rather deliberative strategies described up to this point, there are also more comprehensive strategies to conceptualize and measure academic quality and research performance by drawing on quantitative techniques. These approaches are increasingly mindful of disciplinary differences. Nonetheless, since criteria for “good” research and publication practices vary markedly across communities and disciplines, quantitative techniques have proven less appropriate – and less acknowledged – in the social sciences and humanities (Mustajoki 2013; Ochsner et al. 2016). For example, customary methods of research performance evaluation are not appropriate for the humanities (Moed et al. 2002; Nederhof 2006). Alternative metrics, based on data from the social web and designed to assess non-academic criteria like popularity or media impact, share these limitations (Hammarfelt 2014). While the academic literature on research quality and performance assessments is distinctly aware of these restrictions, higher education policies do not always share this insight. Most large-scale audits famously overlook disciplinary differences. We discuss their attempts at academic evaluation in the following section.
Recent Developments in the Organization of Academic Evaluation
The linchpin of academic evaluation has long rested on the notion of academic autonomy and self-governance (Whitley 1984). According to this idea, the work of academics is first and foremost evaluated by other scholars. Thus, the primary form of recognition that counts in the world of academia is peer recognition. This is echoed in the more classical literature from Popper to Bourdieu that we have cited throughout this contribution. The vital role of peer recognition is also reflected by the central role that peer review has in academic disciplines, whether it is deployed for the distribution of research grants, the allocation of journal space, or the determination of winners of scholarly prizes and awards.
In the last 10–20 years, however, there has been a series of developments that have weakened the relative autonomy of academic fields and that have added new dominant evaluative procedures and institutions. The most important factor contributing to this trend has probably been the rise of new public management, changing how higher education and its members are governed in many countries across the globe. The main thrust of this new form of governance has been to reduce government funding and introduce more market-like competition in higher education (e.g., for the case of the United Kingdom, see Deem et al. 2008). Additionally, new public management initiatives have also sought to increase the accountability of universities and their members (Strathern 2000).
These developments have gone hand in hand with a stronger emphasis on external standards of evaluation in the assessment of scholarly work. One important example is the rise of rankings of academic departments and entire universities, promoted by both media corporations and government agencies (Collins and Park 2016; Espeland and Sauder 2016; Hazelkorn 2014). Another central concomitant has been the growing reliance on quantitative indicators to measure and track scholarly productivity and quality (Burrows 2012; de Rijcke et al. 2015). Taking the form of bibliometrics and citation indexes, these indicators have rapidly diffused into the scientific community, particularly the natural sciences, in part due to changes in the capability of information technology. Research shows that the growing reliance on such indicators has had a host of feedback effects on the content and organization of scholarship (Fochler et al. 2016; Hamann 2016b). While many scholars have been very critical of indicators, arguing that they render academic evaluation more mechanical and numerical (Lorenz 2012), new evaluative procedures and institutions seem to have become an established part of the wide range of academic evaluations.
References
- Angermuller, Johannes. 2013. How to become an academic philosopher. Academic discourse as multileveled positioning practice. Sociología Histórica 2013: 263–289.
- Angermuller, Johannes. 2015. The moment of theory. The rise and decline of structuralism in France and beyond. London: Continuum.
- Becher, Tony, and Paul Trowler. 2001. Academic tribes and territories: Intellectual enquiry and the cultures of disciplines. Philadelphia: Open University Press.
- Bourdieu, Pierre. 1988. Homo Academicus. Cambridge: Polity Press.
- Collins, Randall. 2000. The sociology of philosophies: A global theory of intellectual change. Harvard: Harvard University Press.
- Deem, Rosemary, Sam Hillyard, and Mike Reed. 2008. Knowledge, higher education, and the new managerialism: The changing management of UK universities. Oxford: Oxford University Press.
- East, John W. 2011. The scholarly book review in the humanities. An academic cinderella? Journal of Scholarly Publishing 43: 52–67.
- Espeland, Wendy N., and Michael Sauder. 2016. Engines of anxiety. Rankings, reputation, and accountability in a quantified world. New York: Russell Sage Foundation.
- Feyerabend, Paul. 1975. Against method: Outline of an anarchist theory of knowledge. New York: New Left Books.
- Hesselmann, Felicitas, Verena Graf, Marion Schmidt, and Martin Reinhart. 2016. The visibility of scientific misconduct: A review of the literature on retracted journal articles. Current Sociology online first: 1–32.
- Knorr Cetina, Karin. 1981. The manufacture of knowledge. An essay on the constructivist and contextual nature of science. Oxford: Pergamon Press.
- Knorr Cetina, Karin. 1999. Epistemic cultures. How the sciences make knowledge. Harvard: Harvard University Press.
- Kuhn, Thomas S. 1962. The structure of scientific revolutions. Chicago: University of Chicago Press.
- Lamont, Michèle, and Katri Huutoniemi. 2011. Comparing customary rules of fairness: Evaluative practices in various types of peer review panels. In Social knowledge in the making, ed. Charles Camic, Neil Gross, and Michèle Lamont, 209–232. Chicago: University of Chicago Press.
- Latour, Bruno. 1988. Science in action. How to follow scientists and engineers through society. Harvard: Harvard University Press.
- Merton, Robert K. 1973. The sociology of science. Theoretical and empirical investigations. Chicago: University of Chicago Press.
- Moed, Henk F., Marc Luwel, and Anton J. Nederhof. 2002. Towards research performance in the humanities. Library Trends 50: 498–520.
- Musselin, Christine. 2009. The market for academics. New York: Routledge.
- Ochsner, Michael, Sven E. Hug, and Hans-Dieter Daniel, eds. 2016. Research assessment in the humanities. Towards criteria and procedures. Dordrecht: Springer.
- Popper, Karl R. 1972. Objective knowledge. An evolutionary approach. Oxford: Clarendon Press.
- Riley, Lawrence E., and Elmer A. Spreitzer. 1970. Book reviewing in the social sciences. The American Sociologist 5: 358–363.
- Roumbanis, Lambros. 2016. Academic judgments under uncertainty: A study of collective anchoring effects in Swedish Research Council panel groups. Social Studies of Science, online first.
- Whitley, Richard D. 1984. The intellectual and social organization of the sciences. Oxford: Oxford University Press.