A rejoinder on energy versus impact indicators
Citation distributions are so skewed that using the mean or any other central tendency measure is ill-advised. Unlike G. Prathap’s scalar measures (Energy, Exergy, and Entropy or EEE), the Integrated Impact Indicator (I3) is based on non-parametric statistics using the (100) percentiles of the distribution. Observed values can be tested against expected ones; impact can be qualified at the article level and then aggregated.
Keywords: Citation, Scalar, Integration, EEE, I3, Quality
Prathap (2011a) in his Letter applies newly developed scalar measures for bibliometrics (Energy, Exergy, and Entropy; EEE) to the data provided in Table 1 of Van Raan (2006, p. 495). EEE operates on averages and ignores the shape of the underlying distributions of citations (“the citation curves”). (Let us note about EEE that energy and exergy share dimensionality, but entropy is expressed in Watts/Kelvin. Thus, the expression Energy − Exergy = Entropy as suggested by Prathap (2011b) is, in our opinion, invalid without the specification of a meta-physical analogon of the “temperature.”)
Like Prathap (2011b) and following Bornmann and Mutz (2011), Leydesdorff et al. (2011) have elaborated the percentile-rank as a scalar sum by using the same dataset that led to the original contention about how citation data should be normalized (Opthof and Leydesdorff 2010; Van Raan et al. 2010). More recently, Leydesdorff and Bornmann (2011) have developed this scalar measure into the Integrated Impact Indicator (I3).
The difference between I3 and EEE is that I3 takes the shape of the distribution into account and allows for non-parametric significance tests, whereas Prathap’s systems view ignores this shape and uses averages on the assumption of the Central Limit Theorem (Glänzel 2010). However, citation distributions are extremely skewed (Seglen 1992, 1997; cf. Leydesdorff 2008) and central tendency statistics give misleading results. Using parametric statistics on such distributions, one can reliably test neither the significance of observations nor the significance of differences in rankings.
Prathap (2011a) was able to compute using the mean values of JCS (journal citation scores) and FCS (field citation scores) because his concept of entropy is no longer probabilistic entropy (cf. Leydesdorff 1995; Theil 1972), but thermodynamic entropy (Prathap 2011b, p. 523f). However, the impact of two hits is not their average, but their sum. In the case of collisions, this is the vector sum of the momenta. We agree that in the case of citations one should use a scalar sum.
Citations are discrete events and therefore the integral is in this case a step function: using Eq. 1, the frequency of papers in each percentile (x_i) is multiplied by the percentile of each paper (f(x_i)). The resulting scalar (Σ) of the total impact can then be (i) scaled in terms of various evaluation schemes (e.g., quartiles, or the six evaluation categories used in the U.S. Science and Engineering Indicators (NSB 2010) and by Bornmann and Mutz (2011)); (ii) tested for its significance against a theoretically specified expectation; (iii) expressed as a single number, namely a percentage of total impact contained in the reference set; and (iv) used to compare among and between various units of analysis such as journals, countries, institutes, and cities, by aggregating cases in a statistically controllable way (Theil 1972).
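The summation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the citation counts are hypothetical, the function names are ours, and the percentile of a paper is assumed to be the share of papers in the reference set that are cited less often (other counting rules for ties exist). Eq. 1 is read here as a sum of products of each percentile value and the number of papers at that value.

```python
from bisect import bisect_left
from collections import Counter

def percentile_ranks(citations, reference):
    """Percentile (0-100) of each paper: share of reference papers cited less often.

    Assumption: ties are handled with a strict 'fewer than' rule.
    """
    ordered = sorted(reference)
    n = len(ordered)
    return [100.0 * bisect_left(ordered, c) / n for c in citations]

def i3(citations, reference):
    """Step-function 'integral': sum of (percentile value * number of papers at it)."""
    frequency = Counter(percentile_ranks(citations, reference))
    return sum(value * count for value, count in frequency.items())

# Hypothetical citation counts for two units that together form the reference set.
unit_a = [0, 1, 1, 2, 40]   # one highly cited paper, otherwise little impact
unit_b = [0, 3, 5, 8, 13]   # moderately cited throughout
reference = unit_a + unit_b

total = i3(reference, reference)
share_a = 100.0 * i3(unit_a, reference) / total   # percentage of total impact
share_b = 100.0 * i3(unit_b, reference) / total

print(f"unit A: I3 = {i3(unit_a, reference):.0f}, share = {share_a:.1f}%")
print(f"unit B: I3 = {i3(unit_b, reference):.0f}, share = {share_b:.1f}%")
```

In this toy set, unit A has the higher mean citation rate (8.8 against 5.8), but unit B contributes the larger share of the summed percentile impact, because the single outlier in unit A counts as one paper at a high percentile rather than dominating an average.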
In summary, the discussion over Rates of Averages versus Averages of Rates (Gingras and Larivière 2011) has taught us that a rate of averages is merely a quotient number that does not allow for testing, and is mathematically inconsistent (Waltman et al. 2011). The mean observed citation ratio (MOCR) should not be divided by the mean expected citation ratio (RCR = MOCR/MECR; Schubert and Braun 1986; cf. Glänzel et al. 2009, p. 182), but observed values can be tested against expected values by using appropriate statistics.
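The difference between a rate of averages and an average of rates can be made concrete with a small numerical sketch. The observed and expected citation values below are hypothetical and serve only to show that the two procedures generally give different numbers, and that only the second one retains a distribution of per-paper values that could be tested.

```python
from statistics import mean, stdev

# Hypothetical (observed citations, expected citations) for four papers.
papers = [(10, 5.0), (0, 2.0), (3, 1.0), (40, 20.0)]

observed = [o for o, _ in papers]
expected = [e for _, e in papers]

# Rate of averages: a single quotient (MOCR / MECR) with no attached distribution.
rate_of_averages = mean(observed) / mean(expected)

# Average of rates: each paper is first normalized, then the ratios are averaged.
ratios = [o / e for o, e in papers]
average_of_rates = mean(ratios)

print(f"rate of averages : {rate_of_averages:.3f}")
print(f"average of rates : {average_of_rates:.3f} (sd of ratios = {stdev(ratios):.3f})")
```

The per-paper ratios form a distribution with its own spread, whereas the quotient of two means does not. Note that averaging the ratios is itself still a central tendency measure and therefore remains sensitive to skewness.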
Secondly, citation indicators based on averaging skewed distributions—such as Prathap’s EEE and the new “crown indicator” MNCS—are unreliable. For example, Leydesdorff et al. (2011) have shown that in the case of seven Principal Investigators at the Academic Medical Center of the University of Amsterdam, the number one ranked PI would fall to fifth position, whereas the sixth-ranked PI would become the highest-ranked author if percentiles or percentile ranks are used.
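How a ranking can reverse when averages are replaced by percentile ranks can be illustrated with invented data. The three "PIs" and their citation counts below are hypothetical and are not the Amsterdam data; the percentile rule (share of papers in the pooled set cited less often) is likewise an assumption made for this sketch.

```python
from bisect import bisect_left
from statistics import mean

# Hypothetical citation counts per PI; not the data of Leydesdorff et al. (2011).
pis = {
    "PI-1": [0, 0, 1, 2, 120],   # one outlier dominates the mean
    "PI-2": [6, 7, 9, 10, 12],   # consistently well cited
    "PI-3": [1, 2, 3, 4, 5],
}

# The pooled papers of all PIs serve as the reference set.
reference = sorted(c for papers in pis.values() for c in papers)

def summed_percentiles(papers):
    """Sum of percentile ranks (strict 'fewer than' rule) within the pooled set."""
    n = len(reference)
    return sum(100.0 * bisect_left(reference, c) / n for c in papers)

by_mean = sorted(pis, key=lambda p: mean(pis[p]), reverse=True)
by_percentile = sorted(pis, key=lambda p: summed_percentiles(pis[p]), reverse=True)

print("ranked by mean citations    :", by_mean)
print("ranked by summed percentiles:", by_percentile)
```

With these numbers the PI who leads on mean citations drops to last place once percentile ranks are summed, which is the kind of reversal reported above for the real case.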
Thirdly, one should not test sets of documents as independent samples against each other, but as subsets of a reference set (Bornmann et al. 2008): each subset contributes a percentage impact to the set. The reference set allows for normalization and the specification of an expectation. (This specification can further be informed on theoretical grounds.) Using quantiles and percentile ranks, the observed values can be tested against the expected ones using non-parametric statistics.
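Testing observed against expected values can be sketched as follows. The example assumes the six percentile rank classes mentioned earlier (bottom 50%, 50th to 75th, 75th to 90th, 90th to 95th, 95th to 99th, and top 1%), so that a subset drawn at random from the reference set is expected to fall into them in the proportions 50, 25, 15, 5, 4, and 1 percent. A chi-square goodness-of-fit statistic is used as one possible non-parametric test; this choice, the observed counts, and the names in the code are ours.

```python
# Expected percentage of papers per class if the subset were a random draw
# from the reference set (assumed six-class scheme, see the lead-in).
EXPECTED_PERCENT = [50, 25, 15, 5, 4, 1]

# Tabulated critical value of the chi-square distribution, 5 degrees of freedom, alpha = 0.05.
CHI2_CRITICAL_DF5 = 11.07

def chi_square(observed_counts):
    """Goodness-of-fit statistic of observed class counts against the expectation."""
    n = sum(observed_counts)
    expected = [n * pct / 100 for pct in EXPECTED_PERCENT]
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))

# Hypothetical subset of 200 papers, counted per class (lowest class first).
observed = [80, 50, 40, 15, 12, 3]

statistic = chi_square(observed)
significant = statistic > CHI2_CRITICAL_DF5

print(f"chi-square = {statistic:.2f}, significant at 5%: {significant}")
```

Because the expected count in the top class is small (2 papers in this example), the chi-square approximation is rough here; with small subsets one might merge the top classes or use an exact test instead.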
Furthermore, and not specifically as a criticism of EEE, field delineations do not have to be based on ex ante classification schemes such as the ISI Subject Categories. Hitherto, journal classifications have been imprecise and unreliable (Boyack and Klavans 2011; Leydesdorff 2006; Pudovkin and Garfield 2002; Rafols and Leydesdorff 2009). Fractional attribution of citations in the citing documents, however, can be used for normalization of differences in citation potentials (Garfield 1979) reflecting differences in citation behavior at the level of individual papers (Leydesdorff and Bornmann 2011b; Leydesdorff and Opthof 2010; Moed 2010).
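Fractional attribution in the citing documents can be sketched as weighting each citation by one over the length of the reference list of the citing paper. The identifiers and reference lists below are hypothetical, and the function name is ours; the sketch only shows the principle that a citation from a paper with few references weighs more than one from a paper with many.

```python
from collections import defaultdict

def fractional_counts(citing_papers):
    """Credit each cited item with 1/len(reference list) per citing paper."""
    counts = defaultdict(float)
    for references in citing_papers.values():
        if not references:          # a paper without references cites nothing
            continue
        weight = 1.0 / len(references)
        for cited in references:
            counts[cited] += weight
    return dict(counts)

# Hypothetical citing papers from fields with different citation behavior.
citing_papers = {
    "citing-A": ["p1", "p2"],                                   # short reference list
    "citing-B": ["p1"] + [f"other-{i}" for i in range(39)],    # 40 references
}

counts = fractional_counts(citing_papers)
print(f"p1: {counts['p1']:.3f} fractional citations (2 when counted as integers)")
print(f"p2: {counts['p2']:.3f} fractional citations (1 when counted as integers)")
```

Under integer counting p1 has twice the citations of p2; under fractional counting the two are nearly equal, because the second citation to p1 comes from a paper with a long reference list.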
Given these recent improvements in citation normalization, such as the use of paper-based measures on both the cited and the citing side, the theoretical question remains whether citations can be used as indicators of scientific quality, and if so, when (Amsterdamska and Leydesdorff 1989; Bornmann et al. 2008; Garfield 1979; Leydesdorff 1998; Leydesdorff and Amsterdamska 1990). Opthof and Leydesdorff (2011) opened this discussion by asking whether citation analysis enables us to legitimate the strategic selection of “excellent” as against merely “good” research.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
- Boyack, K. W., & Klavans, R. (2011). Multiple dimensions of journal specificity: Why journals can’t be assigned to disciplines. In E. Noyons, P. Ngulube & J. Leta (Eds.), The 13th Conference of the International Society for Scientometrics and Informetrics (Vol. I, pp. 123–133). Durban, South Africa: ISSI, Leiden University and the University of Zululand.
- Gingras, Y., & Larivière, V. (2011). There are neither “king” nor “crown” in scientometrics: Comments on a supposed “alternative” method of normalization. Journal of Informetrics, 5(1), 226–227.
- Glänzel, W., Thijs, B., Schubert, A., & Debackere, K. (2009). Subfield-specific normalized relative indicators and a new generation of relational charts: Methodological foundations illustrated on the assessment of institutional research performance. Scientometrics, 78(1), 165–188.
- Leydesdorff, L. (1995). The challenge of scientometrics: The development, measurement, and self-organization of scientific communications. Leiden: DSWO Press, Leiden University. Retrieved from http://www.universal-publishers.com/book.php?method=ISBN&book=1581126816. Accessed 12 Sep 2011.
- Leydesdorff, L., & Bornmann, L. (2011a). Integrated Impact Indicators (I3) compared with Impact Factors (IFs): An alternative design with policy implications. Journal of the American Society for Information Science and Technology (in press).
- National Science Board. (2010). Science and engineering indicators. Washington, DC: National Science Foundation. Retrieved from http://www.nsf.gov/statistics/seind10/. Accessed 12 Sep 2011.
- Opthof, T., & Leydesdorff, L. (2011). A comment to the paper by Waltman et al., Scientometrics, 87, 467–481, 2011. Scientometrics, 88(3), 1011–1016.
- Prathap, G. (2011a). A comment to the papers by Opthof and Leydesdorff, Scientometrics, 88, 1011–1016, 2011 and Waltman et al., Scientometrics, 88, 1017–1022, 2011. Scientometrics. doi: 10.1007/s11192-011-0500-0.
- Rousseau, R. (2011). Percentile rank scores are congruous indicators of relative performance, or aren’t they? Retrieved from http://arxiv.org/abs/1108.1860. Accessed 12 Sep 2011.
- Schubert, A., & Braun, T. (1986). Relative indicators and relational charts for comparative assessment of publication output and citation impact. Scientometrics, 9(5), 281–291.
- van Raan, A. F. J. (2006). Comparison of the Hirsch-index with standard bibliometric indicators and with peer judgment for 147 chemistry research groups. Scientometrics, 67(3), 491–502.