# A rejoinder on energy versus impact indicators

## Abstract

Citation distributions are so skewed that using the mean or any other central tendency measure is ill-advised. Unlike G. Prathap’s scalar measures (Energy, Exergy, and Entropy or EEE), the Integrated Impact Indicator (*I*3) is based on non-parametric statistics using the (100) percentiles of the distribution. Observed values can be tested against expected ones; impact can be qualified at the article level and then aggregated.

## Keywords

Citation, Scalar, Integration, EEE, *I*3, Quality

Prathap (2011a) in his Letter applies newly developed scalar measures for bibliometrics (Energy, Exergy, and Entropy; EEE) to the data provided in Table 1 of Van Raan (2006, p. 495). EEE operates on averages and ignores the shape of the underlying distributions of citations (“the citation curves”). (Let us note about EEE that energy and exergy share dimensionality, but entropy is expressed in Joules/Kelvin. Thus, the expression Energy − Exergy = Entropy as suggested by Prathap (2011b) is, in our opinion, invalid without the specification of a meta-physical analogon of the “temperature.”)

Like Prathap (2011b) and following Bornmann and Mutz (2011), Leydesdorff et al. (2011) have elaborated the percentile rank as a scalar sum by using the same dataset that led to the original contention about how citation data should be normalized (Opthof and Leydesdorff 2010; Van Raan et al. 2010). More recently, Leydesdorff and Bornmann (2011a) have developed this scalar measure into the Integrated Impact Indicator (*I*3).^{1}

The difference between *I*3 and EEE is that *I*3 takes the shape of the distribution into account and allows for non-parametric significance tests, whereas Prathap’s systems view ignores this shape and uses averages on the assumption of the Central Limit Theorem (Glänzel 2010). However, citation distributions are extremely skewed (Seglen 1992, 1997; cf. Leydesdorff 2008), and central tendency statistics therefore give misleading results. Using parametric statistics, one can reliably test neither the significance of observations nor the significance of differences in rankings.

Prathap (2011a) was able to compute using the *mean* values of JCS (journal citation scores) and FCS (field citation scores) because his concept of entropy is no longer *probabilistic* entropy (cf. Leydesdorff 1995; Theil 1972), but thermodynamic entropy (Prathap 2011b, p. 523f). However, the impact of two hits is not their average, but their sum. In the case of collisions, this is the vector sum of the momenta. We agree that in the case of citations one should use a scalar sum.

*I*3 can be formalized as an integration as follows:

$$I3 = \sum_{i} x_i \cdot f(x_i) \qquad (1)$$

Citations are discrete events and therefore the integral is in this case a step function: using Eq. 1, the frequency of papers in each percentile ($x_i$) is multiplied by the percentile of each paper ($f(x_i)$). The resulting scalar (Σ) of the total impact can then be (i) scaled in terms of various evaluation schemes (e.g., quartiles, or the six evaluation categories used in the U.S. *Science and Engineering Indicators* (NSB 2010) and by Bornmann and Mutz (2011)); (ii) tested for its significance against a theoretically specified expectation; (iii) expressed as a single number, namely a percentage of total impact contained in the reference set; and (iv) used to compare among and between various units of analysis such as journals, countries, institutes, and cities, by aggregating cases in a statistically controllable way (Theil 1972).
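The step-function sum described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names, the citation counts, and the tie-handling rule (a paper's percentile rank is the share of reference-set papers with strictly fewer citations) are all assumptions made for the example.

```python
from bisect import bisect_left
from collections import Counter


def percentile_ranks(citations, reference):
    """Percentile rank (0-100) of each paper against a reference set.

    Assumed counting rule: the share of reference-set papers with strictly
    fewer citations. Other tie-handling rules exist.
    """
    ordered = sorted(reference)
    n = len(ordered)
    return [100.0 * bisect_left(ordered, c) / n for c in citations]


def i3(ranks):
    """Sum of x_i * f(x_i): each percentile value times its frequency."""
    frequency = Counter(ranks)
    return sum(x * f for x, f in frequency.items())


# Hypothetical citation counts for two units that together form the reference set.
unit_a = [0, 1, 2, 50]  # one highly cited paper, three barely cited ones
unit_b = [3, 4, 5, 6]   # consistently moderately cited papers
reference_set = unit_a + unit_b

i3_a = i3(percentile_ranks(unit_a, reference_set))
i3_b = i3(percentile_ranks(unit_b, reference_set))
total = i3_a + i3_b

print("mean citations:", sum(unit_a) / len(unit_a), sum(unit_b) / len(unit_b))
print("I3:", i3_a, i3_b)
print("share of total impact (%):", 100 * i3_a / total, 100 * i3_b / total)
```

With these invented numbers the mean ranks unit A far above unit B (13.25 against 4.5 citations per paper) because of a single outlier, whereas the percentile-based sum gives 125 for A against 225 for B, so that B holds the larger share of the total impact in the reference set.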

In summary, the discussion over *Rates of Averages* versus *Averages of Rates* (Gingras and Larivière 2011) has taught us, first, that a rate of averages is merely a quotient number that does not allow for testing, and is mathematically inconsistent (Waltman et al. 2011). The mean observed citation rate (MOCR) should not be divided by the mean expected citation rate (RCR = MOCR/MECR; Schubert and Braun 1986; cf. Glänzel et al. 2009, p. 182), but observed values can be *tested against* expected values by using appropriate statistics.
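The arithmetic difference between the two constructions can be made concrete with a small Python sketch. The function names and the two-paper dataset are hypothetical; the point is only that a rate of averages collapses the data into one quotient, while an average of rates retains one ratio per paper, that is, a distribution.

```python
def rate_of_averages(observed, expected):
    """Quotient of two means (the MOCR/MECR construction): a single number."""
    mean_observed = sum(observed) / len(observed)
    mean_expected = sum(expected) / len(expected)
    return mean_observed / mean_expected


def per_paper_rates(observed, expected):
    """One observed/expected ratio per paper: a distribution, not a quotient."""
    return [o / e for o, e in zip(observed, expected)]


def average_of_rates(observed, expected):
    """Mean of the per-paper ratios."""
    rates = per_paper_rates(observed, expected)
    return sum(rates) / len(rates)


# Hypothetical data: observed citations and field-expected citations per paper.
observed = [4, 2]
expected = [8, 1]

print("rate of averages:", rate_of_averages(observed, expected))
print("per-paper rates:", per_paper_rates(observed, expected))
print("average of rates:", average_of_rates(observed, expected))
```

For this invented pair of papers the rate of averages is about 0.67 (below the expectation of 1), while the average of rates is 1.25 (above it), so the two constructions can even disagree about the direction of the deviation. Only the per-paper ratios have a spread that could be tested, and given the skewness of citation distributions such a test should not rely on the mean.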

Secondly, citation indicators based on averaging skewed distributions, such as Prathap’s EEE and the new “crown indicator” MNCS, are unreliable. For example, Leydesdorff et al. (2011) have shown that in the case of seven Principal Investigators at the Academic Medical Center of the University of Amsterdam, the number one ranked PI would fall to fifth position, whereas the sixth-ranked PI would become the highest-ranked author if percentiles or percentile ranks are used.

Thirdly, one should not test sets of documents as independent samples against each other, but as subsets of a reference set (Bornmann et al. 2008): each subset contributes a percentage impact to the set. The reference set allows for normalization and the specification of an expectation. (This specification can further be informed on theoretical grounds.) Using quantiles and percentile ranks, the observed values can be tested against the expected ones using non-parametric statistics.
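As an illustration of testing observed against expected values, the sketch below bins percentile ranks into six classes and computes a chi-square goodness-of-fit statistic. Several things here are assumptions rather than the authors' procedure: the class boundaries and expected shares (50, 25, 15, 5, 4, and 1% for the bottom half up to the top 1%) follow the six-category scheme as commonly described, the chi-square statistic is one possible non-parametric test among others, and the function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Assumed lower bounds of classes 1..5; class 0 holds ranks below the 50th percentile.
LOWER_BOUNDS = [50, 75, 90, 95, 99]
# Assumed expected share of papers per class under the reference distribution.
EXPECTED_SHARES = [0.50, 0.25, 0.15, 0.05, 0.04, 0.01]


def observed_counts(ranks):
    """Count papers per class; a rank equal to a bound falls in the higher class."""
    counts = Counter(bisect_right(LOWER_BOUNDS, r) for r in ranks)
    return [counts.get(k, 0) for k in range(len(EXPECTED_SHARES))]


def chi_square(observed, shares=EXPECTED_SHARES):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over the classes."""
    n = sum(observed)
    return sum((o - n * s) ** 2 / (n * s) for o, s in zip(observed, shares))


# Hypothetical subset of 100 papers, already counted per class.
subset = [30, 25, 15, 10, 10, 10]
statistic = chi_square(subset)

# Critical value of the chi-square distribution for df = 5 at the 5% level.
CRITICAL_DF5_05 = 11.07
print("chi-square:", statistic, "significant:", statistic > CRITICAL_DF5_05)
```

For the invented subset the statistic is 103, far above the critical value, driven mostly by the overrepresentation in the top-1% class. Note that for small document sets the expected counts in the highest classes fall below the usual rule-of-thumb minimum of five, in which case classes would have to be merged or an exact test used.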

Furthermore, and not specifically as a criticism of EEE, field delineations do not have to be based on *ex ante* classification schemes such as the ISI Subject Categories. Hitherto, journal classifications have been imprecise and unreliable (Boyack and Klavans 2011; Leydesdorff 2006; Pudovkin and Garfield 2002; Rafols and Leydesdorff 2009). Fractional attribution of citations in the citing documents, however, can be used for normalization of differences in citation potentials (Garfield 1979) reflecting differences in citation behavior at the level of individual papers (Leydesdorff and Bornmann 2011b; Leydesdorff and Opthof 2010; Moed 2010).
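Fractional attribution can be sketched as follows, assuming the simplest weighting in which each citation counts as one divided by the number of references in the citing paper. The function name and the reference-list lengths are hypothetical, and the cited papers may use refinements of this rule.

```python
def fractional_citation_score(citing_reference_counts):
    """Sum of 1/n over citing papers, where n is the citing paper's reference count.

    Citing papers with an empty reference list are skipped (they cannot cite).
    """
    return sum(1.0 / n for n in citing_reference_counts if n > 0)


# Hypothetical: two papers, each cited three times (integer count = 3).
# Paper P is cited from a field with long reference lists.
citing_p = [10, 20, 40]
# Paper Q is cited from a field with short reference lists.
citing_q = [5, 5, 5]

print("integer counts:", len(citing_p), len(citing_q))
print("fractional counts:",
      fractional_citation_score(citing_p),
      fractional_citation_score(citing_q))
```

Both papers have three citations under integer counting, but the fractional scores are about 0.175 and 0.6. A citation from a field in which papers cite sparingly thus weighs more, which is how the citing side corrects for differences in citation potential without an *ex ante* field classification.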

Given these recent improvements in citation normalization, such as the use of paper-based measures on both the cited and the citing side, the theoretical question remains whether citations can be used as indicators of scientific quality and, if so, when (Amsterdamska and Leydesdorff 1989; Bornmann et al. 2008; Garfield 1979; Leydesdorff 1998; Leydesdorff and Amsterdamska 1990). Opthof and Leydesdorff (2011) opened this discussion by asking whether citation analysis enables us to legitimate the strategic selection of “excellent” as against merely “good” research.

## Footnotes

1. The software for measuring this indicator is available at http://www.leydesdorff.net/software/i3.

## Notes

### Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

## References

- Amsterdamska, O., & Leydesdorff, L. (1989). Citations: Indicators of significance? *Scientometrics, 15*(5–6), 449–471.
- Bornmann, L., & Mutz, R. (2011). Further steps towards an ideal method of measuring citation performance: The avoidance of citation (ratio) averages in field-normalization. *Journal of Informetrics, 5*(1), 228–230.
- Bornmann, L., Mutz, R., Neuhaus, C., & Daniel, H. D. (2008). Citation counts for research evaluation: Standards of good practice for analyzing bibliometric data and presenting and interpreting results. *Ethics in Science and Environmental Politics, 8*(1), 93–102.
- Boyack, K. W., & Klavans, R. (2011). Multiple dimensions of journal specificity: Why journals can’t be assigned to disciplines. In E. Noyons, P. Ngulube & J. Leta (Eds.), *The 13th Conference of the International Society for Scientometrics and Informetrics* (Vol. I, pp. 123–133). Durban, South Africa: ISSI, Leiden University and the University of Zululand.
- Garfield, E. (1979). Is citation analysis a legitimate evaluation tool? *Scientometrics, 1*(4), 359–375.
- Gingras, Y., & Larivière, V. (2011). There are neither “king” nor “crown” in scientometrics: Comments on a supposed “alternative” method of normalization. *Journal of Informetrics, 5*(1), 226–227.
- Glänzel, W. (2010). On reliability and robustness of scientometrics indicators based on stochastic models. An evidence-based opinion paper. *Journal of Informetrics, 4*(3), 313–319.
- Glänzel, W., Thijs, B., Schubert, A., & Debackere, K. (2009). Subfield-specific normalized relative indicators and a new generation of relational charts: Methodological foundations illustrated on the assessment of institutional research performance. *Scientometrics, 78*(1), 165–188.
- Leydesdorff, L. (1995). *The challenge of scientometrics: The development, measurement, and self-organization of scientific communications*. Leiden: DSWO Press, Leiden University. Retrieved from http://www.universal-publishers.com/book.php?method=ISBN&book=1581126816. Accessed 12 Sep 2011.
- Leydesdorff, L. (2006). Can scientific journals be classified in terms of aggregated journal–journal citation relations using the journal citation reports? *Journal of the American Society for Information Science and Technology, 57*(5), 601–613.
- Leydesdorff, L. (2008). *Caveats* for the use of citation indicators in research and journal evaluation. *Journal of the American Society for Information Science and Technology, 59*(2), 278–287.
- Leydesdorff, L., & Amsterdamska, O. (1990). Dimensions of citation analysis. *Science, Technology and Human Values, 15*(3), 305–335.
- Leydesdorff, L., & Bornmann, L. (2011a). Integrated Impact Indicators (I3) compared with Impact Factors (IFs): An alternative design with policy implications. *Journal of the American Society for Information Science and Technology* (in press).
- Leydesdorff, L., & Bornmann, L. (2011b). How fractional counting affects the Impact Factor: Normalization in terms of differences in citation potentials among fields of science. *Journal of the American Society for Information Science and Technology, 62*(2), 217–229.
- Leydesdorff, L., & Opthof, T. (2010). *Scopus*’ Source Normalized Impact per Paper (SNIP) *versus* the Journal Impact Factor based on fractional counting of citations. *Journal of the American Society for Information Science and Technology, 61*(11), 2365–2396.
- Leydesdorff, L., Bornmann, L., Mutz, R., & Opthof, T. (2011). Turning the tables in citation analysis one more time: Principles for comparing sets of documents. *Journal of the American Society for Information Science and Technology, 62*(7), 1370–1381.
- Moed, H. F. (2010). Measuring contextual citation impact of scientific journals. *Journal of Informetrics, 4*(3), 265–277.
- National Science Board. (2010). *Science and engineering indicators*. Washington, DC: National Science Foundation. Retrieved from http://www.nsf.gov/statistics/seind10/. Accessed 12 Sep 2011.
- Opthof, T., & Leydesdorff, L. (2010). *Caveats* for the journal and field normalizations in the CWTS (“Leiden”) evaluations of research performance. *Journal of Informetrics, 4*(3), 423–430.
- Opthof, T., & Leydesdorff, L. (2011). A comment to the paper by Waltman et al., Scientometrics, 87, 467–481, 2011. *Scientometrics, 88*(3), 1011–1016.
- Prathap, G. (2011a). A comment to the papers by Opthof and Leydesdorff, Scientometrics, 88, 1011–1016, 2011 and Waltman et al., Scientometrics, 88, 1017–1022, 2011. *Scientometrics*. doi: 10.1007/s11192-011-0500-0.
- Prathap, G. (2011b). The Energy–Exergy–Entropy (or EEE) sequences in bibliometric assessment. *Scientometrics, 87*(3), 515–524.
- Pudovkin, A. I., & Garfield, E. (2002). Algorithmic procedure for finding semantically related journals. *Journal of the American Society for Information Science and Technology, 53*(13), 1113–1119.
- Rafols, I., & Leydesdorff, L. (2009). Content-based and algorithmic classifications of journals: Perspectives on the dynamics of scientific communication and indexer effects. *Journal of the American Society for Information Science and Technology, 60*(9), 1823–1835.
- Rousseau, R. (2011). Percentile rank scores are congruous indicators of relative performance, or aren’t they? Retrieved from http://arxiv.org/abs/1108.1860. Accessed 12 Sep 2011.
- Schubert, A., & Braun, T. (1986). Relative indicators and relational charts for comparative assessment of publication output and citation impact. *Scientometrics, 9*(5), 281–291.
- Seglen, P. O. (1992). The skewness of science. *Journal of the American Society for Information Science, 43*(9), 628–638.
- Seglen, P. O. (1997). Why the impact factor of journals should not be used for evaluating research. *British Medical Journal, 314*, 498–502.
- Theil, H. (1972). *Statistical decomposition analysis*. Amsterdam: North-Holland.
- van Raan, A. F. J. (2006). Comparison of the Hirsch-index with standard bibliometric indicators and with peer judgment for 147 chemistry research groups. *Scientometrics, 67*(3), 491–502.
- van Raan, A. F. J., van Leeuwen, T. N., Visser, M. S., van Eck, N. J., & Waltman, L. (2010). Rivals for the crown: Reply to Opthof and Leydesdorff. *Journal of Informetrics, 4*(3), 431–435.
- Waltman, L., van Eck, N. J., van Leeuwen, T. N., Visser, M. S., & van Raan, A. F. J. (2011). Towards a new crown indicator: Some theoretical considerations. *Journal of Informetrics, 5*(1), 37–47.