The Substantive and Practical Significance of Citation Impact Differences Between Institutions: Guidelines for the Analysis of Percentiles Using Effect Sizes and Confidence Intervals

Chapter in: Measuring Scholarly Impact

Abstract

In this chapter we address the statistical analysis of percentiles: how should the citation impact of institutions be compared? In educational and psychological testing, percentiles are already widely used as a standard for evaluating an individual's test scores (intelligence tests, for example) by comparing them with the scores of a calibrated sample. Percentiles, or percentile rank classes, are also well suited in bibliometrics for normalizing the citations of publications by subject category and publication year, and, unlike mean-based indicators (relative citation rates), they are hardly affected by the skewed distributions of citation counts. The percentile of a publication indicates how its citation impact compares with that of similar publications in the same subject category and publication year. Analyses of percentiles, however, have not always been presented in the most effective and meaningful way. The current APA guidelines (American Psychological Association, 2010) suggest a lesser emphasis on significance tests and a greater emphasis on the substantive and practical significance of findings. Drawing on work by Cumming (2012), we show how examinations of effect sizes (e.g., Cohen's d statistic) and confidence intervals can lead to a clear understanding of citation impact differences.
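As a point of reference (this is the standard definition, consistent with how the statistic is used in this chapter and its appendix), Cohen's d for a two-group comparison is the difference in means expressed in pooled standard deviation units:

d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}, \qquad s_{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}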


Notes

  1. Cumming (2012) refers to the CI obtained from an analysis as “One from the dance.” What he means is that it is NOT correct to say that there is a 95 % chance that the true value of the mean lies within the confidence interval: either the true value falls within the interval or it does not. It is correct to say that, if this process were repeated an infinite number of times, 95 % of the CIs would include the true value of the mean while 5 % would not. Whether the interval computed from the specific data we are analyzing does so, we do not know. (The formula behind such intervals is shown after these notes.)

  2. Cumming (2012) notes various cautions about using Cohen's d (p. 283). For example, while it is common to use sample standard deviations as we do here, other “standardizers” are possible; e.g., you might use the standard deviation for a reference population, such as elite institutions. Researchers should state exactly how Cohen's d was computed. (A short Stata sketch follows these notes.)

  3. With independent samples, two different types of t-tests can be conducted. The first type, used here, assumes that the variances of the two groups are equal; the second allows them to differ. In our examples it makes little difference which approach is used since, as Table 12.2 shows, the standard deviations for the three groups are similar. In cases where the variances clearly differ, the second approach should be used. Most, perhaps all, statistical software packages can compute either type of t-test easily. (See the Stata options shown after these notes.)

  4. Nonetheless, as we found for other measures in our analysis, Cohen's d seems robust to violations of its assumptions. When we estimated Cohen's d using binary dependent variables, we got almost exactly the same numbers as we did for Cohen's h (defined after these notes).
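To make the notes above concrete, here are brief illustrations. These are standard formulas and real Stata options rather than material from the chapter itself; the variable names follow the chapter's appendix, and the reference standard deviation sd_ref is a made-up value used only for illustration.

For note 1, the t-based 95 % CI for a mean is

\bar{x} \pm t_{0.975,\, n-1} \cdot \frac{s}{\sqrt{n}}

where s is the sample standard deviation and t_{0.975, n-1} is the 97.5th percentile of the t distribution with n - 1 degrees of freedom; Cumming's “dance” is the variation of this interval across repeated samples.

For notes 2 and 3, a minimal Stata sketch:

quietly ttest perc, by(inst12)
* d standardized by the pooled sample sd, as in the chapter
display "d (pooled sample sd) = " (r(mu_1) - r(mu_2)) / ///
    sqrt(((r(N_1) - 1) * r(sd_1)^2 + (r(N_2) - 1) * r(sd_2)^2) / (r(N_1) + r(N_2) - 2))
* d standardized by a hypothetical reference-population sd (assumed value)
scalar sd_ref = 25
display "d (reference sd) = " (r(mu_1) - r(mu_2)) / sd_ref
* Equal- versus unequal-variance t-tests (note 3)
ttest perc, by(inst12)
ttest perc, by(inst12) unequal
ttest perc, by(inst12) unequal welch

For note 4, Cohen's h is the difference between the arcsine-transformed proportions,

h = 2 \arcsin\sqrt{p_1} - 2 \arcsin\sqrt{p_2}

which is exactly what the scalars phi1 and phi2 compute in the appendix code.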

References

  • Acock, A. (2010). A gentle introduction to Stata (3rd ed.). College Station, TX: Stata Press.


  • American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: APA.

  • Bornmann, L., & Leydesdorff, L. (2013). Statistical tests and research assessments: A comment on Schneider (2012). Journal of the American Society for Information Science and Technology, 64(6), 1306–1308. doi:10.1002/asi.22860.


  • Bornmann, L., Leydesdorff, L., & Mutz, R. (2013). The use of percentiles and percentile rank classes in the analysis of bibliometric data: opportunities and limits. Journal of Informetrics, 7(1), 158–165.


  • Bornmann, L., & Mutz, R. (2013). The advantage of the use of samples in evaluative bibliometric studies. Journal of Informetrics, 7(1), 89–90. doi:10.1016/j.joi.2012.08.002.


  • Bornmann, L., & Williams, R. (2013). How to calculate the practical significance of citation impact differences? An empirical example from evaluative institutional bibliometrics using adjusted predictions and marginal effects. Journal of Informetrics, 7(2), 562–574. doi:10.1016/j.joi.2013.02.005.


  • Bornmann, L., de Moya-Anegón, F., & Leydesdorff, L. (2012). The new Excellence Indicator in the World Report of the SCImago Institutions Rankings 2011. Journal of Informetrics, 6(2), 333–335. doi:10.1016/j.joi.2011.11.006.

  • Cameron, A. C., & Trivedi, P. K. (2010). Microeconometrics using Stata (Revised ed.). College Station, TX: Stata Press.

  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.


  • Cox, N. J. (2005). Calculating percentile ranks or plotting positions. Retrieved May 30, from http://www.stata.com/support/faqs/stat/pcrank.html

  • Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. London: Routledge.


  • Glänzel, W., Thijs, B., Schubert, A., & Debackere, K. (2009). Subfield-specific normalized relative indicators and a new generation of relational charts: methodological foundations illustrated on the assessment of institutional research performance. Scientometrics, 78(1), 165–188.


  • Huber, C. (2013). Measures of effect size in Stata 13. The Stata Blog. Retrieved December 6, 2013, from http://blog.stata.com/2013/09/05/measures-of-effect-size-in-stata-13.

  • Hyndman, R. J., & Fan, Y. N. (1996). Sample quantiles in statistical packages. American Statistician, 50(4), 361–365.


  • International Committee of Medical Journal Editors. (2010). Uniform requirements for manuscripts submitted to biomedical journals: Writing and editing for biomedical publication. Journal of Pharmacology and Pharmacotherapeutics, 1(1), 42–58. Retrieved April 10, 2014 from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3142758/.

  • Leydesdorff, L. (2012). Accounting for the uncertainty in the evaluation of percentile ranks. Journal of the American Society for Information Science and Technology, 63(11), 2349–2350.


  • Leydesdorff, L., & Bornmann, L. (2011). Integrated impact indicators (I3) compared with impact factors (IFs): An alternative research design with policy implications. Journal of the American Society for Information Science and Technology, 62(11), 2133–2146.

  • Leydesdorff, L., & Bornmann, L. (2012). Percentile ranks and the integrated impact indicator (I3). Journal of the American Society for Information Science and Technology, 63(9), 1901–1902. doi:10.1002/asi.22641.


  • Long, S., & Freese, J. (2006). Regression models for categorical dependent variables using Stata (2nd ed.). College Station, TX: Stata Press.


  • Lundberg, J. (2007). Lifting the crown - citation z-score. Journal of Informetrics, 1(2), 145–154.


  • Moed, H. F., De Bruin, R. E., & Van Leeuwen, T. N. (1995). New bibliometric tools for the assessment of national research performance - database description, overview of indicators and first applications. Scientometrics, 33(3), 381–422.


  • Opthof, T., & Leydesdorff, L. (2010). Caveats for the journal and field normalizations in the CWTS (“Leiden”) evaluations of research performance. Journal of Informetrics, 4(3), 423–430.


  • Pudovkin, A. I., & Garfield, E. (2009). Percentile rank and author superiority indexes for evaluating individual journal articles and the author’s overall citation performance. Paper presented at the Fifth International Conference on Webometrics, Informetrics & Scientometrics (WIS).

    Google Scholar 

  • Schneider, J. W. (2012). Testing university rankings statistically: Why this is not such a good idea after all. Some reflections on statistical power, effect sizes, random sampling and imaginary populations. In É. Archambault, Y. Gingras, & V. Larivière (Eds.), The 17th International Conference on Science and Technology Indicators (pp. 719–732). Montreal, Canada: Repro-UQAM.

  • Schreiber, M. (2012). Inconsistencies of recently proposed citation impact indicators and how to avoid them. Journal of the American Society for Information Science and Technology, 63(10), 2062–2073. doi:10.1002/asi.22703.


  • Schreiber, M. (2013). Uncertainties and ambiguities in percentiles and how to avoid them. Journal of the American Society for Information Science and Technology, 64(3), 640–643. doi:10.1002/asi.22752.


  • Schubert, A., & Braun, T. (1986). Relative indicators and relational charts for comparative assessment of publication output and citation impact. Scientometrics, 9(5–6), 281–291.


  • StataCorp. (2013). Stata statistical software: Release 13. College Station, TX: Stata Corporation.


  • Tressoldi, P. E., Giofrè, D., Sella, F., & Cumming, G. (2013). High impact = high statistical standards? Not necessarily so. PLoS One, 8(2). doi:10.1371/journal.pone.0056180.

  • van Raan, A. F. J., van Leeuwen, T. N., Visser, M. S., van Eck, N. J., & Waltman, L. (2010). Rivals for the crown: Reply to Opthof and Leydesdorff. Journal of Informetrics, 4, 431–435.


  • Waltman, L., Calero-Medina, C., Kosten, J., Noyons, E. C. M., Tijssen, R. J. W., van Eck, N. J., et al. (2012). The Leiden Ranking 2011/2012: Data collection, indicators, and interpretation. Journal of the American Society for Information Science and Technology, 63(12), 2419–2432.


  • Waltman, L., & Schreiber, M. (2013). On the calculation of percentile-based bibliometric indicators. Journal of the American Society for Information Science and Technology, 64(2), 372–379.


  • Williams, R. (2012). Using the margins command to estimate and interpret adjusted predictions and marginal effects. The Stata Journal, 12(2), 308–331.


  • Zhou, P., & Zhong, Y. (2012). The citation-based indicator and combined impact indicator—new options for measuring impact. Journal of Informetrics, 6(4), 631–638. doi:10.1016/j.joi.2012.05.004.


Download references

Author information

Correspondence to Richard Williams.

Appendix: Stata Code Used for These Analyses

* Stata code for Williams & Bornmann book chapter on effect sizes.
* Be careful when running this code -- make sure it doesn't
* overwrite existing files or graphs that use the same names.

version 13.1
use "http://www3.nd.edu/~rwilliam/statafiles/rwlbes", clear

* Pairwise comparison variables: each is missing for the excluded institution
gen inst12 = inst if inst!=3
gen inst13 = inst if inst!=2
gen inst23 = inst if inst!=1
* Indicator for publication in the top 10 percent of percentiles
* (note: assumes perc is nonmissing for every record)
gen top10 = perc <= 10

* Limit to 2001 & 2002; this can be changed
keep if py <= 2002

* Table 12.2
* Single group designs - pages 286-287 of Cumming
* For each institution, test whether percentile mu = 50
* Note that negative differences mean better than average performance
forval instnum = 1/3 {
    display
    display "Institution `instnum'"
    ttest perc = 50 if inst==`instnum'
    display
    display "Cohen's d = " r(t) / sqrt(r(N_1))
    * DOUBLE CHECK: Compare the above CIs and t-tests with the bootstrap
    * Results from the test command should be similar to the t-test
    * significance level
    bootstrap, reps(100): reg perc if inst==`instnum'
    test _cons = 50
}
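* (added note) For this one-sample design, d = t / sqrt(N) is algebraically
* identical to (mean - 50) / sd, i.e., the mean's distance from 50 expressed
* in standard deviation units.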

* Table 12.3
* Two group designs - Test whether two institutions
* differ from each other on mean percentile rating.
* Starts around p. 155
* Get both the t-tests and the ES stats, e.g. Cohen's d
* Note: you should flip the signs for the 3 vs 2 comparison
foreach iv of varlist inst12 inst13 inst23 {
    display "perc is dependent, `iv'"
    ttest perc, by(`iv')
    scalar n1 = r(N_1)
    scalar n2 = r(N_2)
    scalar s1 = r(sd_1)
    scalar s2 = r(sd_2)
    display
    display "Pooled sd is " ///
        sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
    display
    esize twosample perc, by(`iv') all
    display
    * DOUBLE CHECKS: Compare Mann-Whitney & bootstrap results with above
    * Mann-Whitney test
    ranksum perc, by(`iv')
    * Bootstrap
    bootstrap, reps(100): reg perc i.`iv'
}
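* (added note) ranksum performs the Wilcoxon rank-sum (Mann-Whitney) test, a
* nonparametric check that does not assume normally distributed percentiles.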

* Table 12.4
* Proportions in Top 10, pp. 399-402
* Single institution tests
* Numbers in table are multiplied by 100
forval instnum = 1/3 {
    display
    display "Institution `instnum'"
    prtest top10 = .10 if inst==`instnum'
    display
    scalar phi1 = 2 * asin(sqrt(r(P_1)))
    scalar phi2 = 2 * asin(sqrt(.10))
    display "h effect size = " phi1 - phi2
    display
}
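* (added note) phi1 - phi2 is Cohen's h: the difference between the
* arcsine-transformed proportions (Cohen, 1988).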

* Table 12.5
* Proportions in Top 10 - pairwise comparisons of institutions
* Numbers in table are multiplied by 100
foreach instpair of varlist inst12 inst13 inst23 {
    display
    display "`instpair'"
    prtest top10, by(`instpair')
    display
    scalar phi1 = 2 * asin(sqrt(r(P_1)))
    scalar phi2 = 2 * asin(sqrt(r(P_2)))
    display "h effect size = " phi1 - phi2
    display
    * NOTE: Cohen's d provides very similar results to Cohen's h
    esize twosample top10, by(`instpair') all
    display
}

* Do graphs with Stata
* NOTE: Additional editing was done with the Stata Graph Editor
* Use ciplot for univariate graphs
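* (added note) ciplot is a user-written command; if it is not already
* installed, it can be obtained from SSC with: ssc install ciplot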

* Figure 12.1 - Average percentile score by inst with CI
ciplot perc, by(inst) name(fig1, replace)

* Figure 12.3
* Was edited to multiply by 100
ciplot top10, bin by(inst) name(fig3, replace)

*** Save figures before running figure 12.2 code

* Figure 12.2 - Differences in mean percentile rankings
* Use statsby and serrbar for tests of group differences
* Note: Data in memory is overwritten

* inst32 recodes inst23 (institution 3 -> 1, institution 2 -> 2) so that
* institution 3 is the base category in the regression below
gen inst32 = inst23 * -1 + 4
tab2 inst32 inst23

statsby _b _se, saving(xb12, replace) : reg perc i.inst12
statsby _b _se, saving(xb13, replace) : reg perc i.inst13
statsby _b _se, saving(xb32, replace) : reg perc i.inst32

clear all
append using xb12 xb13 xb32, gen(pairing)
label define pairing 1 "1 vs 2" 2 "1 vs 3" 3 "3 vs 2"
label values pairing pairing
serrbar _stat_2 _stat_5 pairing, scale(1.96) name(fig2, replace)
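* (added note) serrbar plots each estimated mean difference (_stat_2) with
* bars of +/- 1.96 standard errors (_stat_5), i.e., approximate 95 % CIs.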


Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Williams, R., Bornmann, L. (2014). The Substantive and Practical Significance of Citation Impact Differences Between Institutions: Guidelines for the Analysis of Percentiles Using Effect Sizes and Confidence Intervals. In: Ding, Y., Rousseau, R., Wolfram, D. (eds) Measuring Scholarly Impact. Springer, Cham. https://doi.org/10.1007/978-3-319-10377-8_12


  • DOI: https://doi.org/10.1007/978-3-319-10377-8_12


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10376-1

  • Online ISBN: 978-3-319-10377-8

  • eBook Packages: Computer Science, Computer Science (R0)
