Scientometrics, Volume 118, Issue 2, pp 653–671

# Large enough sample size to rank two groups of data reliably according to their means

• Zhesi Shen
• Liying Yang
• Zengru Di
• Jinshan Wu

## Abstract

Often we need to compare two sets of data, say X and Y, and we often do so by comparing their means $$\mu _{X}$$ and $$\mu _{Y}$$. However, when the two sets overlap heavily (for example, when $$\sqrt{\sigma ^{2}_{X}+\sigma ^{2}_{Y}}\gg \left| \mu _{X}-\mu _{Y}\right|$$), ranking the two sets according to their means might not be reliable. We observe that replacing the one-by-one comparison, where we take one sample from each set at a time and compare the two samples, with the $$K_{X}$$-by-$$K_{Y}$$ comparison, where we take $$K_{X}$$ samples $$\left\{ x_{1}, x_{2}, \ldots , x_{K_{X}}\right\}$$ from one set and $$K_{Y}$$ samples $$\left\{ y_{1}, y_{2},\ldots , y_{K_{Y}}\right\}$$ from the other set at a time and compare the averages $$\frac{\sum _{j=1}^{K_{X}}x_{j}}{K_{X}}$$ and $$\frac{\sum _{j=1}^{K_{Y}}y_{j}}{K_{Y}}$$, reduces the overlap and thus improves the reliability. Based on this observation, we propose a definition of the minimum representative size $$\kappa$$ of each set, obtained by requiring, roughly speaking, $$\sqrt{\sigma ^{2}_{K_X}+\sigma ^{2}_{K_Y}}\ll \left| \mu _{X}-\mu _{Y}\right|$$. Applied to journal comparison, this minimum representative size $$\kappa$$ might be used as a complementary index to the journal impact factor (JIF), indicating how reliable a comparison of two journals by their JIFs is. More generally, the idea of a minimum representative size can be used whenever two sets of data with overlapping distributions are compared.
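The abstract's idea can be sketched with a small bootstrap experiment: for growing K, draw K-sample averages from each set and record how often the higher-mean set's average actually comes out on top. The sketch below is illustrative only, not the paper's exact procedure; the 0.9 reliability threshold, the search range, and the lognormal toy data are assumptions chosen for demonstration.

```python
import numpy as np

def min_representative_size(x, y, threshold=0.9, k_max=500, n_boot=2000, rng=None):
    """Estimate a minimum representative size kappa: the smallest K such
    that the average of K bootstrap samples from the higher-mean set
    exceeds the average of K samples from the other set in at least
    `threshold` of n_boot trials.  Illustrative sketch only; the
    threshold and search range are assumptions, not the paper's values.
    """
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x, float), np.asarray(y, float)
    if x.mean() < y.mean():          # orient so x is the higher-mean set
        x, y = y, x
    for k in range(1, k_max + 1):
        # n_boot bootstrap averages of size k from each set
        xk = rng.choice(x, size=(n_boot, k)).mean(axis=1)
        yk = rng.choice(y, size=(n_boot, k)).mean(axis=1)
        if (xk > yk).mean() >= threshold:
            return k
    return None                      # not reliably separable within k_max

# Two heavily overlapping, skewed toy "citation" distributions
rng = np.random.default_rng(0)
x = rng.lognormal(mean=1.3, sigma=1.0, size=5000)   # higher-mean set
y = rng.lognormal(mean=1.0, sigma=1.0, size=5000)
kappa = min_representative_size(x, y, rng=rng)
```

At K = 1 the overlap makes single-sample comparisons barely better than a coin flip, so the returned kappa is well above 1: averaging over K samples shrinks each set's spread by a factor of $$\sqrt{K}$$, which is exactly why the K-by-K comparison becomes reliable.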

## Keywords

Journal impact factor · Minimum representative size · Bootstrap sampling

## Notes

### Acknowledgements

We thank Jianlin Zhou, Per Ahlgren, Ludo Waltman and Lawrence Smolinsky for valuable discussions since this paper’s early version (Shen et al. 2017). We would also like to thank the anonymous referees for their suggestions and criticisms, which have greatly improved the paper’s presentation. This work was supported by the NSFC under Grant No. 61374175, the China Postdoctoral Science Foundation under Grant 2017M620944, and the Fundamental Research Funds for the Central Universities.

## References

1. Anonymous. (2011). Dissecting our impact factor. Nature Materials, 10, 645.
2. Bar-Ilan, J. (2008). Informetrics at the beginning of the 21st century–a review. Journal of Informetrics, 2, 1–52.
3. Bornmann, L., Leydesdorff, L., & Mutz, R. (2013). The use of percentiles and percentile rank classes in the analysis of bibliometric data: Opportunities and limits. Journal of Informetrics, 7, 158–165.
4. Bornmann, L., Marx, W., Gasparyan, A. Y., & Kitas, G. D. (2012). Diversity, value and limitations of the journal impact factor and alternative metrics. Rheumatology International, 32, 1861–1867.
5. Bornmann, L., & Mutz, R. (2011). Further steps towards an ideal method of measuring citation performance: The avoidance of citation (ratio) averages in field-normalization. Journal of Informetrics, 5, 228–230.
6. Bornmann, L., Stefaner, M., de Moya Anegón, F., & Mutz, R. (2014). Ranking and mapping of universities and research-focused institutions worldwide based on highly-cited papers. Online Information Review, 38, 43–58.
7. Callaway, E. (2016). Beat it, impact factor! publishing elite turns against controversial metric. Nature, 535, 210–211.
8. Church, J. D., & Harris, B. (1970). The estimation of reliability from stress-strength relationships. Technometrics, 12, 49–54.
9. DORA (2013). San Francisco declaration on research assessment. http://www.ascb.org/dora/. Accessed 20 December 2016.
10. Downton, F. (1973). The estimation of $$\Pr (Y < X)$$ in the normal case. Technometrics, 15, 551–558.
11. Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Monographs on statistics and applied probability (Vol. 57). Boca Raton: Chapman & Hall/CRC.
12. Garfield, E. (1999). Journal impact factor: A brief review. Canadian Medical Association Journal, 161, 979–980.
13. Glänzel, W. (2010). On reliability and robustness of scientometrics indicators based on stochastic models. An evidence-based opinion paper. Journal of Informetrics, 4, 313–319.
14. Glänzel, W., & Moed, H. F. (2002). Journal impact measures in bibliometric research. Scientometrics, 53, 171–193.
15. Glänzel, W., & Moed, H. F. (2013). Opinion paper: Thoughts and facts on bibliometric indicators. Scientometrics, 96, 381–394.
16. Herrnstein, R. J., Loveland, D. H., & Cable, C. (1976). Natural concepts in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 2, 285–302.
17. Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). The Leiden Manifesto for research metrics. Nature, 520, 429–431.
18. Kurmis, A. P. (2003). Understanding the limitations of the journal impact factor. Journal of Bone and Joint Surgery American, 85-A, 2449–2454.
19. Larivière, V., Kiermer, V., MacCallum, C. J., McNutt, M., Patterson, M., Pulverer, B., Swaminathan, S., Taylor, S., & Curry, S. (2016). A simple proposal for the publication of journal citation distributions. bioRxiv. http://biorxiv.org/content/early/2016/09/11/062109.
20. Leydesdorff, L., & Bornmann, L. (2011a). How fractional counting of citations affects the impact factor: Normalization in terms of differences in citation potentials among fields of science. Journal of the Association for Information Science and Technology, 62, 217–229.
21. Leydesdorff, L., & Bornmann, L. (2011b). Integrated impact indicators compared with impact factors: An alternative research design with policy implications. Journal of the Association for Information Science and Technology, 62, 2133–2146.
22. Leydesdorff, L., & Opthof, T. (2010). Normalization at the field level: Fractional counting of citations. Journal of Informetrics, 4, 644–646.
23. Mann, H., & Whitney, D. (1947). On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18, 50–60.
24. Milojević, S., Radicchi, F., & Bar-Ilan, J. (2017). Citation success index: An intuitive pair-wise journal comparison metric. Journal of Informetrics, 11, 223–231.
25. Mingers, J., & Leydesdorff, L. (2015). A review of theory and practice in scientometrics. European Journal of Operational Research, 246, 1–19.
26. Mingers, J., & Yang, L. (2017). Evaluating journal quality: A review of journal citation indicators and ranking in business and management. European Journal of Operational Research, 257, 323–337.
27. Mutz, R., & Daniel, H. D. (2012). Skewed citation distributions and bias factors: Solutions to two core problems with the journal impact factor. Journal of Informetrics, 6, 169–176.
28. NSB (2016). National Science Board science and engineering indicators 2016. https://www.nsf.gov/statistics/2016/nsb20161/#/report/chapter-5/outputs-of-s-e-research-publications-and-patents. Accessed 18 June 2017.
29. Radicchi, F., Fortunato, S., & Castellano, C. (2008). Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences, 105, 17268–17272.
30. Reiser, B., & Guttman, I. (1986). Statistical inference for $$\Pr (Y < X)$$: The normal case. Technometrics, 28, 253–257.
31. Seglen, P. O. (1992). The skewness of science. Journal of the Association for Information Science and Technology, 43, 628–638.
32. Seglen, P. O. (1997). Why the impact factor of journals should not be used for evaluating research. Bmj Clinical Research, 314, 498–502.
33. Shen, Z., Yang, L., Di, Z., & Wu, J. (2017). How large is large enough? In Proceedings of ISSI 2017 (pp. 288–299).
34. Stringer, M. J., Sales-Pardo, M., & Amaral, L. A. N. (2008). Effectiveness of journal ranking schemes as a tool for locating information. PLoS ONE, 3, e1683.
35. Waltman, L. (2016). A review of the literature on citation impact indicators. Journal of Informetrics, 10, 365–391.
36. Waltman, L., Calero-Medina, C., Kosten, J., Noyons, E. C., Tijssen, R. J., Eck, N. J., et al. (2012). The Leiden Ranking 2011/2012: Data collection, indicators, and interpretation. Journal of the Association for Information Science and Technology, 63, 2419–2432.
37. Wasserman, L. (2004). All of statistics. New York: Springer.
38. Welch, B. L. (1947). The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika, 34, 28–35.
39. Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80–83.
40. Zhou, W. (2008). Statistical inference for $$P(X < Y)$$. Statistics in Medicine, 27, 257–279.

## Authors and Affiliations

• Zhesi Shen (1, 2)
• Liying Yang (1)
• Zengru Di (2)
• Jinshan Wu (2)

1. National Science Library, Chinese Academy of Sciences, Beijing, People’s Republic of China
2. School of Systems Science, Beijing Normal University, Beijing, People’s Republic of China