Volume 95, Issue 3, pp 1179–1188

The effect of database dirty data on h-index calculation

  • Fiorenzo Franceschini
  • Domenico Maisano
  • Luca Mastrogiacomo


Like all databases, bibliometric ones (e.g. Scopus, Web of Knowledge and Google Scholar) are not exempt from errors, such as missing or wrong records, which may affect publication/citation statistics and, more generally, the resulting bibliometric indicators. This paper tries to answer the question “What is the effect of database uncertainty on the evaluation of the h-index?” by breaking the paradigm of deterministic database analysis and treating responses to database queries as random variables. Specifically, an informetric model of the h-index is used to quantify the variability of this indicator with respect to the variability stemming from errors in database records. Some preliminary results are presented and discussed.
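The idea of treating a database response as a random variable can be illustrated with a small simulation. The sketch below is not the informetric model used in the paper; it is a plain Monte Carlo illustration under an assumed error model in which each citation is independently missing from the database with probability `p_missing`. The function names, the citation counts and the error rate are all hypothetical.

```python
import random


def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def simulate_h(citations, p_missing=0.05, trials=2000, seed=0):
    """Sample the h-index seen through a 'dirty' database.

    Assumed error model (illustrative only): every citation is dropped
    independently with probability p_missing.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        # Count how many of each paper's citations survive in this trial.
        observed = [
            sum(1 for _ in range(count) if rng.random() >= p_missing)
            for count in citations
        ]
        samples.append(h_index(observed))
    return samples


# Hypothetical "true" citation counts for one author.
true_counts = [45, 30, 22, 18, 15, 12, 10, 9, 8, 8, 6, 5, 3, 2, 1, 0]
samples = simulate_h(true_counts)

mean_h = sum(samples) / len(samples)
variance_h = sum((h - mean_h) ** 2 for h in samples) / len(samples)
print("h-index from the error-free counts:", h_index(true_counts))
print("mean and variance of the observed h-index:", mean_h, variance_h)
```

Because this error model only removes citations, the observed h-index can never exceed the error-free value; a model that also allowed phantom or duplicated records would let it move in both directions.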


Citations; h-index; h-index robustness; Uncertain data; Dirty database



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2012

Authors and Affiliations

  • Fiorenzo Franceschini (1)
  • Domenico Maisano (1)
  • Luca Mastrogiacomo (1)

  1. Dipartimento di Ingegneria Gestionale e della Produzione (DIGEP), Politecnico di Torino, Turin, Italy
