The effect of database dirty data on h-index calculation

Franceschini, Fiorenzo; Maisano, Domenico; Mastrogiacomo, Luca

doi:10.1007/s11192-012-0871-x

Published: 25 October 2012

Scientometrics, volume 95, pages 1179–1188 (2013)

Abstract

Like all databases, bibliometric ones (e.g. Scopus, Web of Knowledge and Google Scholar) are not exempt from errors, such as missing or wrong records. These errors may affect publication and citation statistics and, more generally, the bibliometric indicators derived from them. This paper tries to answer the question “What is the effect of database uncertainty on the evaluation of the h-index?”, breaking the paradigm of deterministic database analysis and treating responses to database queries as random variables. Specifically, an informetric model of the h-index is used to quantify how much this indicator varies as a result of errors in database records. Some preliminary results are presented and discussed.
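The idea of treating a database response as a random variable can be illustrated with a small simulation. The sketch below is not the informetric model used in the paper. It is a plain Monte Carlo experiment under an assumed error model in which every citation is independently missed by the database with probability `miss_prob`. The citation counts, the function names and the value of `miss_prob` are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def observed_counts(true_counts, miss_prob, rng):
    """Simulate one database query under the assumed error model:
    each citation is independently lost with probability miss_prob."""
    return [
        sum(1 for _ in range(count) if rng.random() >= miss_prob)
        for count in true_counts
    ]


def h_index_distribution(true_counts, miss_prob, trials=2000, seed=0):
    """Sample the h-index over repeated simulated queries."""
    rng = random.Random(seed)
    return [
        h_index(observed_counts(true_counts, miss_prob, rng))
        for _ in range(trials)
    ]


if __name__ == "__main__":
    # Hypothetical error-free citation counts for one author.
    true_counts = [45, 30, 22, 15, 12, 10, 9, 8, 8, 6, 5, 3, 2, 1, 0]
    samples = h_index_distribution(true_counts, miss_prob=0.05)
    print("error-free h:", h_index(true_counts))
    print("mean observed h:", statistics.mean(samples))
    print("std dev of observed h:", statistics.pstdev(samples))
```

Because this assumed error model only removes citations, the simulated h-index can never exceed the error-free value. Other kinds of dirty data, such as duplicated or wrongly attributed records, would need a different error model.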

Author information

Authors and Affiliations

Dipartimento di Ingegneria Gestionale e della Produzione (DIGEP), Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
Fiorenzo Franceschini, Domenico Maisano & Luca Mastrogiacomo

Corresponding author

Correspondence to Fiorenzo Franceschini.

About this article

Cite this article

Franceschini, F., Maisano, D. & Mastrogiacomo, L. The effect of database dirty data on h-index calculation. Scientometrics 95, 1179–1188 (2013). https://doi.org/10.1007/s11192-012-0871-x

Received: 31 July 2012
Published: 25 October 2012
Issue Date: June 2013
DOI: https://doi.org/10.1007/s11192-012-0871-x
