
Types of DOI errors of cited references in Web of Science with a cleaning method

  • Shuo Xu
  • Liyuan Hao
  • Xin An
  • Dongsheng Zhai
  • Hongshen Pang
Article

Abstract

Though bibliographic databases such as Web of Science (WoS) have greatly promoted the development of scientometrics and informetrics, these databases are not free of errors. The main purpose of this work is to determine which types of DOI errors occur in cited references, how often each type occurs, and whether these errors can be corrected automatically. After careful analysis, several classic types of DOI errors in cited references, such as prefix-, suffix-, and other-type errors, are identified. Then, a cleaning method based on regular expressions is put forward. Experimental results on bibliographic data in the gene editing field from the WoS database indicate that our cleaning approach can greatly improve the quality of the DOI names of cited references.
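The abstract does not give the regular expressions themselves, so the following Python block is only a minimal sketch of how a regex-based DOI cleaner of this kind might work; it is not the authors' cleaning method. The error patterns it handles are assumptions chosen for illustration: prefix-type noise such as resolver URLs, `doi:` labels, repeated `DOI` tags or opening brackets, and suffix-type noise such as trailing punctuation, unbalanced closing parentheses or text appended after the DOI. The names `clean_doi`, `PREFIX_NOISE`, `TRAILING_PUNCT` and `VALID_DOI` are hypothetical, and the validity pattern is a deliberately conservative character set rather than the full DOI syntax.

```python
import re
from typing import Optional

# Conservative check for a well-formed DOI name (assumption): "10.", a 4-9 digit
# registrant code, "/", then a suffix from a restricted character set.
# Applied after lower-casing. Real DOI suffixes may legally contain other characters.
VALID_DOI = re.compile(r"^10\.\d{4,9}/[-._;()/:a-z0-9]+$")

# Hypothetical prefix-type noise: whitespace, opening brackets, "doi"/"doi:"
# labels (any case) and resolver URLs, possibly repeated ("DOI DOI 10...").
PREFIX_NOISE = re.compile(
    r"^(?:\s|[\[(]|doi:?|https?://(?:dx\.)?doi\.org/)+", re.IGNORECASE
)

# Hypothetical suffix-type noise: punctuation left over from the reference string.
TRAILING_PUNCT = re.compile(r"[.,;:\]]+$")


def clean_doi(raw: str) -> Optional[str]:
    """Return a normalized DOI name, or None if the string cannot be repaired."""
    # Strip prefix noise, then keep only the first whitespace-delimited token;
    # anything after it (e.g. a second, concatenated DOI) is treated as noise.
    tokens = PREFIX_NOISE.sub("", raw.strip()).split()
    if not tokens:
        return None
    doi = TRAILING_PUNCT.sub("", tokens[0])
    # Remove closing parentheses only while they are unbalanced, because
    # parentheses occur legitimately in suffixes such as
    # 10.1016/S0140-6736(20)30183-5.
    while doi.endswith(")") and doi.count(")") > doi.count("("):
        doi = TRAILING_PUNCT.sub("", doi[:-1])
    # DOI names are case-insensitive, so lower-case for consistent matching.
    doi = doi.lower()
    return doi if VALID_DOI.match(doi) else None


if __name__ == "__main__":
    samples = [
        "DOI 10.1007/s11192-018-2980-7",
        "https://doi.org/10.1016/j.joi.2014.07.003.",
        "[10.1016/S0140-6736(20)30183-5]",
        "10.1007",
    ]
    for s in samples:
        print(repr(s), "->", clean_doi(s))
```

For example, `clean_doi("https://doi.org/10.1016/j.joi.2014.07.003.")` returns `"10.1016/j.joi.2014.07.003"`, while a truncated string such as `"10.1007"` returns `None`. Returning `None` instead of guessing keeps errors that these patterns cannot repair (for instance, other-type errors) visible for manual inspection; lower-casing is safe because DOI names are case-insensitive for ASCII characters.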

Keywords

DOI errors; Cleaning method; Web of Science; Cited references; Regular expression

Notes

Acknowledgements

Our gratitude goes to the anonymous reviewers and the editor for their valuable comments.


Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2019

Authors and Affiliations

  • Shuo Xu (1)
  • Liyuan Hao (1)
  • Xin An (2, corresponding author)
  • Dongsheng Zhai (1)
  • Hongshen Pang (3)
  1. Research Base of Beijing Modern Manufacturing Development, College of Economics and Management, Beijing University of Technology, Beijing, People’s Republic of China
  2. School of Economics and Management, Beijing Forestry University, Beijing, People’s Republic of China
  3. Library, Shenzhen University, Shenzhen, People’s Republic of China
