OpenAIRE’s DOIBoost - Boosting Crossref for Research

  • Conference paper
Digital Libraries: Supporting Open Science (IRCDL 2019)

Abstract

Research in information science and scholarly communication relies heavily on openly accessible datasets of metadata about scholarly entities and, where possible, their related payloads. Since this metadata is scattered across diverse, freely accessible online resources (e.g. Crossref, ORCID), researchers in this domain are doomed to struggle with (meta)data integration problems in order to produce custom datasets whose provenance is often undocumented and rather obscure. This practice wastes time, duplicates effort, and typically violates the open science best practices of transparency and reproducibility. In this article, we describe how to generate DOIBoost, a metadata collection that enriches Crossref with inputs from Microsoft Academic Graph, ORCID, and Unpaywall in order to support high-quality, robust research experiments, save researchers time, and make such experiments comparable. To this end, we describe the value of the dataset and its schema, analyse its actual content, and share the Software Toolkit and experimental workflow required to reproduce it. The DOIBoost dataset and Software Toolkit are openly available via Zenodo.org. DOIBoost will become an input source to the OpenAIRE information graph.
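
The enrichment idea described above can be illustrated with a minimal Python sketch. It is not the DOIBoost Software Toolkit: the record dictionaries are simplified, hypothetical stand-ins for Crossref and Unpaywall records, the helper names `normalize_doi` and `enrich` are invented for illustration, and the mapping from Unpaywall's `is_oa` flag to an access-rights value is an assumption. Only the five access-rights values come from note 7 of this paper.

```python
from enum import Enum


class AccessRights(Enum):
    # The five values listed in note 7 of this paper.
    OPEN = "OPEN"
    EMBARGO = "EMBARGO"
    RESTRICTED = "RESTRICTED"
    CLOSED = "CLOSED"
    UNKNOWN = "UNKNOWN"


# Common ways a DOI is prefixed in the wild (an illustrative, non-exhaustive list).
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip a resolver prefix so records can be joined on it."""
    doi = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def enrich(crossref_records, unpaywall_records):
    """Left-join simplified Unpaywall records onto simplified Crossref records by DOI.

    Assumption: a matching record with a truthy `is_oa` maps to OPEN, a matching
    record without it maps to CLOSED, and a missing match maps to UNKNOWN.
    """
    oa_by_doi = {normalize_doi(r["doi"]): r for r in unpaywall_records}
    enriched = []
    for record in crossref_records:
        doi = normalize_doi(record["doi"])
        match = oa_by_doi.get(doi)
        if match is None:
            rights = AccessRights.UNKNOWN
        elif match.get("is_oa"):
            rights = AccessRights.OPEN
        else:
            rights = AccessRights.CLOSED
        # Keep every Crossref field and add the enrichment alongside it.
        enriched.append({**record, "doi": doi, "access-rights": rights.value})
    return enriched
```

Every Crossref record survives the join, so the sketch only adds information and never drops a DOI. Enrichment from Microsoft Academic Graph or ORCID could follow the same DOI-keyed pattern under similar assumptions.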

Notes

  1. Crossref APIs, https://www.crossref.org/services/metadata-delivery/rest-api.

  2. Microsoft Academic Graph, https://aka.ms/msracad.

  3. ORCID, http://orcid.org.

  4. Unpaywall, http://unpaywall.org.

  5. OpenAIRE EXPLORE, http://explore.openaire.eu.

  6. GRID database, https://www.grid.ac.

  7. The field “access-rights” can assume the values OPEN, EMBARGO, RESTRICTED, CLOSED, UNKNOWN.

  8. Apache Oozie, http://oozie.apache.org.

  9. Affero General Public License, https://en.wikipedia.org/wiki/Affero_General_Public_License.

  10. Crossref REST API - GitHub, https://github.com/Crossref/rest-api-doc.

  11. MAG Schema, https://microsoftdocs.github.io/MAG/Mag-ADLS-Schema.

  12. Unpaywall data format, https://unpaywall.org/data-format.

  13. Levenshtein Distance, https://en.wikipedia.org/wiki/Levenshtein_distance.

References

  1. Chawla, D.S.: Unpaywall finds free versions of paywalled papers. Nature News (2017)

  2. Sinha, A., et al.: An overview of Microsoft Academic Service (MAS) and applications. In: Proceedings of the 24th International Conference on World Wide Web (WWW 2015 Companion), pp. 243–246. ACM, New York (2015)

  3. Haak, L.L., Fenner, M., Paglione, L., Pentz, E., Ratner, H.: ORCID: a system to uniquely identify researchers. Learn. Publish. 25, 259–264 (2012). https://doi.org/10.1087/20120404

  4. Manghi, P., Bolikowski, L., Manold, N., Schirrwagen, J., Smith, T.: OpenAIREplus: the European scholarly communication data infrastructure. D-Lib Mag. 18(9), 1 (2012)

  5. Fortunato, S., et al.: Science of science. Science 359(6379), eaao0185 (2018)

  6. La Bruzzo, S., Manghi, P., Mannocci, A.: DOIBoost Dataset Dump (Version 1.0) [Data set]. Zenodo (2018). http://doi.org/10.5281/zenodo.1438356

  7. La Bruzzo, S.: DOIBoost Software Toolkit (Version 1.0). Zenodo, 1 October 2018. http://doi.org/10.5281/zenodo.1441058

Acknowledgements

This work was made possible by the Open Science policies enacted by Microsoft, Unpaywall, ORCID, and Crossref, which allow researchers to openly collect their metadata records for research purposes under CC-0 and CC-BY licenses. The MAG dataset is available under the ODC-BY license thanks to the Azure4research sponsorship signed between Microsoft Research and KMi. This work was partially funded by the EU projects OpenAIRE2020 (call: H2020-EINFRA-2014-1, grant agreement: 643410) and OpenAIRE-Advance (call: H2020-EINFRA-2017, grant number: 777541) [4].

Author information

Corresponding author

Correspondence to Andrea Mannocci.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

La Bruzzo, S., Manghi, P., Mannocci, A. (2019). OpenAIRE’s DOIBoost - Boosting Crossref for Research. In: Manghi, P., Candela, L., Silvello, G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, Cham. https://doi.org/10.1007/978-3-030-11226-4_11

  • DOI: https://doi.org/10.1007/978-3-030-11226-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11225-7

  • Online ISBN: 978-3-030-11226-4

  • eBook Packages: Computer Science, Computer Science (R0)
