The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data

  • Conference paper
The Semantic Web – ISWC 2019 (ISWC 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11779)

Abstract

In this paper, we present the Microsoft Academic Knowledge Graph (MAKG), a large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is licensed under the Open Data Commons Attribution License (ODC-By). By providing the data as RDF dump files as well as a data source in the Linked Open Data cloud with resolvable URIs and links to other data sources, we bring a vast amount of scholarly data to the Web of Data. Furthermore, we provide entity embeddings for all 210 million represented publications. We facilitate a number of use case scenarios, particularly in the field of digital libraries, such as (1) entity-centric exploration of papers, researchers, affiliations, etc.; (2) data integration tasks using RDF as a common data model and links to other data sources; and (3) data analysis and knowledge discovery of scholarly data.

Notes

  1. The values are based on SPARQL queries executed against our data set presented in Sect. 3.

  2. See https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/.

  3. Both the initial MAG data set and the MAKG provided by us are licensed under the Open Data Commons Attribution License (ODC-By; https://opendatacommons.org/licenses/by/1-0/index.html; last access: April 9, 2019).

  4. The source code is available online at https://github.com/michaelfaerber/MAG2RDF.

  5. See https://www.grid.ac/.

  6. The MAKG is also available at the persistent URI https://w3id.org/makg/.

  7. See http://doi.org/10.5281/zenodo.2159723.

  8. See the S3 bucket arn:aws:s3:::ma-kg.

  9. See, e.g., curl -H "Accept:text/n3" http://ma-graph.org/entity/2826592117 and curl -H "Accept:text/ttl" http://ma-graph.org/entity/2826592117.

  10. See https://www.springernature.com/de/researchers/scigraph.

  11. See http://wikicite.org/.

  12. In our paper, the term “citations” refers to in-text citations while “references” refers to links on the document level.

  13. See http://clair.eecs.umich.edu/aan/index.php.

  14. See https://www.comp.nus.edu.sg/~sugiyama/Dataset2.html.

  15. The source code is available online at https://github.com/michaelfaerber/makg-linking. The mappings are available as nt files with owl:sameAs statements on our website.

  16. Note that only the number of citations is listed and not the number of references, because references are modeled in the MAKG via a relation (cito:cites). There are 1,380,196,397 references in the MAKG.

  17. See https://docs.microsoft.com/en-us/academic-services/graph/get-started-setup-provisioning#open-data-license-odc-by.

  18. See http://lov.okfn.org/vocommons/voaf.

  19. See http://www.w3.org/TR/void/.

  20. See http://5stardata.info/.

  21. Sinha et al. [1] have obtained 187 citations as of March 29, 2019, according to Google Scholar.

  22. See https://doi.org/10.5281/zenodo.2159723 (as of April 10, 2019). Note that the data set is also available at http://ma-graph.org/ and on Amazon S3.

  23. See http://ma-graph.org/usage-statistics/ for usage statistics concerning the website and the SPARQL endpoint.

  24. See https://www.openacademic.ai/oag/.
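
The curl commands in note 9 dereference a MAKG entity URI via HTTP content negotiation. The following is a minimal Python sketch of the same request, not the authors' own tooling. The base URI, the entity ID, and the two Accept values are taken from note 9; the function names and the timeout are hypothetical choices made for illustration, and whether the service still answers at this address is an assumption.

```python
from urllib.request import Request, urlopen

# Base URI taken from the curl examples in note 9.
ENTITY_BASE = "http://ma-graph.org/entity/"


def build_entity_request(entity_id: str, accept: str = "text/n3") -> Request:
    """Build a GET request asking for an RDF serialization of one entity.

    The Accept header carries the desired format, as in note 9
    ("text/n3" or "text/ttl").
    """
    return Request(ENTITY_BASE + entity_id, headers={"Accept": accept})


def fetch_entity(entity_id: str, accept: str = "text/n3", timeout: float = 10.0) -> str:
    """Send the request and return the response body as text.

    Requires network access; the timeout value is an illustrative assumption.
    """
    with urlopen(build_entity_request(entity_id, accept), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling fetch_entity("2826592117") would issue the first request from note 9; passing accept="text/ttl" corresponds to the second.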

References

  1. Sinha, A., et al.: An overview of Microsoft Academic Service (MAS) and applications. In: Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, pp. 243–246 (2015)

  2. Peroni, S., Dutton, A., Gray, T., Shotton, D.M.: Setting our bibliographic references free: towards open citation data. J. Doc. 71(2), 253–277 (2015)

  3. Aleman-Meza, B., Hakimpour, F., Arpinar, I.B., Sheth, A.P.: SwetoDblp ontology of computer science publications. J. Web Semant. 5(3), 151–155 (2007)

  4. Wang, R., et al.: AceKG: a large-scale knowledge graph for academic data mining. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, pp. 1487–1490 (2018)

  5. Aslam, M.A., Aljohani, N.R.: SPedia: a central hub for the linked open data of scientific publications. Int. J. Semant. Web Inf. Syst. 13(1), 128–146 (2017)

  6. Nuzzolese, A.G., Gentile, A.L., Presutti, V., Gangemi, A.: Conference linked data: the scholarlydata project. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9982, pp. 150–158. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46547-0_16

  7. Nuzzolese, A.G., Gentile, A.L., Presutti, V., Gangemi, A.: Semantic web conference ontology - a refactoring solution. In: Sack, H., Rizzo, G., Steinmetz, N., Mladenić, D., Auer, S., Lange, C. (eds.) ESWC 2016. LNCS, vol. 9989, pp. 84–87. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47602-5_18

  8. Gentile, A.L., Acosta, M., Costabello, L., Nuzzolese, A.G., Presutti, V., Recupero, D.R.: Conference live: accessible and sociable conference semantic data. In: Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, pp. 1007–1012 (2015)

  9. Konstantinou, N., Spanos, D., Houssos, N., Mitrou, N.: Exposing scholarly information as Linked Open Data: RDFizing DSpace contents. Electron. Libr. 32(6), 834–851 (2014)

  10. Peroni, S., Shotton, D.: The SPAR ontologies. In: Vrandečić, D., et al. (eds.) ISWC 2018. LNCS, vol. 11137, pp. 119–136. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00668-6_8

  11. Zhang, L., Rettinger, A.: X-LiSA: cross-lingual semantic annotation. PVLDB 7(13), 1693–1696 (2014)

  12. Färber, M., Thiemann, A., Jatowt, A.: A high-quality gold standard for citation-based tasks. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018, pp. 1885–1889 (2018)

  13. Saier, T., Färber, M.: Bibliometric-enhanced arXiv: a data set for paper-based and citation-based tasks. In: Proceedings of the 8th International Workshop on Bibliometric-enhanced Information Retrieval, BIR 2019, pp. 14–26 (2019)

  14. Herrmannova, D., Knoth, P.: An analysis of the Microsoft academic graph. D-Lib Mag. 22(9/10) (2016)

  15. Janowicz, K., Hitzler, P., Adams, B., Kolas, D., Vardeman, C.: Five stars of linked data vocabulary use. Semant. Web 5(3), 173–176 (2014)

  16. Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9981, pp. 498–514. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46523-4_30

  17. Carrasco, M.H., Luján-Mora, S., Maté, A., Trujillo, J.: Current state of linked data in digital libraries. J. Inf. Sci. 42(2), 117–127 (2016)

  18. Fathalla, S., Vahdati, S., Auer, S., Lange, C.: Towards a knowledge graph representing research findings by semantifying survey articles. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds.) TPDL 2017. LNCS, vol. 10450, pp. 315–327. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67008-9_25

  19. Färber, M., Nishioka, C., Jatowt, A.: ScholarSight: visualizing temporal trends of scientific concepts. In: Proceedings of the 19th ACM/IEEE on Joint Conference on Digital Libraries, JCDL 2019, pp. 436–437 (2019)

  20. Färber, M., Sampath, A., Jatowt, A.: PaperHunter: a system for exploring papers and citation contexts. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds.) ECIR 2019. LNCS, vol. 11438, pp. 246–250. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-15719-7_33

  21. Hug, S.E., Ochsner, M., Brändle, M.P.: Citation analysis with Microsoft academic. Scientometrics 111(1), 371–378 (2017)

  22. Mohapatra, D., Maiti, A., Bhatia, S., Chakraborty, T.: Go wide, go deep: quantifying the impact of scientific papers through influence dispersion trees. In: Proceedings of the 19th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019, pp. 305–314 (2019)

  23. Fire, M., Guestrin, C.: Over-optimization of academic publishing metrics: observing Goodhart’s law in action. CoRR abs/1809.07841 (2018)

  24. Hoffman, M.R., Ibáñez, L.-D., Fryer, H., Simperl, E.: Smart papers: dynamic publications on the blockchain. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 304–318. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_20

  25. Jaradeh, M.Y., Auer, S., Prinz, M., Kovtun, V., Kismihók, G., Stocker, M.: Open research knowledge graph: towards machine actionability in scholarly communication. CoRR abs/1901.10816 (2019)

Author information

Corresponding author

Correspondence to Michael Färber.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Färber, M. (2019). The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data. In: Ghidini, C., et al. The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11779. Springer, Cham. https://doi.org/10.1007/978-3-030-30796-7_8

  • DOI: https://doi.org/10.1007/978-3-030-30796-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30795-0

  • Online ISBN: 978-3-030-30796-7

  • eBook Packages: Computer Science (R0)
