Introduction

  • Philipp Cimiano
  • Christian Chiarcos
  • John P. McCrae
  • Jorge Gracia
Abstract

Digital language resources, comprising spoken and written material, are key to many fields, including linguistics research, lexicography, typology, and the study of minority or extinct languages, as well as to the development of machine-learned models for automated natural language processing (NLP). Language resources are thus an important cultural asset that must not only be preserved but also made reusable to the greatest extent possible. In particular, a crucial issue is to maximize secondary reuse of language resources, that is, to ensure that the data can be used by others for a purpose other than the one for which it was originally collected. However, secondary reuse is in many cases hindered by proprietary choices made by the data collector. Language resources (dictionaries, terminologies, corpora, etc.) developed in the fields of corpus linguistics, computational linguistics and NLP are often encoded in heterogeneous formats and developed in isolation from one another. This makes their discovery, reuse and integration a difficult and cumbersome task, both for the development of NLP tools and for daily linguistic research. To alleviate this issue and to enhance the interoperability of language resources on the Web, a community of language technology experts and practitioners has started adopting techniques from the field of linked data (LD). The LD paradigm emerged as a series of best practices and principles for exposing, sharing and connecting data on the Web.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Semantic Computing Group, Bielefeld University, Bielefeld, Germany
  2. Angewandte Computerlinguistik, Goethe-University, Frankfurt am Main, Germany
  3. Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
  4. Aragon Institute of Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain