Modelling Linguistic Annotations

  • Philipp Cimiano
  • Christian Chiarcos
  • John P. McCrae
  • Jorge Gracia
Chapter

Abstract

This chapter describes how linguistic annotations can be represented in RDF. The W3C Web Annotation data model and the NLP Interchange Format (NIF) provide the means to reference text segments on the web, but representing the linguistic annotations attached to those segments requires appropriate vocabularies. We discuss relevant vocabularies and illustrate how they can be applied to support annotation at different levels.
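To make the two ways of referencing a text segment concrete, here is a minimal sketch that anchors one token ("Bielefeld") both with NIF and with a Web Annotation selector, and emits N-Triples using only the Python standard library. It is an illustration under stated assumptions, not the chapter's own encoding: the document and annotation IRIs, the example sentence, the tag value `NNP`, and the tag IRI used as the annotation body are all hypothetical, and the property names (`nif:beginIndex`, `nif:anchorOf`, `nif:posTag`, `oa:TextPositionSelector`, `oa:exact`, etc.) reflect our reading of the NIF 2.0 core ontology and the W3C Web Annotation vocabulary. Offsets are Python string indices, i.e. Unicode code points, which is what both specifications count as far as we understand them.

```python
"""Sketch: anchor one token with NIF 2.0 and with the W3C Web Annotation model.

Assumptions (not from the chapter): all example.org IRIs, the sentence, and the
tag values are hypothetical; property names follow our reading of the NIF 2.0
core ontology and the Web Annotation vocabulary. Output is N-Triples.
"""

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_NNI = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"


def literal(value: str) -> str:
    """Plain string literal; escapes backslashes, double quotes and line breaks."""
    # Backslash must be escaped first so later escapes are not doubled.
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        value = value.replace(raw, esc)
    return f'"{value}"'


def offset(value: int) -> str:
    """Typed literal for a character offset."""
    return f'"{value}"^^<{XSD_NNI}>'


def triple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples statement; subject and obj must already be serialised."""
    return f"{subject} <{predicate}> {obj} ."


def nif_triples(doc: str, text: str, begin: int, end: int, pos: str) -> list[str]:
    """NIF: the segment IRI itself encodes the offsets (RFC 5147 style)."""
    context = f"<{doc}#char=0,{len(text)}>"
    word = f"<{doc}#char={begin},{end}>"
    return [
        triple(context, RDF_TYPE, f"<{NIF}Context>"),
        triple(context, NIF + "isString", literal(text)),
        triple(word, RDF_TYPE, f"<{NIF}Word>"),
        triple(word, NIF + "referenceContext", context),
        triple(word, NIF + "beginIndex", offset(begin)),
        triple(word, NIF + "endIndex", offset(end)),
        triple(word, NIF + "anchorOf", literal(text[begin:end])),
        triple(word, NIF + "posTag", literal(pos)),  # the linguistic annotation
    ]


def web_annotation_triples(anno: str, doc: str, text: str,
                           begin: int, end: int, body: str) -> list[str]:
    """Web Annotation: a separate annotation node points at the segment via
    selectors; the annotation content is the body."""
    # Hypothetical IRIs for target and selectors; blank nodes are also common.
    a, target = f"<{anno}>", f"<{anno}/target>"
    pos_sel, quote_sel = f"<{anno}/position>", f"<{anno}/quote>"
    return [
        triple(a, RDF_TYPE, f"<{OA}Annotation>"),
        triple(a, OA + "hasBody", f"<{body}>"),
        triple(a, OA + "hasTarget", target),
        triple(target, RDF_TYPE, f"<{OA}SpecificResource>"),
        triple(target, OA + "hasSource", f"<{doc}>"),
        triple(target, OA + "hasSelector", pos_sel),
        triple(pos_sel, RDF_TYPE, f"<{OA}TextPositionSelector>"),
        triple(pos_sel, OA + "start", offset(begin)),
        triple(pos_sel, OA + "end", offset(end)),
        triple(target, OA + "hasSelector", quote_sel),
        triple(quote_sel, RDF_TYPE, f"<{OA}TextQuoteSelector>"),
        triple(quote_sel, OA + "exact", literal(text[begin:end])),
    ]


if __name__ == "__main__":
    doc = "http://example.org/doc1"  # hypothetical document IRI
    text = "Bielefeld is in Germany."
    for line in nif_triples(doc, text, 0, 9, "NNP"):
        print(line)
    # Hypothetical tag IRI as body; a real setup would use a shared vocabulary.
    for line in web_annotation_triples("http://example.org/anno1", doc, text,
                                       0, 9, "http://example.org/tagset#NNP"):
        print(line)
```

The sketch highlights the design difference: in NIF the string segment is itself a resource whose IRI carries the offsets, so annotations attach directly to it, whereas Web Annotation keeps the annotation as a separate node that reaches the text only through selectors. In both cases the tag value itself still needs a vocabulary to be interpretable, which is the gap the chapter addresses.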

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Semantic Computing Group, Bielefeld University, Bielefeld, Germany
  2. Angewandte Computerlinguistik, Goethe-University, Frankfurt am Main, Germany
  3. Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
  4. Aragon Institute of Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain
