Representing Annotated Texts as RDF

Abstract

Text annotation consists of defining markables (the elements to be annotated), their features (the attributes and values of annotations) and relations between markables (e.g. syntactic dependencies or semantic links). In this chapter we describe the principles for annotating text data using RDF-compliant formalisms. These principles provide the basis for making annotated corpora and text collections accessible from the LLOD ecosystem.
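As a rough illustration of the three notions named in the abstract, the following minimal sketch encodes two markables, one feature on each, and one relation between them as RDF-style triples, serialized as N-Triples. It is not taken from the chapter: the `http://example.org/` document and vocabulary URIs, the `pos` and `nsubj` property names, and the helper functions are all hypothetical, and the `#char=start,end` fragment is just one possible way of addressing a character span.

```python
# Minimal sketch: markables, features and relations as RDF-style triples.
# All URIs under http://example.org/ are hypothetical placeholders.

TEXT = "Dogs bark"
DOC = "http://example.org/doc1"    # hypothetical document URI
EX = "http://example.org/vocab#"   # hypothetical annotation vocabulary


def markable(start, end):
    """Return a URI for the span between character positions start and end."""
    return f"{DOC}#char={start},{end}"


def build_triples():
    dogs = markable(0, 4)  # covers "Dogs"
    bark = markable(5, 9)  # covers "bark"
    return [
        # Features: attribute-value pairs attached to a markable.
        (dogs, EX + "pos", '"NOUN"'),
        (bark, EX + "pos", '"VERB"'),
        # Relation: a link between two markables (here a syntactic dependency).
        (bark, EX + "nsubj", f"<{dogs}>"),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines; objects arrive pre-formatted."""
    return "\n".join(f"<{s}> <{p}> {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(build_triples()))
```

In this sketch the markables are the subjects of the triples, features become literal-valued properties, and a relation is a property whose object is another markable's URI.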

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Cimiano, P., Chiarcos, C., McCrae, J.P., Gracia, J. (2020). Representing Annotated Texts as RDF. In: Linguistic Linked Data. Springer, Cham. https://doi.org/10.1007/978-3-030-30225-2_5

  • DOI: https://doi.org/10.1007/978-3-030-30225-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30224-5

  • Online ISBN: 978-3-030-30225-2

  • eBook Packages: Computer Science (R0)