Community Standards for Linguistically-Annotated Resources

Handbook of Linguistic Annotation

Abstract

This chapter provides a broad overview of the state of the art in standards development for language resources, beginning with a brief historical survey to serve as context. It describes in some detail several current, major efforts that define the standardization landscape for language resources today, with the aim of outlining their differences and commonalities and, more generally, identifying the progress that has been made to date as well as the obstacles to definitive standardization. In addition to describing the standards most applicable to linguistic annotation of text, we include a section that surveys considerations and alternatives for spoken data. We also review a widely used and influential de facto standard and consider its role in standards development. Finally, we provide an assessment of the standards landscape and the options available to current and future creators of linguistically annotated resources.

Notes

  1. Note that until roughly 2001, the separation of physical format and linguistic information was typically not taken into account in the development of standards for language resources.

  2. The Poughkeepsie Principles, together with an accounting of the founding assumptions and sponsors of the TEI, are available at http://www.tei-c.org/Vault/ED/edp01.htm

  3. SGML was formally adopted as an ISO standard in 1986; see [62].

  4. The TEI Guidelines were later converted to the Extensible Markup Language (XML), which superseded SGML in the mid-1990s and whose design was influenced by work undertaken in the TEI project.

  5. Originally called “remote markup”; see http://www.cs.vassar.edu/CES/CES1-5.html

  6. ISLE (International Standards for Language Engineering), a standards-oriented transatlantic initiative, was established in 2000 as a continuation of EAGLES.

  7. EAGLES Guidelines are still available at http://www.ilc.cnr.it/EAGLES/browse.html

  8. http://nl.ijs.si/ME/

  9. Chapter “Designing Annotation Schemes: From Model to Representation”, Sect. 2, in this volume provides a history of the development of standards for physical format.

  10. See chapter “Designing Annotation Schemes: From Model to Representation”, Sect. 5.2.

  11. In particular, the TEI Guidelines contain a wealth of examples for each element and the major constructs they allow.

  12. For instance, the class att.global contains general-purpose attributes such as the W3C’s @xml:id and @xml:lang and the TEI’s generic @n (for local numbering) and @rend (for rendering information).

  13. ISO 24610-1:2006 Language resource management – Feature structures – Part 1: Feature structure representation.

  14. See the implementation in the Polish National Corpus [98].

  15. See, for instance, [103] on introducing TBX entries within a TEI document.

  16. See http://morphadorner.northwestern.edu, with the annotation tagset described in http://panini.northwestern.edu/mmueller/nupos.pdf

  17. http://nl.ijs.si/ME/

  18. See http://nl.ijs.si/jos/; http://eng.slovenscina.eu/; and http://nl.ijs.si/imp/

  19. See also chapter “Designing Annotation Schemes: From Model to Representation”, Sect. 3.2.4 for a description of the MMAX2 annotation tool.

  20. The reference specification of the TEI-based TXM pivot format is available at http://txm.sourceforge.net/wiki/index.php/XML-TXM

  21. This is a version based on ISO 24615:2010 SynAF, with the title changed.

  22. See chapter “Case Study: The Manually Annotated Sub-Corpus (MASC)”.

  23. Two additional standards, ISO 24617-6 SemAF Principles and ISO 24617-8 ISO DR-Core, were published in 2016.

  24. See chapter “Building FactBank or How to Annotate Event Factuality One Step at a Time” for an example of ISO-TimeML applied to language data.

  25. Copied from [80].

  26. <TIMEX3 xml:id="t21"/> may be treated as an element, called a non-consuming tag, that has no associated markable expression in the text; the value of its attribute @target is therefore empty. See ISOspace [70], A.3.4 Special Section: Non-consuming tags.

  27. See chapter “It-TimeML and the Ita-TimeBank: Language Specific Adaptations for Temporal Annotation” for an example of ISOspace applied to language data.

  28. The noun Mia is tagged as se (spatial entity) because it is spatially involved as the figure of the event lives near Harvard in Cambridge.

  29. \(_{pl7}\) is a non-consuming tag referring to some spot on the Charles River that is crossed.

  30. A new attribute @dir for the direction of a motion may need to be introduced to annotate a markable such as eastward.

  31. The informative annex B in SemAF-SR [69] reviews these existing frameworks in detail.

  32. See [17], p. 41.

  33. The specification of the annotation structure here is much simplified, differing from that presented in [17].

  34. See Annex C.3 Concrete syntax, SemAF-SR [69].

  35. See chapters “Semantic Annotation of MASC” and “VerbNet/OntoNotes-Based Sense Annotation”.

  36. http://www.isocat.org/datcat/DC-4187

  37. http://www.isocat.org/datcat/DC-4189

  38. http://tla.mpi.nl/relish/

  39. http://www.clarin.eu

  40. http://www.isocat.org/rest/dcs/376

  41. http://www.isocat.org/rest/dcs/484

  42. https://openskos.meertens.knaw.nl/ccr/browser/

  43. See http://universaldependencies.github.io/docs/

  44. https://www.internationalphoneticassociation.org/content/ipa-chart

  45. https://www.internationalphoneticassociation.org/

  46. http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=gentium

  47. http://wwwhomes.uni-bielefeld.de/gibbon/TGA/

  48. https://github.com/hbuschme/TextGridTools/

  49. http://www.linguistics.ucla.edu/faciliti/facilities/acoustic/praat.html

  50. http://nlp2rdf.org/nif-1-0/

  51. http://nlp2rdf.lod2.eu/demo.php

  52. http://tools.ietf.org/html/rfc5147

  53. http://tools.ietf.org/html/rfc1737

  54. http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core

  55. http://www.w3.org/TR/rdf-concepts/section-Literals

  56. Available at http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/version-1.0/

  57. See e.g. http://clarkparsia.com/pellet/icv/

  58. http://www.w3.org/TR/its20/datacategory-description

  59. http://www.w3.org/TR/rdfa-syntax/

  60. http://www.w3.org/2005/11/its/rdf

  61. http://www.w3.org/TR/its20/conversion-to-nif

  62. http://purl.org/olia

  63. http://purl.org/olia/penn.owl

  64. http://olia.nlp2rdf.org/owl/

  65. http://www.w3.org/TR/skos-reference/skos-xl.html

  66. http://opennlp.apache.org

  67. Note that converters from many graph-based formats to CoNLL IOB exist, but the reverse conversion from CoNLL IOB into these formats is significantly more challenging.

References

  1. Allen, J., Core, M.: DAMSL: dialogue act markup in several layers (Draft 2.1). Technical report. University of Rochester, Rochester, NY (1997). http://www.cs.rochester.edu/research/cisd/resources/damsl/RevisedManual/

  2. Allwood, J.: On dialogue cohesion. Gothenburg Papers in Theoretical Linguistics 65 (1992). Gothenburg University, Department of Linguistics

  3. Auer, S., Hellmann, S.: The web of data: decentralized, collaborative, interlinked and interoperable. In: LREC (2012)

  4. Austin, P.K., Grenoble, L.A.: Current trends in language documentation. Lang. Doc. Descr. 4, 12–25 (2007)

  5. Bigi, B., Hirst, D.: SPeech phonetization alignment and syllabification (SPPAS): a tool for the automatic analysis of speech prosody. In: Speech Prosody, Shanghai, China, pp. 1–4 (2012). https://hal.archives-ouvertes.fr/hal-00983699

  6. Bird, S., Klein, E.: Phonological events. Journal of Linguistics 26, 33–56 (1990)

  7. Bird, S., Liberman, M.: A formal framework for linguistic annotation. Speech Communication 33(1–2), 23–60 (2001)

  8. Boersma, P., Weenink, D.: Praat, a system for doing phonetics by computer. Glot International 5(9/10), 341–345 (2001)

  9. Breen, M., Dilley, L.C., Kraemer, J., Gibson, E.: Inter-transcriber agreement for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch). Corpus Linguistics and Linguistic Theory 8(2), 277–312 (2012)

  10. Broeder, D., Schuurman, I., Windhouwer, M.: Experiences with the ISOcat data category registry. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), pp. 4565–4568. European Language Resources Association (ELRA), Reykjavik, Iceland (2014)

  11. Buchholz, S., Marsi, E.: CoNLL-X shared task on multilingual dependency parsing. In: Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), pp. 149–164 (2006)

  12. Bunt, H.: Context and dialogue control. THINK Quarterly 3(1), 19–31 (1994)

  13. Bunt, H.: Dialogue pragmatics and context specification. In: Bunt, H., Black, W. (eds.) Abduction, Belief and Context in Dialogue. Studies in Computational Pragmatics, pp. 81–150. John Benjamins, Amsterdam (2000)

  14. Bunt, H.: The DIT++ taxonomy for functional dialogue markup. In: Heylen, D., Pelachaud, C., Catizone, R., Traum, D. (eds.) Proceedings of AAMAS 2009 Workshop "Towards a Standard Markup Language for Embodied Dialogue Acts" (EDAML 2009), Budapest, pp. 13–24 (2009)

  15. Bunt, H.: A methodology for designing semantic annotation languages exploring semantic-syntactic iso-morphisms. In: Fang, A., Ide, N., Webster, J. (eds.) Proceedings of the Second International Conference on Global Interoperability for Language Resources (ICGL 2010), pp. 29–46. Department of Chinese, Translation and Linguistics, City University of Hong Kong, Hong Kong (2010)

  16. Bunt, H.: Introducing abstract syntax + semantics in semantic annotation, and its consequences for the annotation of time and events. In: Lee, E., Yoon, A. (eds.) Recent Trends in Language and Knowledge Processing, pp. 157–204. Hankookmunhwasa, Seoul (2011)

  17. Bunt, H., Palmer, M.: Conceptual and representational choices in defining an ISO standard for semantic role annotation. In: Bunt, H. (ed.) Proceedings of the 9th Joint ISO-ACL/SIGSEM Workshop on Interoperable Semantic Annotation (ISA-9), pp. 41–50. Association for Computational Linguistics, Potsdam, Germany (2013). http://www.aclweb.org/anthology/W13-0500

  18. Bunt, H., Pustejovsky, J.: Annotating event and temporal quantification. In: Proceedings of the Fifth Joint ISO-ACL/SIGSEM Workshop on Interoperable Semantic Annotation ISA-5, pp. 15–22 (2010)

  19. Bunt, H., Alexandersson, J., Carletta, J., Choe, J.W., Fang, A.C., Hasida, K., Lee, K., Petukhova, V., Popescu-Belis, A., Romary, L., Soria, C., Traum, D.: Towards an ISO standard for dialogue act annotation. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10) (2010)

  20. Bunt, H., Alexandersson, J., Choe, J.W., Fang, A.C., Hasida, K., Petukhova, V., Popescu-Belis, A., Traum, D.: ISO 24617-2: a semantically-based standard for dialogue annotation. In: Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), Istanbul, Turkey (2012)

  21. Buschmeier, H., Wlodarczak, M.: TextGridTools: a TextGrid processing and analysis toolkit for Python. In: Tagungsband der 24. Konferenz zur Elektronischen Sprachsignalverarbeitung (ESSV 2013), pp. 152–157 (2013)

  22. Carletta, J., Isard, S., Kowtko, J., Doherty-Sneddon, G.: HCRC dialogue structure coding manual. Technical report HCRC/TR-82 (1996)

  23. Carletta, J., Dahlbäck, N., Reithinger, N., Walker, M.A.: Standards for dialogue coding in natural language processing. Technical report no. 167. Report from Dagstuhl seminar number 9706 (1997)

  24. Chiarcos, C.: Ontologies of linguistic annotation: survey and perspectives. In: LREC. European Language Resources Association (2012)

  25. Cinková, S.: From PropBank to EngValLex: adapting the PropBank-lexicon to the valency theory of the Functional Generative Description. In: Proceedings of the 6th Edition of International Conference on Language Resources and Evaluation (LREC 2006), pp. 2170–2175 (2006)

  26. Corpus Encoding Standard (1994). http://www.cs.vassar.edu/CES/CES1.html

  27. Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V.: GATE: a framework and graphical development environment for robust NLP tools and applications. In: ACL (2002). doi:10.3115/1073083.1073112. http://www.aclweb.org/anthology/P02-1022

  28. de Marneffe, M.C., Manning, C.D.: The Stanford typed dependencies representation. In: Coling 2008: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation, pp. 1–8 (2008)

  29. de Marneffe, M.C., MacCartney, B., Manning, C.D.: Generating typed dependency parses from phrase structure parses. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC) (2006)

  30. de Marneffe, M.C., Dozat, T., Silveira, N., Haverinen, K., Ginter, F., Nivre, J., Manning, C.D.: Universal Stanford dependencies: a cross-linguistic typology. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), pp. 4585–4592 (2014)

  31. Dhillon, R., Bhagat, S., Carvey, H., Shriberg, E.: Meeting recorder project: dialogue labelling guide. ICSI Technical Report TR-04-002 (2004)

  32. Di Eugenio, B., Jordan, P.W., Pylkkanen, L.: The COCONUT project: dialogue annotation manual. ISP Technical Report 98–1, University of Pittsburgh (1998)

  33. Eckle-Kohler, J., Gurevych, I., Hartmann, S., Matuschek, M., Meyer, C.M.: UBY-LMF - exploring the boundaries of language-independent lexicon models. In: Francopoulo, G. (ed.) LMF Lexical Markup Framework, Chap. 10, pp. 145–156. ISTE - HERMES - Wiley, London (2013)

  34. Farrar, S., Langendoen, D.T.: A linguistic ontology for the semantic web. GLOT International 7, 97–100 (2003)

  35. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)

  36. Ferrucci, D., Lally, A.: UIMA: an architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering 10(3/4), 327–348 (2004)

  37. Fillmore, C.J.: The case for case. In: Bach, E., Harms, R. (eds.) Universals in Linguistic Theory, pp. 1–89. Holt, Rinehart, and Winston (1968)

  38. Fillmore, C., Baker, C., Sato, H.: FrameNet as a “net”. In: Proceedings of the 4th Edition of International Conference on Language Resources and Evaluation (LREC 2004), pp. 1091–1094 (2004)

  39. Francopoulo, G. (ed.): LMF: Lexical Markup Framework. Wiley-ISTE, London (2013)

  40. Gibbon, D.: Time types and time trees: prosodic mining and alignment of temporally annotated data. In: Sudhoff, S., Lenertova, D., Meyer, R., Pappert, S., Augurzky, P., Mleinek, I., Richter, N., Schließer, J. (eds.) Methods in Empirical Prosody Research, pp. 281–209. Walter de Gruyter, Berlin (2006)

  41. Gibbon, D.: Modelling gesture as speech: a linguistic approach. Poznań Studies in Contemporary Linguistics 47, 470–508 (2011)

  42. Gibbon, D., Moore, R., Winski, R. (eds.): Handbook of Standards and Resources for Spoken Language Systems. Mouton de Gruyter, Berlin (1997)

  43. Gibbon, D., Mertins, I., Moore, R.: Handbook of Multimodal and Spoken Dialogue Systems: Resources, Terminology and Product Evaluation. The Springer International Series in Engineering and Computer Science. Springer US (2000). http://books.google.com/books?id=Ntb0T7gfIn8C

  44. Głowińska, K., Przepiórkowski, A.: The design of syntactic annotation levels in the National Corpus of Polish. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), pp. 19–21. European Language Resources Association (ELRA), Valletta, Malta (2010)

  45. Grishman, R.: TIPSTER architecture design document version 2.2. Technical report, Defense Advanced Research Projects Agency (1996)

  46. Gurevych, I., Eckle-Kohler, J., Hartmann, S., Matuschek, M., Meyer, C.M., Wirth, C.: UBY - a large-scale unified lexical-semantic resource. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, France, pp. 580–590 (2012)

  47. Hellmann, S., Lehmann, J., Auer, S.: Linked-data aware URI schemes for referencing text fragments. EKAW 2012. LNCS, vol. 7603. Springer, New York (2012)

  48. Hirst, D., Di Cristo, A.: Intonation Systems: A Survey of Twenty Languages. Cambridge University Press, Cambridge (1998). http://www.google.com.sg/books?id=LClvNiI4k0sC

  49. Ide, N., Veronis, J.: Multext: multilingual text tools and corpora. In: COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics (1994). http://aclweb.org/anthology/C94-1097

  50. Ide, N., Veronis, J.: Encoding dictionaries. In: Ide, N., Veronis, J. (eds.) The Text Encoding Initiative: Background and Context. Kluwer Academic Publishers, Dordrecht (1995)

  51. Ide, N., Romary, L.: Standards for language resources. In: Proceedings of the IRCS Workshop on Linguistic Databases, Philadelphia, PA, pp. 141–149 (2001)

  52. Ide, N., Romary, L.: Outline of the international standard linguistic annotation framework. In: Proceedings of ACL’03 Workshop on Linguistic Annotation: Getting the Model Right, pp. 1–5 (2003)

  53. Ide, N., Romary, L.: International standard for a linguistic annotation framework. Natural Language Engineering 10(3–4), 211–225 (2004)

  54. Ide, N., Romary, L.: A registry of standard data categories for linguistic annotation. In: Proceedings of the Fourth Language Resources and Evaluation Conference (LREC), Lisbon, pp. 135–139 (2004)

  55. Ide, N., Romary, L.: Towards international standards for language resources. In: Dybkjaer, L., Hemsen, H., Minker, W. (eds.) Evaluation of Text and Speech Systems, pp. 263–284. Springer, New York (2007)

  56. Ide, N., Suderman, K.: GrAF: a graph-based format for linguistic annotations. In: Proceedings of the Linguistic Annotation Workshop (LAW), pp. 1–8. Association for Computational Linguistics (2007)

  57. Ide, N., Pustejovsky, J.: What does interoperability mean, anyway? Toward an operational definition of interoperability. In: Proceedings of the Second International Conference on Global Interoperability for Language Resources. Hong Kong (2010)

  58. Ide, N., Suderman, K.: The linguistic annotation framework: a standard for annotation interchange and merging. Language Resources and Evaluation 48(3), 395–418 (2014)

  59. Ide, N., Bonhomme, P., Romary, L.: XCES: an XML-based encoding standard for linguistic corpora. In: Proceedings of the Second International Language Resources and Evaluation Conference (LREC’00) (2000)

  60. Ide, N., Baker, C., Fellbaum, C., Passonneau, R.: The Manually Annotated Sub-Corpus: A Community Resource For and By the People. In: Proceedings of the ACL 2010 Conference Short Papers, pp. 68–73. Association for Computational Linguistics, Uppsala, Sweden (2010)

  61. Ide, N., Pustejovsky, J., Suderman, K., Verhagen, M.: The language application grid web service exchange vocabulary. In: Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT (OIAF4HLT). Dublin (2014)

  62. International Organization for Standardization: ISO 8879:1986: Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO, Geneva (1986)

  63. ISO 24612:201 Language resource management - Linguistic annotation framework (LAF), ISO, Geneva. ISO Working Group: ISO/TC 37/SC 4/WG 2 convenor and project leader, Nancy Ide

  64. ISO: ISO 8601:2004 Data elements and interchange formats – Information interchange – Representation of dates and times. ISO, Geneva (2004)

  65. ISO: ISO 24612:2012 Language resource management - Linguistic annotation framework (LAF). ISO, Geneva. ISO Working Group:TC 37/SC 4/WG 1, Convenor and project leader: Nancy Ide (2012)

  66. ISO: ISO 24617-1:2012 Language resource management - Semantic annotation framework - Part 1: time and events (SemAF-Time, ISO-TimeML). ISO, Geneva. ISO Working Group:TC 37/SC 4/WG 2, Editors: James Pustejovsky (chair), Harry Bunt, Kiyong Lee (convenor and project leader), Bran Boguraev, and Nancy Ide in cooperation with the TimeML Working Group (2012). http://www.timeml.org

  67. ISO: ISO 24617-2:2012 Language resource management - Semantic annotation framework - Part 2: dialogue acts (SemAF-DA). ISO, Geneva. ISO Working Group:TC 37/SC 4/WG 2 Convenor: Kiyong Lee. Project leader: Harry Bunt (2012)

  68. ISO: 24612:2012 Language resource management, Linguistic annotation framework (LAF). ISO, Geneva, Switzerland (2012)

  69. ISO: ISO 24617-4:2014 Language resource management - Semantic annotation framework - Part 4: Semantic roles (SemAF-SR). ISO, Geneva. ISO Working Group:TC 37/SC 4/WG 2 Convenor: Kiyong Lee, Project leader: Martha Palmer, Writers: Martha Palmer (USA), Collin Baker (USA), Claire Bonial (USA), Harry Bunt (Holland), Katrin Erk (USA, Germany), Olga Petukhova (Germany), James Pustejovsky (USA), Zdenka Uresova (the Czech Republic), Nianwen Xue (USA, China) (2014)

  70. ISO: ISO 24617-7:2014 Language resource management - Part 7: spatial information (ISOspace). ISO, Geneva. ISO Working Group: TC 37/SC 4/WG 2, Project leaders: James Pustejovsky and Kiyong Lee, supported by the ISOspace Working Group headed by James Pustejovsky at Brandeis University, Waltham, MA, USA (2014). Homepage of the ISO-Space project: https://sites.google.com/site/wikiisospace/

  71. Katz, G.: Annotating temporal and event quantification. Annotating, Extracting and Reasoning About Time and Events, pp. 88–106 (2007)

  72. Kipper, K., Korhonen, A., Ryant, N., Palmer, M.: A large-scale classification of English verbs. Language Resources and Evaluation 42, 21–40 (2008)

  73. Kipper-Schuler, K.: VerbNet: a broad-coverage, comprehensive verb lexicon. Ph.D. thesis, University of Pennsylvania (2005)

  74. Klessa, K., Gibbon, D.: Annotation Pro + TGA: automation of speech timing analysis. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). European Language Resources Association (ELRA), Reykjavik, Iceland (2014)

  75. Knuth, D.E.: Literate Programming. CSLI Lecture Notes. CSLI, Stanford (1992)

  76. Kübler, S., McDonald, R., Nivre, J.: Dependency Parsing. Morgan and Claypool, San Rafael (2009)

  77. Romary, L.: TEI and LMF crosswalks. JLCL - Journal for Language Technology and Computational Linguistics 30(1) (2009). http://www.jlcl.org, hal-00762664v4

  78. Lee, K.: Formal Semantics for Temporal Annotation. Lecture Notes for CIL, vol. 18 (2008)

  79. Lee, K.: A compositional interval semantics for temporal annotation. In: Lee, E., Yoon, A. (eds.) Recent Trends in Language and Knowledge Processing, pp. 157–204. Hankookmunhwasa, Seoul. Presented at the workshop on language and knowledge processing, Pusan National University, in summer 2008 (2011)

  80. Lee, K.: The annotation of measure expressions in ISO standards. In: Bunt, H. (ed.) Proceedings of the 11th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-11). QMUL, London. A satellite workshop of IWCS 2015, London, U.K (2015)

  81. Lee, K., Romary, L.: Towards interoperability of ISO standards for language resource management. In: Fang, A.C., Ide, N., Webster, J. (eds.) Proceedings of Language Resources and Interoperability, The Second International Conference on Global Interoperability for Language Resources (ICGL201), Hong Kong, pp. 95–104 (2010)

  82. Mani, I., Hitzeman, J., Richer, J., Harris, D., Quimby, R., Wellner, B.: SpatialML: annotation scheme, corpora, and tools. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08). European Language Resources Association (ELRA), Marrakech, Morocco (2008). http://www.lrec-conf.org/proceedings/lrec2008/

  83. McDonald, R., Petrov, S., Hall, K.: Multi-source transfer of delexicalized dependency parsers. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 62–72 (2011)

  84. McDonald, R., Nivre, J., Quirmbach-Brundage, Y., Goldberg, Y., Das, D., Ganchev, K., Hall, K., Petrov, S., Zhang, H., Täckström, O., Bedini, C., Bertomeu Castelló, N., Lee, J.: Universal dependency annotation for multilingual parsing. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 92–97 (2013)

  85. McNeill, D. (ed.): Language and Gesture: Window into Thought and Action. Cambridge University Press, Cambridge (2000)

  86. Mehler, A., Romary, L., Gibbon, D. (eds.): Handbook of Technical Communication. Handbooks of Applied Linguistics. De Gruyter Mouton, Berlin and Boston (2012)

  87. MITRE: SpatialML: annotation scheme for marking spatial expressions in natural language. The MITRE Corporation (2009). Version 3.1, October 1, 2009, Contact: cdoran@mitre.org

  88. Nivre, J., Hall, J., Nilsson, J.: MaltParser: a data-driven parser-generator for dependency parsing. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC), pp. 2216–2219 (2006)

  89. Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilsson, J., Riedel, S., Yuret, D.: The CoNLL 2007 shared task on dependency parsing. In: Proceedings of the CoNLL Shared Task of EMNLP-CoNLL 2007, pp. 915–932 (2007)

  90. Palmer, M., Gildea, D., Kingsbury, P.: The Proposition Bank: an annotated corpus of semantic roles. Computational Linguistics 31(1), 71–106 (2005)

  91. Peroni, S., Vitali, F.: Annotations with EARMARK for arbitrary, overlapping and out-of-order markup. In: Borghoff, U.M., Chidlovskii, B. (eds.) ACM Symposium on Document Engineering, pp. 171–180. ACM, New York (2009)

  92. Petrov, S., Das, D., McDonald, R.: A universal part-of-speech tagset. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC) (2012)

  93. Petukhova, V., Bunt, H.: The independence of dimensions in multidimensional dialogue act annotation. In: Proceedings NAACL HLT Conference, Boulder, Colorado (2009)

  94. Petukhova, V., Bunt, H., Schiffrin, A.: LIRICS semantic role annotation: design and evaluation of a set of data categories. In: Proceedings of the 6th Edition of International Conference on Language Resources and Evaluation (LREC 2008). Marrakech (2007)

  95. Petukhova, V., Prévot, L., Bunt, H.: Discourse relations in dialogue. In: Proceedings 6th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation (ISA-6). Oxford, UK (2011)

  96. Popescu-Belis, A.: Dialogue acts: one or more dimensions? ISSCO Working Paper 62. ISSCO, Geneva (2005). http://www.issco.unige.ch/publicaitons/working-papers/papers/apb-issco-wp62b.pdf

  97. Pratt-Hartmann, I.: From TimeML to interval temporal logic. In: Bunt, H. (ed.) Proceedings of the Seventh International Workshop on Computational Semantics (IWCS-7), Tilburg, The Netherlands, pp. 166–180 (2007)

  98. Przepiórkowski, A.: TEI P5 as an XML standard for treebank encoding, pp. 149–160 (2009)

  99. Pustejovsky, J., Gaizauskas, R., Saurí, R., Setzer, A., Ingria, R.: Annotation guideline to TimeML 1.0 (2002). Available at http://timeml.org

  100. Pustejovsky, J., Ingria, R., Saurí, R., Castaño, J., Littman, J., Gaizauskas, R., Setzer, A., Katz, G., Mani, I.: The specification language TimeML. In: Mani, I., Pustejovsky, J., Gaizauskas, R. (eds.) The Language of Time: a Reader, pp. 545–557. Oxford University Press, Cambridge (2005)

  101. Pustejovsky, J., Lee, K., Bunt, H., Romary, L.: ISO-TimeML: an international standard for semantic annotation. In: Proceedings of LREC2010. Malta (2010)

  102. Rizzo, G., Troncy, R., Hellmann, S., Bruemmer, M.: NERD meets NIF: lifting NLP extraction results to the linked data cloud. In: LDOW (2012)

  103. Romary, L.: TBX goes TEI - implementing a TBX basic extension for the text encoding initiative guidelines. Terminology and Knowledge Engineering 2014, Berlin, Germany (2014). hal-00950862v2

  104. Romary, L., Bonhomme, P.: Parallel alignment of structured documents. Parallel Text Processing, pp. 201–217. Springer, New York (2000)

  105. Rossini, N.: Reinterpreting Gesture as Language - Language in Action. IOS Press, Amsterdam (2012)

  106. Rubiera, E., Polo, L., Berrueta, D., Ghali, A.E.: Telix: an RDF-based model for linguistic annotation. In: ESWC (2012)

  107. Schierle, M.: Language engineering for information extraction. Ph.D. thesis, Universität Leipzig (2011)

  108. Schiffrin, A., Bunt, H.: LIRICS deliverable D4.3: documented compilation of semantic data categories (2007). http://lirics.loria.fr

  109. Schmidt, T.: A TEI-based approach to standardising spoken language transcription. Journal of the Text Encoding Initiative 1 (2011). http://jtei.revues.org/142. doi:10.4000/jtei.142

  110. Sperberg-McQueen, C., Burnard, L. (eds.): Guidelines for electronic text encoding and interchange. TEI P3. Text Encoding Initiative, Oxford, Providence, Charlottesville, Bergen (1994)

  111. Szymański, M., Bachan, J.: Interlabeller agreement on segmental and prosodic annotation of the Jurisdict Polish database. Speech and Language Technology 14/15, 105–121 (2012)

  112. TEI Consortium (ed.): Guidelines for electronic text encoding and interchange. TEI P5. Text Encoding Initiative, Oxford, Providence, Charlottesville, Bergen, Nancy (2003)

  113. Teoh, A., Chin, S.: Transcribing the speech of children with cochlear implants: clinical application of narrow phonetic transcriptions. American Journal of Speech-Language Pathology 18(4), 388–401 (2009)

  114. Tobies, S.: Complexity results and practical algorithms for logics in knowledge representation. Ph.D. thesis, TU Dresden (2001)

  115. Erjavec, T., Fišer, D., Krek, S., Ledinek, N.: The JOS linguistically tagged corpus of Slovene. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). Malta (2010)

  116. Traum, D.: 20 questions on dialogue act taxonomies. Journal of Semantics 17(1), 7–30 (2000)

  117. Tsarfaty, R.: A unified morpho-syntactic scheme of Stanford dependencies. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 578–584 (2013)

  118. Windhouwer, M.: RELcat: a relation registry for ISOcat data categories. In: Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, pp. 3661–3664 (2012)

  119. Windhouwer, M., Wright, S.E.: LMF and the data category registry: principles and application. In: Francopoulo, G. (ed.) LMF Lexical Markup Framework, Chap. 10, pp. 41–50. ISTE - HERMES - Wiley, London (2013)

  120. Zeman, D.: Reusable tagset conversion using tagset drivers. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC), pp. 213–218 (2008)

  121. Zeman, D., Mareček, D., Popel, M., Ramasamy, L., Štěpánek, J., Žabokrtský, Z., Hajič, J.: HamleDT: To parse or not to parse? In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), pp. 2735–2741 (2012)

Author information

Corresponding author

Correspondence to Nancy Ide.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Ide, N. et al. (2017). Community Standards for Linguistically-Annotated Resources. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_4

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_4

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences (R0)
