Introduction: The Handbook of Linguistic Annotation

Ide, Nancy

doi:10.1007/978-94-024-0881-2_1

Abstract

The Handbook of Linguistic Annotation provides a comprehensive survey of the development and current state of the art of linguistic annotation of language resources, including methods for annotation scheme design, annotation creation, physical format considerations, annotation tools, annotation use, and evaluation. The volume is divided into two parts: Part I contains survey chapters on the phases and considerations of an annotation project, and Part II consists of thirty-nine case studies describing major annotation projects for a broad range of linguistic phenomena.

Notes

1. The earliest automatic part-of-speech taggers include Greene and Rubin’s TAGGIT [19], Garside’s CLAWS [17], DeRose’s VOLSUNGA [13], and Church’s PARTS [6].
2. http://nlp.shef.ac.uk/parole/parole.html.
3. A few projects relied on manual annotation alone [31, 33, 45], partial “spot-checking” of automatically-generated annotations (e.g., the British National Corpus), or even combinations of several automatic annotators [41].
4. http://www.MTurk.com.
5. See chapter “Community standards” in this volume for an overview.
6. http://www.ilc.cnr.it/EAGLES/browse.html.
7. www.ilc.cnr.it/EAGLES/annotate/annotate.html.
8. http://link.springer.com/journal/10579.
9. http://www.cs.vassar.edu/~sigann.
10. http://oxygenxml.com.
11. http://linguistic-lod.org/llod-cloud.

References

1. Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Linguist. 34(4), 555–596 (2008)
2. Bernsen, N.O., Dybkjær, L., Kolodnytsky, M.: The NITE workbench. A tool for annotation of natural interactivity and multimodal data. In: Proceedings of the Third International Conference on Language Resources and Evaluation (LREC-2002). European Language Resources Association (ELRA), Las Palmas, Canary Islands - Spain (2002). http://www.lrec-conf.org/proceedings/lrec2002/pdf/214.pdf. ACL Anthology Identifier: L02-1214
3. Bird, S., Day, D., Garofolo, J., Henderson, J., Laprun, C., Liberman, M.: ATLAS: a flexible and extensible architecture for linguistic annotation. In: Proceedings of the Second International Conference on Language Resources and Evaluation (LREC-2000). European Language Resources Association (ELRA), Athens, Greece (2000)
4. Bunt, H.: A methodology for designing semantic annotation languages exploiting semantic-syntactic isomorphisms. In: Proceedings of the Second International Conference on Global Interoperability for Language Resources (ICGL2010), pp. 29–46. City University of Hong Kong, Hong Kong SAR (2010)
5. Carletta, J.: Assessing agreement on classification tasks: the kappa statistic. Comput. Linguist. 22(2), 249–254 (1996)
6. Church, K.W.: A stochastic parts program and noun phrase parser for unrestricted text. In: Proceedings of the Second Conference on Applied Natural Language Processing, ANLC ’88, pp. 136–143. Association for Computational Linguistics, Stroudsburg, PA, USA (1988). doi:10.3115/974235.974260. http://dx.doi.org/10.3115/974235.974260
7. Clear, J.H.: The British National Corpus. In: Landow, G.P., Delany, P. (eds.) The Digital Word, pp. 163–187. MIT Press, Cambridge (1993)
8. Core, M., Ishizaki, M., Moore, J., Nakatani, C., Reithinger, N., Traum, D., Tutiya, S.: The report of the third workshop of the discourse resource initiative. Chiba University and Kazusa Academia Hall, Technical report (1998)
9. Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V.: GATE: a framework and graphical development environment for robust NLP tools and applications. In: Proceedings of ACL’02 (2002)
10. Cunningham, H., Wilks, Y., Gaizauskas, R.: Software infrastructure for language engineering. In: Proceedings of the AISB Workshop on Language Engineering for Document Analysis and Recognition. Brighton, U.K. (1996)
11. Day, D., Aberdeen, J., Hirschman, L., Kozierok, R., Robinson, P., Vilain, M.: Mixed-initiative development of language processing systems. In: Proceedings of the Fifth Conference on Applied Natural Language Processing, pp. 348–355. Association for Computational Linguistics, Washington, DC, USA (1997)
12. Day, D.S., McHenry, C., Kozierok, R., Riek, L.: Callisto: a configurable annotation workbench. In: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC-2004). European Language Resources Association (2004)
13. DeRose, S.J.: Grammatical category disambiguation by statistical optimization. Comput. Linguist. 14(1), 31–39 (1988)
14. Doddington, G.R., Mitchell, A., Przybocki, M.A., Ramshaw, L.A., Strassel, S., Weischedel, R.M.: The automatic content extraction (ACE) program - tasks, data, and evaluation. In: Proceedings of the Fourth Language Resources and Evaluation Conference (LREC 2004). European Language Resources Association (2004)
15. Erjavec, T., Ide, N.: The MULTEXT-East corpus. In: Proceedings of First International Conference on Language Resources and Evaluation, pp. 971–974 (1998)
16. Ferrucci, D., Lally, A.: UIMA: an architectural approach to unstructured information processing in the corporate research environment. Natural Lang. Eng. 10(3–4), 327–348 (2004)
17. Garside, R.: The CLAWS word-tagging system. In: Garside, R., Sampson, G., Leech, G. (eds.) The Computational Analysis of English: A Corpus-Based Approach. Longman (1987). http://www.researchgate.net/publication/230876041_The_CLAWS_word-tagging_system
18. Garside, R., Leech, G., Sampson, G.: The computational analysis of English: a corpus-based approach. Longman (1987)
19. Greene, B.B., Rubin, G.M.: Automatic Grammatical Tagging of English. Brown University, Department of Linguistics (1971)
20. Grishman, R., Sundheim, B.: Message understanding conference-6: a brief history. In: Proceedings of the 16th Conference on Computational Linguistics - COLING ’96, vol. 1, pp. 466–471. Association for Computational Linguistics, Stroudsburg, PA, USA (1996)
21. Hellmann, S., Lehmann, J., Auer, S., Nitzschke, M.: NIF combinator: combining NLP tool output. In: 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW2012) (2012)
22. Hovy, E., Lavid, J.: Towards a ‘science’ of corpus annotation: a new methodological challenge for corpus linguistics. Int. J. Transl. Stud. 22(2) (2010)
23. Ide, N.: Corpus encoding standard: SGML guidelines for encoding linguistic corpora. In: Proceedings of the First International Language Resources and Evaluation Conference (LREC 1998), pp. 463–470. European Language Resources Association (ELRA) (1998)
24. Ide, N.: Annotation science: from theory to practice and use. In: Rehm, G., Witt, A., Lemnitzer, L. (eds.) Data Structures for Linguistics Resources and Applications. Gunter Narr Verlag, Germany (2007)
25. Ide, N., Atwell, E. (eds.): Annotation science: state of the art in enhancing automatic linguistic annotation. In: Proceedings of the Workshop. European Language Resources Association (2006). http://www.lrec-conf.org/proceedings/lrec2006/
26. Ide, N., Bunt, H.: Anatomy of annotation schemes: mapping to GrAF. In: Proceedings of the Fourth Linguistic Annotation Workshop. LAW IV, pp. 247–255. Association for Computational Linguistics, Stroudsburg, PA, USA (2010)
27. Ide, N., Suderman, K.: The linguistic annotation framework: a standard for annotation interchange and merging. Lang. Resour. Eval. 48(3), 395–418 (2014)
28. Ide, N., Véronis, J.: MULTEXT: multilingual text tools and corpora. In: Proceedings of the 15th International Conference on Computational Linguistics (COLING 94), vol. I, pp. 588–592. Kyoto, Japan (1994)
29. Ide, N., Bonhomme, P., Romary, L.: XCES: an XML-based encoding standard for linguistic corpora. In: Proceedings of the Second Language Resources and Evaluation Conference (LREC 2000). European Language Resources Association (ELRA), Athens, Greece (2000)
30. Isard, A., Møller, M.B., McKelvie, D., Mengel, A.: The MATE workbench - a tool for annotating XML corpora. In: Proceedings of Recherche d’Informations Assistée par Ordinateur (RIAO’2000). Paris (2000)
31. Järborg, J.: Introduction to “This is Watson”. Göteborg University, Institute för språkvetenskaplig databehandling (1986)
32. Kučera, H., Francis, W.N.: Computational Analysis of Present-Day American English. Brown University Press, Providence (1967)
33. Landes, S., Leacock, C., Tengi, R.I.: Building semantic concordances. In: Fellbaum, C. (ed.) WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
34. Litman, D., Hirschberg, J.: Disambiguating cue phrases in text and speech. In: Proceedings of the 13th Conference on Computational Linguistics - COLING ’90, vol. 2, pp. 251–256. Association for Computational Linguistics, Stroudsburg, PA, USA (1990)
35. Marcu, D., Amorrortu, E., Romera, M.: Experiments in constructing a corpus of discourse trees. In: Proceedings Towards Standards and Tools for Discourse Tagging, pp. 48–57 (1999)
36. Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19(2), 313–330 (1993)
37. Marcus, M., Kim, G., Marcinkiewicz, M.A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., Schasberger, B.: The Penn Treebank: annotating predicate argument structure. In: Proceedings of the Workshop on Human Language Technology, pp. 114–119. Association for Computational Linguistics, Stroudsburg, PA, USA (1994)
38. Melamed, I.D.: Manual annotation of translational equivalence: the Blinker project. CoRR cmp-lg/9805005 (1998)
39. Ng, H.T., Lim, C.Y., Foo, S.K.: A case study on inter-annotator agreement for word sense disambiguation. In: SIGLEX99: Standardizing Lexical Resources, pp. 351–14 (1999)
40. Ogren, P.V.: Knowtator: a Protégé plug-in for annotated corpus construction. In: Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Companion Volume: Demonstrations, pp. 273–275. Association for Computational Linguistics, Stroudsburg, PA, USA (2006)
41. Paroubek, P.: Language resources as by-product of evaluation: the MultiTag example. In: Proceedings of the Second International Conference on Language Resources and Evaluation (LREC-2000). European Language Resources Association (ELRA), Athens, Greece (2000)
42. Passonneau, R.J., Litman, D.J.: Intention-based segmentation: human reliability and correlation with linguistic cues. In: Proceedings of the 31st Annual Meeting on Association for Computational Linguistics. ACL ’93, pp. 148–155. Association for Computational Linguistics, Stroudsburg, PA, USA (1993)
43. Pustejovsky, J., Stubbs, A.: Natural Language Annotation for Machine Learning. O’Reilly Media, California (2013)
44. Resnik, P.: Disambiguating noun groupings with respect to WordNet senses. In: Proceedings of the 3rd Workshop on Very Large Corpora (1995)
45. Sampson, G.: English for the Computer: the SUSANNE corpus and analytic scheme. Clarendon Press, Oxford (1995)
46. Siegel, S., Castellan, N.: Nonparametric statistics for the behavioral sciences, second edn. McGraw–Hill, New York (1988)
47. Silverman, K.E.A., Beckman, M.E., Pitrelli, J.F., Ostendorf, M., Wightman, C.W., Price, P., Pierrehumbert, J.B., Hirschberg, J.: ToBI: a standard for labeling English prosody. In: International Conference on Spoken Language Processing. ISCA (1992)

Author information

Authors and Affiliations

Department of Computer Science, Vassar College, Poughkeepsie, NY, 12604, USA
Nancy Ide

Corresponding author

Correspondence to Nancy Ide.

Editor information

Editors and Affiliations

Department of Computer Science, Vassar College, Poughkeepsie, New York, USA
Nancy Ide
Department of Computer Science, Volen Center for Complex Systems, Brandeis University, Waltham, Massachusetts, USA
James Pustejovsky

Cite this chapter

Ide, N. (2017). Introduction: The Handbook of Linguistic Annotation. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_1

DOI: https://doi.org/10.1007/978-94-024-0881-2_1
Published: 17 June 2017
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences, Social Sciences (R0)