Extending the TüBa-D/Z Treebank with GermaNet Sense Annotation

Henrich, Verena; Hinrichs, Erhard

doi:10.1007/978-3-642-40722-2_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8105)

Abstract

This paper describes the manual construction of a sense-annotated corpus for German with the goal of providing a gold standard for word sense disambiguation. The underlying textual resource, the TüBa-D/Z treebank, is a German newspaper corpus already enriched with high-quality manual annotations at various levels of grammar. The sense inventory used for tagging word senses is taken from GermaNet [8,9], the German counterpart of the Princeton WordNet for English [6]. With sense annotation for a selected set of 109 words (30 nouns and 79 verbs) that together occur more than 15,500 times in the TüBa-D/Z, the treebank currently represents the largest manually sense-annotated corpus available for GermaNet.

References

1. Agirre, E., Marquez, L., Wicentowski, R.: Proceedings of the 4th International Workshop on Semantic Evaluations. Association for Computational Linguistics, Stroudsburg (2007)
2. Broscheit, S., Frank, A., Jehle, D., Ponzetto, S.P., Rehl, D., Summa, A., Suttner, K., Vola, S.: Rapid bootstrapping of Word Sense Disambiguation resources for German. In: Proceedings of the 10. Konferenz zur Verarbeitung Natürlicher Sprache, Saarbrücken, Germany, pp. 19–27 (2010)
3. Chen, J., Palmer, M.: Improving English Verb Sense Disambiguation Performance with Linguistically Motivated Features and Clear Sense Distinction Boundaries. In: Language Resources and Evaluation, vol. 43, pp. 181–208. Springer, Netherlands (2009)
4. Cohen, J.: A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20(1), 37–46 (1960)
5. Erk, K., Strapparava, C.: Proceedings of the 5th International Workshop on Semantic Evaluation. Association for Computational Linguistics, Stroudsburg (2010)
6. Fellbaum, C. (ed.): WordNet – An Electronic Lexical Database. The MIT Press (1998)
7. Fellbaum, C., Palmer, M., Dang, H.T., Delfs, L., Wolf, S.: Manual and Automatic Semantic Annotation with WordNet. In: SIGLEX Workshop on WordNet and other Lexical Resources, NAACL 2001, Invited Talk, Pittsburgh, PA (2001)
8. Hamp, B., Feldweg, H.: GermaNet – a Lexical-Semantic Net for German. In: Proceedings of ACL Workshop Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, Madrid (1997)
9. Henrich, V., Hinrichs, E.: GernEdiT – The GermaNet Editing Tool. In: Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC 2010), Valletta, Malta, pp. 2228–2235 (2010)
10. Henrich, V., Hinrichs, E., Suttner, K.: Automatically Linking GermaNet to Wikipedia for Harvesting Corpus Examples for GermaNet Senses. Journal for Language Technology and Computational Linguistics (JLCL) 27(1), 1–19 (2012)
11. Henrich, V., Hinrichs, E., Vodolazova, T.: WebCAGe – A Web-Harvested Corpus Annotated with GermaNet Senses. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, France, pp. 387–396 (2012)
12. Mihalcea, R., Chklovski, T., Kilgarriff, A.: Proceedings of Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain (2004)
13. Marcus, M., Santorini, B., Marcinkiewicz, M.: Building a large annotated corpus of English: The Penn Treebank. In: Computational Linguistics, vol. 19, pp. 313–330 (1993)
14. Palmer, M., Ng, H.T., Dang, H.T.: Evaluation of WSD Systems. In: Agirre, E., Edmonds, P. (eds.) Word Sense Disambiguation: Algorithms and Applications, pp. 75–106. Springer (2006)
15. Raileanu, D., Buitelaar, P., Vintar, S., Bay, J.: Evaluation Corpora for Sense Disambiguation in the Medical Domain. In: Proceedings of the 3rd International Language Resources and Evaluation (LREC 2002), Las Palmas, Canary Islands, pp. 609–612 (2002)
16. Schiller, A., Teufel, S., Thielen, C.: Guidelines für das Tagging deutscher Textcorpora mit STTS. Technical report, Universities of Stuttgart and Tübingen (1995)
17. Telljohann, H., Hinrichs, E.W., Kübler, S., Zinsmeister, H., Beck, K.: Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z). Technical report, Department of General and Computational Linguistics, University of Tübingen, Germany (2012)
18. Véronis, J.: A study of polysemy judgments and inter-annotator agreement. In: Proceedings of SENSEVAL-1, Herstmonceux Castle, England (1998)
19. Widdows, D., Peters, S., Cederberg, S., Chan, C.-K., Steffen, D., Buitelaar, P.: Unsupervised monolingual and bilingual word-sense disambiguation of medical documents using UMLS. In: Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine, BioMed 2003, pp. 9–16. Association for Computational Linguistics, Stroudsburg (2003)

Author information

Authors and Affiliations

Department of Linguistics, University of Tübingen, Wilhelmstr. 19, 72074, Tübingen, Germany
Verena Henrich & Erhard Hinrichs

Editor information

Editors and Affiliations

Technical University Darmstadt, 64289 Darmstadt, Germany, and German Institute for International Educational Research, 60486 Frankfurt, Germany
Iryna Gurevych
Technical University Darmstadt, 64289 Darmstadt, Germany
Chris Biemann
Technical University Darmstadt, 64289 Darmstadt, Germany, and German Institute for International Educational Research, 60486 Frankfurt, Germany
Torsten Zesch

Cite this paper

Henrich, V., Hinrichs, E. (2013). Extending the TüBa-D/Z Treebank with GermaNet Sense Annotation. In: Gurevych, I., Biemann, C., Zesch, T. (eds) Language Processing and Knowledge in the Web. Lecture Notes in Computer Science, vol 8105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40722-2_9

DOI: https://doi.org/10.1007/978-3-642-40722-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40721-5
Online ISBN: 978-3-642-40722-2
eBook Packages: Computer Science, Computer Science (R0)
