Extending the TüBa-D/Z Treebank with GermaNet Sense Annotation

  • Conference paper
Language Processing and Knowledge in the Web

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8105)

Abstract

This paper describes the manual construction of a sense-annotated corpus for German, with the goal of providing a gold standard for word sense disambiguation. The underlying textual resource, the TüBa-D/Z treebank, is a German newspaper corpus already enriched with high-quality manual annotations at various levels of grammar. The sense inventory used for tagging word senses is taken from GermaNet [8,9], the German counterpart of the Princeton WordNet for English [6]. With sense annotation for a selected set of 109 words (30 nouns and 79 verbs), which together occur more than 15,500 times in the TüBa-D/Z, the treebank currently represents the largest manually sense-annotated corpus available for GermaNet.

References

  1. Agirre, E., Marquez, L., Wicentowski, R.: Proceedings of the 4th International Workshop on Semantic Evaluations. Association for Computational Linguistics, Stroudsburg (2007)

  2. Broscheit, S., Frank, A., Jehle, D., Ponzetto, S.P., Rehl, D., Summa, A., Suttner, K., Vola, S.: Rapid bootstrapping of Word Sense Disambiguation resources for German. In: Proceedings of the 10. Konferenz zur Verarbeitung Natürlicher Sprache, Saarbrücken, Germany, pp. 19–27 (2010)

  3. Chen, J., Palmer, M.: Improving English Verb Sense Disambiguation Performance with Linguistically Motivated Features and Clear Sense Distinction Boundaries. In: Language Resources and Evaluation, vol. 43, pp. 181–208. Springer, Netherlands (2009)

  4. Cohen, J.: A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20(1), 37–46 (1960)

  5. Erk, K., Strapparava, C.: Proceedings of the 5th International Workshop on Semantic Evaluation. Association for Computational Linguistics, Stroudsburg (2010)

  6. Fellbaum, C. (ed.): WordNet – An Electronic Lexical Database. The MIT Press (1998)

  7. Fellbaum, C., Palmer, M., Dang, H.T., Delfs, L., Wolf, S.: Manual and Automatic Semantic Annotation with WordNet. In: SIGLEX Workshop on WordNet and other Lexical Resources, NAACL 2001, Invited Talk, Pittsburgh, PA (2001)

  8. Hamp, B., Feldweg, H.: GermaNet – a Lexical-Semantic Net for German. In: Proceedings of ACL Workshop Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, Madrid (1997)

  9. Henrich, V., Hinrichs, E.: GernEdiT – The GermaNet Editing Tool. In: Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC 2010), Valletta, Malta, pp. 2228–2235 (2010)

  10. Henrich, V., Hinrichs, E., Suttner, K.: Automatically Linking GermaNet to Wikipedia for Harvesting Corpus Examples for GermaNet Senses. Journal for Language Technology and Computational Linguistics (JLCL) 27(1), 1–19 (2012)

  11. Henrich, V., Hinrichs, E., Vodolazova, T.: WebCAGe - A Web-Harvested Corpus Annotated with GermaNet Senses. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, France, pp. 387–396 (2012)

  12. Mihalcea, R., Chklovski, T., Kilgarriff, A.: Proceedings of Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain (2004)

  13. Marcus, M., Santorini, B., Marcinkiewicz, M.: Building a large annotated corpus of English: The Penn Treebank. In: Computational Linguistics, vol. 19, pp. 313–330 (1993)

  14. Palmer, M., Ng, H.T., Dang, H.T.: Evaluation of WSD Systems. In: Agirre, E., Edmonds, P. (eds.) Word Sense Disambiguation: Algorithms and Applications, pp. 75–106. Springer (2006)

  15. Raileanu, D., Buitelaar, P., Vintar, S., Bay, J.: Evaluation Corpora for Sense Disambiguation in the Medical Domain. In: Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002), Las Palmas, Canary Islands, pp. 609–612 (2002)

  16. Schiller, A., Teufel, S., Thielen, C.: Guidelines für das Tagging deutscher Textcorpora mit STTS. Technical report, Universities of Stuttgart and Tübingen (1995)

  17. Telljohann, H., Hinrichs, E.W., Kübler, S., Zinsmeister, H., Beck, K.: Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z). Technical report, Department of General and Computational Linguistics, University of Tübingen, Germany (2012)

  18. Véronis, J.: A study of polysemy judgments and inter-annotator agreement. In: Proceedings of SENSEVAL-1, Herstmonceux Castle, England (1998)

  19. Widdows, D., Peters, S., Cederberg, S., Chan, C.-K., Steffen, D., Buitelaar, P.: Unsupervised monolingual and bilingual word-sense disambiguation of medical documents using UMLS. In: Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine, BioMed 2003, pp. 9–16. Association for Computational Linguistics, Stroudsburg (2003)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Henrich, V., Hinrichs, E. (2013). Extending the TüBa-D/Z Treebank with GermaNet Sense Annotation. In: Gurevych, I., Biemann, C., Zesch, T. (eds) Language Processing and Knowledge in the Web. Lecture Notes in Computer Science, vol 8105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40722-2_9

  • DOI: https://doi.org/10.1007/978-3-642-40722-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40721-5

  • Online ISBN: 978-3-642-40722-2

  • eBook Packages: Computer Science (R0)
