Mapping Named Entities from NKJP Corpus to Składnica Treebank and Polish Wordnet

  • Conference paper
Language Processing and Intelligent Information Systems (IIS 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7912)

Abstract

This paper discusses a method of mapping named entities (NEs) from the NKJP corpus, where their annotation is rather coarse, to the Składnica treebank, where annotation is wordnet-based. The method relies on the fact that Składnica is a subcorpus of the one-million-word, manually annotated, balanced subcorpus of NKJP. A procedure for finding the node in a parse tree that corresponds to an NE is presented. Next, several heuristics are proposed for matching the lemma of an NE in Polish Wordnet and for choosing the most probable semantic interpretation of ambiguous NEs. The results of the mapping are evaluated.
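
The abstract does not spell out how the corresponding parse-tree node is found, so the following is only a minimal illustrative sketch, not the paper's procedure. It assumes that an NE is given as a token span and that each tree node records the token span it covers; the `Node` class, the `smallest_covering_node` function, and the node labels are hypothetical and do not reflect the actual NKJP or Składnica formats.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical constituency-tree node covering tokens [start, end)."""
    label: str
    start: int  # index of the first covered token (inclusive)
    end: int    # index one past the last covered token (exclusive)
    children: List["Node"] = field(default_factory=list)


def smallest_covering_node(root: Node, start: int, end: int) -> Optional[Node]:
    """Return the deepest node whose span contains [start, end), or None."""
    if not (root.start <= start and end <= root.end):
        return None
    node = root
    while True:
        for child in node.children:
            if child.start <= start and end <= child.end:
                node = child  # descend into the child that still covers the span
                break
        else:
            return node  # no child covers the whole span, so this node is minimal


# Toy tree for the tokens: Jan Kowalski mieszka w Warszawie
# (labels are illustrative, not Składnica categories)
tree = Node("S", 0, 5, [
    Node("NP", 0, 2, [Node("N", 0, 1), Node("N", 1, 2)]),
    Node("VP", 2, 5, [
        Node("V", 2, 3),
        Node("PP", 3, 5, [Node("P", 3, 4), Node("N", 4, 5)]),
    ]),
])

person = smallest_covering_node(tree, 0, 2)  # span of "Jan Kowalski"
place = smallest_covering_node(tree, 4, 5)   # span of "Warszawie"
```

In this sketch an NE whose span crosses constituent boundaries falls back to a larger node (here the root), which is one reason a real mapping would likely need additional handling for such cases.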


© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hajnicz, E. (2013). Mapping Named Entities from NKJP Corpus to Składnica Treebank and Polish Wordnet. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_11

  • DOI: https://doi.org/10.1007/978-3-642-38634-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38633-6

  • Online ISBN: 978-3-642-38634-3

  • eBook Packages: Computer Science, Computer Science (R0)
