Mapping Named Entities from NKJP Corpus to Składnica Treebank and Polish Wordnet

Hajnicz, Elżbieta

doi:10.1007/978-3-642-38634-3_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7912)

Included in the following conference series:

Intelligent Information Systems Symposium

Abstract

This paper discusses a method of mapping named entities (NEs) from the NKJP corpus, where their annotation is rather coarse, to the Składnica treebank, where their annotation is wordnet-based. The method relies on the fact that Składnica is a subcorpus of the one-million-word, manually annotated, balanced subcorpus of NKJP. First, a procedure for finding the node in a parse tree that corresponds to an NE is presented. Next, several heuristics are suggested for matching the lemma of an NE in Polish Wordnet and for choosing the most probable semantic interpretation of ambiguous NEs. Finally, the results of the mapping are evaluated.
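
The abstract does not spell out how the corresponding parse-tree node is found, so the following is only an illustrative sketch of one plausible approach, not the paper's actual procedure. It assumes that an NE and every tree node can be described by a half-open token span within the same sentence, and it returns the lowest node whose span covers the NE. The `Node` class, the `find_covering_node` function, the node labels, and the example sentence are all hypothetical and are not taken from NKJP or Składnica.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical parse-tree node covering tokens [start, end)."""
    label: str
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)


def find_covering_node(root: Node, ne_start: int, ne_end: int) -> Optional[Node]:
    """Return the lowest node whose span contains [ne_start, ne_end).

    Returns None when the NE span is not inside the tree at all.
    With unary chains (a node and its only child share a span),
    the deepest node of the chain is returned.
    """
    if not (root.start <= ne_start and ne_end <= root.end):
        return None
    for child in root.children:
        found = find_covering_node(child, ne_start, ne_end)
        if found is not None:
            return found
    return root


# Toy tree for "Jan Kowalski mieszka w Warszawie" (tokens 0..4).
# Labels are illustrative only (assumption, not treebank categories).
tree = Node("S", 0, 5, [
    Node("NP", 0, 2, [Node("N", 0, 1), Node("N", 1, 2)]),
    Node("VP", 2, 5, [
        Node("V", 2, 3),
        Node("PP", 3, 5, [Node("P", 3, 4), Node("NP", 4, 5)]),
    ]),
])

person = find_covering_node(tree, 0, 2)   # NE "Jan Kowalski"
place = find_covering_node(tree, 4, 5)    # NE "Warszawie"
```

In this toy tree the span of "Jan Kowalski" resolves to the noun phrase over tokens 0 and 1, while a span that straddles two constituents falls back to their closest common ancestor. A real mapping would also have to handle differences in tokenization between the two resources, which this sketch ignores.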

Author information

Authors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, Poland
Elżbieta Hajnicz

Editor information

Editors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248, Warsaw, Poland
Mieczysław A. Kłopotek, Jacek Koronacki, Małgorzata Marciniak & Agnieszka Mykowiecka
Institute of Computer Science, Polish Academy of Sciences, ul. Brzegi 55, 80-045, Gdańsk, Poland
Sławomir T. Wierzchoń

About this paper

Cite this paper

Hajnicz, E. (2013). Mapping Named Entities from NKJP Corpus to Składnica Treebank and Polish Wordnet. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_11

DOI: https://doi.org/10.1007/978-3-642-38634-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38633-6
Online ISBN: 978-3-642-38634-3
eBook Packages: Computer Science, Computer Science (R0)
