Lemmatization of Multi-Word Entity Names for Polish Language Using Rules Automatically Generated Based on the Corpus Analysis

Małyszko, Jacek; Abramowicz, Witold; Filipowska, Agata; Wagner, Tomasz

doi:10.1007/978-3-319-93782-3_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10930)

Included in the following conference series:

Language and Technology Conference

Abstract

This article concerns the automatic lemmatization of multi-word units (MWUs) in highly inflected languages. We present an approach in which lemmatization is performed using rules generated solely from corpus analysis. Experiments show that the accuracy of automatic MWU lemmatization for Polish with this approach reaches up to 82%.
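
As an illustration of what rule-based MWU lemmatization can look like, the following is a minimal sketch and not the authors' method: the abstract does not specify the rule format, so the suffix-rewrite rules, the function names (`suffix_rule`, `learn_rules`, `lemmatize`) and the toy training pairs below are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (inflected MWU, lemma) pairs, here in the genitive case.
TRAINING_PAIRS = [
    ("Uniwersytetu Ekonomicznego", "Uniwersytet Ekonomiczny"),
    ("Banku Centralnego", "Bank Centralny"),
]


def suffix_rule(inflected, lemma):
    """Return (old_suffix, new_suffix) that rewrites one inflected token to its lemma."""
    i = 0
    # Skip the longest common prefix; whatever remains is the differing suffix.
    while i < min(len(inflected), len(lemma)) and inflected[i] == lemma[i]:
        i += 1
    return inflected[i:], lemma[i:]


def learn_rules(pairs):
    """Map each tuple of inflected suffixes to its most frequent tuple of lemma suffixes."""
    counts = defaultdict(Counter)
    for inflected, lemma in pairs:
        in_tokens, lemma_tokens = inflected.split(), lemma.split()
        if len(in_tokens) != len(lemma_tokens):
            continue  # this sketch only handles token-aligned pairs
        token_rules = [suffix_rule(a, b) for a, b in zip(in_tokens, lemma_tokens)]
        old = tuple(o for o, _ in token_rules)
        new = tuple(n for _, n in token_rules)
        counts[old][new] += 1
    return {old: c.most_common(1)[0][0] for old, c in counts.items()}


def lemmatize(phrase, rules):
    """Apply the most specific matching rule; return the phrase unchanged if none matches."""
    tokens = phrase.split()
    # Try rules with longer suffixes first, since they are more specific.
    ordered = sorted(rules.items(), key=lambda kv: -sum(len(s) for s in kv[0]))
    for olds, news in ordered:
        if len(olds) == len(tokens) and all(
            t.endswith(o) for t, o in zip(tokens, olds)
        ):
            return " ".join(
                t[: len(t) - len(o)] + n for t, o, n in zip(tokens, olds, news)
            )
    return phrase


rules = learn_rules(TRAINING_PAIRS)
print(lemmatize("Instytutu Technicznego", rules))  # Instytut Techniczny
```

The sketch keys each rule on the suffixes of all tokens at once, so agreement between the noun and its adjective is handled by a single rule. A realistic system would also need morphosyntactic tags to disambiguate suffixes that are shared across cases.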

Author information

Authors and Affiliations

Poznań University of Economics and Business, al. Niepodległości 10, 61-875, Poznań, Poland
Jacek Małyszko, Witold Abramowicz, Agata Filipowska & Tomasz Wagner

Corresponding author

Correspondence to Agata Filipowska.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
LIMSI-CNRS, Orsay Cedex, France
Joseph Mariani
Adam Mickiewicz University, Poznań, Poland
Marek Kubis

About this paper

Cite this paper

Małyszko, J., Abramowicz, W., Filipowska, A., Wagner, T. (2018). Lemmatization of Multi-Word Entity Names for Polish Language Using Rules Automatically Generated Based on the Corpus Analysis. In: Vetulani, Z., Mariani, J., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2015. Lecture Notes in Computer Science, vol 10930. Springer, Cham. https://doi.org/10.1007/978-3-319-93782-3_6

DOI: https://doi.org/10.1007/978-3-319-93782-3_6
Published: 16 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93781-6
Online ISBN: 978-3-319-93782-3
eBook Packages: Computer Science, Computer Science (R0)
