Lemmatization of Multi-Word Entity Names for Polish Language Using Rules Automatically Generated Based on the Corpus Analysis

  • Jacek Małyszko
  • Witold Abramowicz
  • Agata Filipowska
  • Tomasz Wagner
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10930)


The article concerns the automatic lemmatization of Multi-Word Units (MWUs) for highly inflected languages. We present an approach in which lemmatization is performed using rules generated solely from corpus analysis. Experiments show that the accuracy of automatic MWU lemmatization for Polish with this approach reaches up to 82%.
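The abstract does not specify the rule format, so the following is only an illustrative sketch of what corpus-derived lemmatization rules could look like, not the authors' method. It assumes that each token arrives with a simplified morphosyntactic tag and a base form (as a tagger and morphological dictionary might supply), and that a rule maps a tag pattern to one action per token: `keep` the surface form or replace it with the `base` form. All names, tags, and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

# A token is (surface form, simplified tag, base form). Toy data only.


def match_case(base, surface):
    """Capitalize the base form if the surface form was capitalized."""
    return base.capitalize() if surface[:1].isupper() else base


def derive_ops(tokens, lemma):
    """Return one action per token explaining the lemma, or None if unexplained."""
    ops = []
    for (surface, _tag, base), target in zip(tokens, lemma.split()):
        if target == surface:
            ops.append("keep")
        elif target == match_case(base, surface):
            ops.append("base")
        else:
            return None
    return tuple(ops)


def learn_rules(corpus):
    """Count action sequences per tag pattern and keep the most frequent one."""
    counts = defaultdict(Counter)
    for tokens, lemma in corpus:
        if len(tokens) != len(lemma.split()):
            continue  # skip examples that do not align token by token
        ops = derive_ops(tokens, lemma)
        if ops is not None:
            pattern = tuple(tag for _, tag, _ in tokens)
            counts[pattern][ops] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}


def lemmatize(tokens, rules):
    """Apply the rule for the token tag pattern; return None if no rule exists."""
    pattern = tuple(tag for _, tag, _ in tokens)
    ops = rules.get(pattern)
    if ops is None:
        return None
    return " ".join(
        surface if op == "keep" else match_case(base, surface)
        for (surface, _tag, base), op in zip(tokens, ops)
    )


# Hypothetical training pairs: inflected MWU (genitive) and its lemma.
corpus = [
    ([("Uniwersytetu", "subst:sg:gen", "uniwersytet"),
      ("Ekonomicznego", "adj:sg:gen", "ekonomiczny")],
     "Uniwersytet Ekonomiczny"),
    ([("Ministerstwa", "subst:sg:gen", "ministerstwo"),
      ("Finansów", "subst:pl:gen", "finanse")],
     "Ministerstwo Finansów"),
]
rules = learn_rules(corpus)

unseen = [("Banku", "subst:sg:gen", "bank"),
          ("Centralnego", "adj:sg:gen", "centralny")]
print(lemmatize(unseen, rules))  # Bank Centralny
```

The second training pair shows why lemmatizing every token independently is not enough: in "Ministerstwo Finansów" the dependent noun stays in the genitive, so the learned rule for that tag pattern is `("base", "keep")`.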


Natural Language Processing, Multi-Word Units, Lemmatization



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Jacek Małyszko (1)
  • Witold Abramowicz (1)
  • Agata Filipowska (1), corresponding author
  • Tomasz Wagner (1)
  1. Poznań University of Economics and Business, Poznań, Poland
