
SEJF - A Grammatical Lexicon of Polish Multiword Expressions

  • Monika Czerepowicka
  • Agata Savary
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10930)

Abstract

We present SEJF, a lexical resource of Polish nominal, adjectival and adverbial multiword expressions. It consists of an intensional module with about 4,700 multiword lemmas assigned to 160 inflection graphs, and an extensional module with 88,000 automatically generated inflected forms annotated with grammatical tags. We also report the results of evaluating its coverage against an annotated corpus. The resource is freely available under the Creative Commons BY-SA license.
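The split between an intensional module (lemmas plus inflection descriptions) and an extensional module (generated, tagged forms) can be illustrated with a small sketch. It assumes a drastically simplified model. Each multiword lemma is paired with a flat per-component pattern ("agree" or "fixed") instead of a real inflection graph. Single-word forms come from a tiny hand-written table rather than a morphological analyser. Tags are reduced to number and case. The names `TABLES`, `LEXICON`, `inflect_component` and `expand` are hypothetical and are not part of SEJF. The two example lemmas, "dom kultury" ('community centre') and "czarna skrzynka" ('black box'), are illustrative and are not claimed to be SEJF entries.

```python
from itertools import product

NUMBERS = ("sg", "pl")
CASES = ("nom", "gen", "acc")  # toy subset of the seven Polish cases

# Hypothetical toy inflection tables, keyed by the component as it appears
# in the multiword lemma; a real system would query a morphological analyser.
# Each tuple lists the forms in the order of CASES.
TABLES = {
    "dom": {"sg": ("dom", "domu", "dom"), "pl": ("domy", "domów", "domy")},
    "czarna": {"sg": ("czarna", "czarnej", "czarną"),
               "pl": ("czarne", "czarnych", "czarne")},
    "skrzynka": {"sg": ("skrzynka", "skrzynki", "skrzynkę"),
                 "pl": ("skrzynki", "skrzynek", "skrzynki")},
}

# Intensional module (toy): each multiword lemma with a hypothetical flat
# pattern saying which components inflect ("agree") and which stay frozen.
LEXICON = [
    ("dom kultury", ("agree", "fixed")),      # frozen genitive modifier
    ("czarna skrzynka", ("agree", "agree")),  # adjective agrees with noun
]


def inflect_component(word, behaviour, number, case):
    """Return the form of one component for the requested number and case."""
    if behaviour == "fixed":
        return word  # frozen components keep their form from the lemma
    return TABLES[word][number][CASES.index(case)]


def expand(lemma, pattern):
    """Generate (form, lemma, tag) triples for every number/case pair."""
    words = lemma.split()
    entries = []
    for number, case in product(NUMBERS, CASES):
        form = " ".join(
            inflect_component(w, b, number, case) for w, b in zip(words, pattern)
        )
        entries.append((form, lemma, f"{number}:{case}"))
    return entries


# Extensional module (toy): every generated form with its lemma and tag.
extensional = [e for lemma, pattern in LEXICON for e in expand(lemma, pattern)]

for form, lemma, tag in extensional:
    print(f"{form}\t{lemma}\t{tag}")
```

Here two intensional entries expand into twelve tagged forms, such as "domów kultury" (pl:gen) and "czarną skrzynkę" (sg:acc). The same design lets one inflection description serve many lemmas, which is why thousands of lemmas can share a much smaller set of graphs. The graphs and tagset used by the actual resource are more expressive than this flat pattern.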

Notes

Acknowledgements

This work has been supported by three projects: (i) Nekst (http://www.ipipan.waw.pl/nekst), funded by the European Regional Development Fund and the Polish Ministry of Science and Higher Education; (ii) CESAR (http://clip.ipipan.waw.pl/CESAR), a European project (CIP-ICT-PSP-271022) that is part of META-NET; and (iii) the IC1207 COST action PARSEME (http://www.parseme.eu).


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Faculty of Humanities, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
  2. Université François Rabelais Tours, Tours, France
