Word Embeddings for the Polish Language

Rogalski, Marek; Szczepaniak, Piotr S.

doi:10.1007/978-3-319-39378-0_12

Conference paper
First Online: 29 May 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9692)

Abstract

We present a dataset of word embeddings for the Polish language. The embeddings can be used as input to Artificial Intelligence methods as an alternative to one-hot representations. Spatial relations between embeddings reflect relations between words, such as alternatives and analogies, which improves the generalization of methods that use the embeddings. The embeddings were trained on data from Wikipedia using the skip-gram and continuous bag-of-words methods, originally introduced for the English language by Mikolov et al. The current version of the embeddings can be downloaded from http://publications.ics.p.lodz.pl/2016/word_embeddings/.
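The abstract names the skip-gram and continuous bag-of-words (CBOW) methods without describing them. As a rough illustration only, the sketch below shows how the two methods frame the same text as different prediction tasks: skip-gram predicts each context word from the centre word, while CBOW predicts the centre word from its whole context. The function names, the window size, and the toy sentence are assumptions made for this example; they are not taken from the paper, and the sketch covers only how training examples are formed, not the training itself.

```python
from typing import List, Tuple


def context_window(tokens: List[str], i: int, window: int) -> List[str]:
    """Return the tokens within `window` positions of index i, excluding tokens[i]."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    return tokens[lo:i] + tokens[i + 1:hi]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram framing: one (centre, context) pair per context word."""
    return [(tokens[i], context_word)
            for i in range(len(tokens))
            for context_word in context_window(tokens, i, window)]


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW framing: one (context words, centre) example per position."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


if __name__ == "__main__":
    # Hypothetical toy sentence; the paper's actual corpus is Wikipedia.
    sentence = "ala ma kota".split()
    print(skipgram_pairs(sentence, window=1))
    print(cbow_examples(sentence, window=1))
```

In both framings the model learns a dense vector per word as a by-product of solving the prediction task, and those vectors are what a dataset of embeddings distributes.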

Notes

1. Even though sparse representations are encoded in a compact form, all operations are performed as if they were still full-size vectors.
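One possible reading of this note, sketched under assumptions: a one-hot vector can be stored compactly as a single index, yet a method that consumes it still behaves as if it multiplied a full vocabulary-sized vector by a weight matrix. The code below compares that full product with a direct row lookup and shows they agree. The matrix values, vocabulary size, and function names are invented for illustration and do not come from the paper.

```python
from typing import List

Matrix = List[List[float]]


def one_hot(index: int, size: int) -> List[float]:
    """Expand a compact index into a full-size one-hot vector."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def dense_product(vec: List[float], matrix: Matrix) -> List[float]:
    """Full vector-matrix product: visits every row, even though all but one weight is zero."""
    n_cols = len(matrix[0])
    return [sum(vec[r] * matrix[r][c] for r in range(len(matrix)))
            for c in range(n_cols)]


def lookup(index: int, matrix: Matrix) -> List[float]:
    """Embedding-style access: read the single relevant row directly."""
    return list(matrix[index])


if __name__ == "__main__":
    # Hypothetical 4-word vocabulary with 2-dimensional vectors (made-up values).
    toy_matrix = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
    word_index = 2
    print(dense_product(one_hot(word_index, 4), toy_matrix))
    print(lookup(word_index, toy_matrix))
```

Both calls return the same row; the difference is that the product scales with vocabulary size while the lookup does not.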

References

Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
Chen, Y., Perozzi, B., Al-Rfou, R., Skiena, S.: The expressive power of word embeddings. arXiv preprint (2013). arXiv:1301.3226
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)
Huang, E.H., Socher, R., Manning, C.D., Ng, A.Y.: Improving word representations via global context and multiple word prototypes. In: Annual Meeting of the Association for Computational Linguistics (ACL) (2012)
Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013)
Mikolov, T., Yih, W.t., Zweig, G.: Linguistic regularities in continuous space word representations. In: HLT-NAACL, pp. 746–751 (2013)
Mnih, A., Hinton, G.: Three new graphical models for statistical language modelling. In: Proceedings of the 24th International Conference on Machine Learning, pp. 641–648. ACM (2007)
Przepiórkowski, A.: A comparison of two morphosyntactic tagsets of Polish. In: Representing Semantics in Digital Lexicography: Proceedings of MONDILEX Fourth Open Workshop, pp. 138–144. Warsaw (2009)
Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for semi-supervised learning. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 384–394. Association for Computational Linguistics (2010)

Author information

Authors and Affiliations

Institute of Computer Science, Lodz University of Technology, Wolczanska 215, 90-924, Lodz, Poland
Marek Rogalski & Piotr S. Szczepaniak

Corresponding author

Correspondence to Marek Rogalski.

Editor information

Editors and Affiliations

Częstochowa University of Technology, Czestochowa, Poland
Leszek Rutkowski
Częstochowa University of Technology, Czestochowa, Poland
Marcin Korytkowski
Częstochowa University of Technology, Czestochowa, Poland
Rafał Scherer
AGH University of Science and Technology, Krakow, Poland
Ryszard Tadeusiewicz
University of California, Berkeley, California, USA
Lotfi A. Zadeh
University of Louisville, Louisville, Kentucky, USA
Jacek M. Zurada

About this paper

Cite this paper

Rogalski, M., Szczepaniak, P.S. (2016). Word Embeddings for the Polish Language. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science, vol 9692. Springer, Cham. https://doi.org/10.1007/978-3-319-39378-0_12

DOI: https://doi.org/10.1007/978-3-319-39378-0_12
Published: 29 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39377-3
Online ISBN: 978-3-319-39378-0
eBook Packages: Computer Science, Computer Science (R0)