Word Embeddings for the Polish Language

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9692))

Abstract

We present a dataset of word embeddings for the Polish language. The embeddings can be used as input to Artificial Intelligence methods as an alternative to one-hot representations. Spatial relations between the embeddings reflect relations between words, such as alternatives and analogies, which improves the generalization of methods that use them. The embeddings were trained on data from Wikipedia with the skip-gram and continuous-bag-of-words methods, originally introduced for the English language by Mikolov et al. The current version of the embeddings can be downloaded from http://publications.ics.p.lodz.pl/2016/word_embeddings/.
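
The claim that spatial relations between embeddings reflect analogies can be illustrated with a minimal sketch. The three-dimensional vectors below are invented for illustration and are not taken from the published dataset; the helper names `cosine` and `analogy` are likewise hypothetical. The sketch uses the common vector-offset approach: for "a is to b as c is to d", it looks for the word whose vector is closest, by cosine similarity, to b - a + c.

```python
import numpy as np

# Toy, hand-made vectors (NOT from the published dataset).
# Dimensions loosely stand for: royalty, masculine, feminine.
EMB = {
    "król":      np.array([1.0, 1.0, 0.0]),  # king
    "królowa":   np.array([1.0, 0.0, 1.0]),  # queen
    "mężczyzna": np.array([0.0, 1.0, 0.0]),  # man
    "kobieta":   np.array([0.0, 0.0, 1.0]),  # woman
    "stół":      np.array([0.2, 0.1, 0.1]),  # table (distractor)
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(emb, a, b, c):
    """Return the word d that best completes 'a is to b as c is to d'."""
    target = emb[b] - emb[a] + emb[c]
    # Exclude the three query words so they cannot be returned as the answer.
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy(EMB, "mężczyzna", "król", "kobieta"))
```

With real embeddings the match is approximate rather than exact, and the vocabulary is far larger, so the search would normally run over a normalized matrix rather than a Python dictionary.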

Notes

  1. Even though sparse representations are encoded in a compact form, all operations are performed as if they were still full-size vectors.
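
One way to read this note, sketched under assumptions: a one-hot vector can be stored compactly as the index of its single non-zero entry, yet multiplying the full-size vector by a weight matrix gives the same result as selecting a row by that index. The matrix `W`, its size, and the chosen index are hypothetical values used only for illustration.

```python
import numpy as np

vocab_size, dim = 5, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(vocab_size, dim))  # hypothetical weight matrix

index = 2                       # compact form: position of the single 1
one_hot = np.zeros(vocab_size)  # full-size form of the same vector
one_hot[index] = 1.0

dense_result = one_hot @ W      # operation carried out on the full-size vector
lookup_result = W[index]        # same value obtained from the compact form

print(np.allclose(dense_result, lookup_result))
```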

References

  1. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)

  2. Chen, Y., Perozzi, B., Al-Rfou, R., Skiena, S.: The expressive power of word embeddings. arXiv preprint (2013). arXiv:1301.3226

  3. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)

  4. Huang, E.H., Socher, R., Manning, C.D., Ng, A.Y.: Improving word representations via global context and multiple word prototypes. In: Annual Meeting of the Association for Computational Linguistics (ACL) (2012)

  5. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)

  6. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013)

  7. Mikolov, T., Yih, W.t., Zweig, G.: Linguistic regularities in continuous space word representations. In: HLT-NAACL, pp. 746–751 (2013)

  8. Mnih, A., Hinton, G.: Three new graphical models for statistical language modelling. In: Proceedings of the 24th International Conference on Machine Learning, pp. 641–648. ACM (2007)

  9. Przepiórkowski, A.: A comparison of two morphosyntactic tagsets of Polish. In: Representing Semantics in Digital Lexicography: Proceedings of MONDILEX Fourth Open Workshop, pp. 138–144. Warsaw (2009)

  10. Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for semi-supervised learning. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 384–394. Association for Computational Linguistics (2010)

Author information

Corresponding author

Correspondence to Marek Rogalski.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Rogalski, M., Szczepaniak, P.S. (2016). Word Embeddings for the Polish Language. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science, vol 9692. Springer, Cham. https://doi.org/10.1007/978-3-319-39378-0_12

  • DOI: https://doi.org/10.1007/978-3-319-39378-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39377-3

  • Online ISBN: 978-3-319-39378-0

  • eBook Packages: Computer Science, Computer Science (R0)
