Towards German Word Embeddings: A Use Case with Predictive Sentiment Analysis

  • Conference paper
  • In: Data Science – Analytics and Applications

Abstract

Despite the research boom in word embeddings and their text mining applications in recent years, the vast majority of publications focus exclusively on the English language. Furthermore, hyperparameter tuning is a process that is rarely well documented (especially for non-English text), yet it is essential for obtaining high-quality word representations. In this work, we show how different hyperparameter combinations affect the resulting German word vectors and how these word representations can be part of a more complex model. Specifically, we first perform an intrinsic evaluation of our German word embeddings, which are later used in a predictive sentiment analysis model. The latter serves not only as an extrinsic evaluation of the German word embeddings but also shows whether customer preferences can be predicted from document embeddings alone.
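To make the notion of an intrinsic evaluation concrete, the following is a minimal sketch of one common approach: comparing the cosine similarity of word vector pairs against human relatedness ratings via Spearman rank correlation. It is an illustration under stated assumptions, not the authors' actual pipeline. The function names (`cosine`, `average_ranks`, `spearman`, `evaluate`), the toy vectors, and the toy ratings are all hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (std_x * std_y)


def evaluate(embeddings, rated_pairs):
    """Correlate model similarities with human ratings.

    Pairs containing an out-of-vocabulary word are skipped.
    """
    model_scores, human_scores = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores)


# Hypothetical toy data, invented purely for illustration.
toy_embeddings = {
    "Hund": [0.9, 0.1, 0.0],
    "Katze": [0.8, 0.2, 0.1],
    "Auto": [0.0, 0.9, 0.4],
    "Wagen": [0.1, 0.8, 0.5],
}
toy_pairs = [
    ("Hund", "Katze", 3.5),
    ("Auto", "Wagen", 3.9),
    ("Hund", "Auto", 0.5),
    ("Katze", "Wagen", 1.0),
]

rho = evaluate(toy_embeddings, toy_pairs)
print(f"Spearman correlation on toy data: {rho:.3f}")
```

In a hyperparameter study, one would presumably call `evaluate` once per trained model and compare the resulting correlations; how the paper itself organizes this comparison is not stated in the abstract.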



Author information

Corresponding author

Correspondence to Eduardo Brito.

Copyright information

© 2017 Springer Fachmedien Wiesbaden GmbH

About this paper

Cite this paper

Brito, E., Sifa, R., Cvejoski, K., Ojeda, C., Bauckhage, C. (2017). Towards German Word Embeddings: A Use Case with Predictive Sentiment Analysis. In: Haber, P., Lampoltshammer, T., Mayr, M. (eds) Data Science – Analytics and Applications. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-19287-7_8
