Towards German Word Embeddings: A Use Case with Predictive Sentiment Analysis

Brito, Eduardo; Sifa, Rafet; Cvejoski, Kostadin; Ojeda, César; Bauckhage, Christian

doi:10.1007/978-3-658-19287-7_8

Abstract

Despite the research boom in word embeddings and their text mining applications in recent years, the vast majority of publications focus exclusively on English. Moreover, hyperparameter tuning is rarely well documented (especially for non-English text), even though it is essential for obtaining high-quality word representations. In this work, we show how different hyperparameter combinations affect the resulting German word vectors and how these word representations can be part of a more complex model. Specifically, we first perform an intrinsic evaluation of our German word embeddings, which are later used in a predictive sentiment analysis model. The latter serves not only as an extrinsic evaluation of the German word embeddings but also shows whether customer preferences can be predicted from document embeddings alone.
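To make the idea of an intrinsic evaluation concrete, the following is a minimal sketch and not the authors' actual setup. It assumes a common scheme: compute the cosine similarity of word pairs from the embeddings and compare it with human relatedness ratings using Spearman's rank correlation. The toy vectors, the word pairs, the ratings, and all function names are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(embeddings, rated_pairs):
    """Correlate model similarities with human ratings.

    Pairs containing an out-of-vocabulary word are skipped.
    """
    model_scores, human_scores = [], []
    for word_a, word_b, rating in rated_pairs:
        if word_a in embeddings and word_b in embeddings:
            model_scores.append(cosine(embeddings[word_a], embeddings[word_b]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores)


# Hypothetical 2-dimensional toy vectors (real embeddings have far more dimensions).
toy_embeddings = {
    "hund": [0.9, 0.1],
    "katze": [0.8, 0.3],
    "auto": [0.1, 0.9],
    "wagen": [0.2, 0.95],
}

# Hypothetical human relatedness ratings; "banane" is out of vocabulary.
toy_pairs = [
    ("hund", "katze", 3.5),
    ("auto", "wagen", 3.9),
    ("hund", "auto", 0.5),
    ("katze", "banane", 1.0),
]

rho = evaluate(toy_embeddings, toy_pairs)
print(f"Spearman correlation on toy data: {rho:.3f}")
```

Under this scheme, each hyperparameter combination would yield one correlation score, so different settings could be compared on the same rated word pairs. A downstream task such as sentiment prediction then acts as a complementary, extrinsic check.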

Author information

Authors and Affiliations

Fraunhofer IAIS, St. Augustin, Germany
Eduardo Brito, Rafet Sifa, Kostadin Cvejoski, César Ojeda & Christian Bauckhage
University of Bonn, Bonn, Germany
Rafet Sifa & Christian Bauckhage

Corresponding author

Correspondence to Eduardo Brito.

Editor information

Editors and Affiliations

Fachhochschule Salzburg, Puch/Salzburg, Austria
Peter Haber
Informationstechnik & System-Management, Fachhochschule Salzburg, Puch/Salzburg, Austria
Thomas Lampoltshammer
Informationstechnik & System-Management, Fachhochschule Salzburg, Puch/Salzburg, Austria
Manfred Mayr

Cite this paper

Brito, E., Sifa, R., Cvejoski, K., Ojeda, C., Bauckhage, C. (2017). Towards German Word Embeddings: A Use Case with Predictive Sentiment Analysis. In: Haber, P., Lampoltshammer, T., Mayr, M. (eds) Data Science – Analytics and Applications. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-19287-7_8

DOI: https://doi.org/10.1007/978-3-658-19287-7_8
Published: 16 September 2017
Publisher Name: Springer Vieweg, Wiesbaden
Print ISBN: 978-3-658-19286-0
Online ISBN: 978-3-658-19287-7
eBook Packages: Computer Science and Engineering (German Language)