Abstract
Despite the recent research boom in word embeddings and their text mining applications, the vast majority of publications focus exclusively on English. Moreover, hyperparameter tuning is rarely well documented (especially for non-English text), even though it is essential for obtaining high-quality word representations. In this work, we show how different hyperparameter combinations affect the resulting German word vectors and how these word representations can become part of a more complex model. Specifically, we first carry out an intrinsic evaluation of our German word embeddings, which are then used in a predictive sentiment analysis model. The latter serves not only as an extrinsic evaluation of the German word embeddings, but also shows whether customer preferences can be predicted from document embeddings alone.
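To make the idea of an intrinsic evaluation concrete, the following is a minimal sketch, not the procedure used in the chapter. It assumes the common setup in which cosine similarities between word vectors are compared with human relatedness ratings via Spearman's rank correlation; the abstract itself does not name the metric. The toy vectors, the word pairs, the ratings, and the helper names (`cosine`, `ranks`, `spearman`, `intrinsic_score`) are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def intrinsic_score(vectors, rated_pairs):
    """Correlate model similarities with human ratings for word pairs."""
    model_sims = [cosine(vectors[a], vectors[b]) for a, b, _ in rated_pairs]
    human = [rating for _, _, rating in rated_pairs]
    return spearman(model_sims, human)


# Hypothetical 2-dimensional embeddings (real ones have hundreds of dimensions).
toy_vectors = {
    "Hund": [1.0, 0.1],
    "Katze": [0.9, 0.3],
    "Auto": [0.1, 1.0],
    "Wagen": [0.2, 0.9],
}

# Hypothetical human relatedness ratings on an arbitrary scale.
toy_ratings = [
    ("Hund", "Katze", 3.5),
    ("Auto", "Wagen", 3.9),
    ("Hund", "Auto", 0.5),
    ("Katze", "Wagen", 0.8),
]

score = intrinsic_score(toy_vectors, toy_ratings)
print(f"Spearman correlation: {score:.3f}")
```

Under these assumptions, one such score would be computed for each trained hyperparameter configuration, so that configurations can be compared before the embeddings are passed on to a downstream model.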
© 2017 Springer Fachmedien Wiesbaden GmbH
About this paper
Cite this paper
Brito, E., Sifa, R., Cvejoski, K., Ojeda, C., Bauckhage, C. (2017). Towards German Word Embeddings: A Use Case with Predictive Sentiment Analysis. In: Haber, P., Lampoltshammer, T., Mayr, M. (eds) Data Science – Analytics and Applications. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-19287-7_8
DOI: https://doi.org/10.1007/978-3-658-19287-7_8
Publisher Name: Springer Vieweg, Wiesbaden
Print ISBN: 978-3-658-19286-0
Online ISBN: 978-3-658-19287-7
eBook Packages: Computer Science and Engineering (German Language)