
Improving Word Embeddings by Emphasizing Co-hyponyms

  • Xiangrui Cai
  • Yonghong Luo
  • Ying Zhang
  • Xiaojie Yuan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11242)

Abstract

Word embeddings are powerful for capturing semantic similarity between words in a vocabulary. They have been shown to benefit various natural language processing tasks such as language modeling, part-of-speech tagging, and machine translation. Existing embedding methods derive word vectors from the co-occurrence statistics of target-context word pairs. However, they treat all context words of a target word equally, even though not all contexts are equally informative for a target. Some recent work learns non-uniform context weights for predicting the target, but none of it takes the semantic relation types of target-context pairs into consideration. This paper observes that co-hyponyms usually have similar contexts and can substitute for one another. Based on this observation, this paper proposes a simple but effective method to improve word embeddings: it automatically identifies possible co-hyponyms within the context window and directly optimizes the embeddings of co-hyponyms to be close. Compared with three state-of-the-art neural embedding models, the proposed model performs better on several datasets in different languages on the human similarity judgment and language modeling tasks.
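The sketch below illustrates the idea summarized in the abstract, under stated assumptions; it is not the authors' implementation. It combines a standard skip-gram-with-negative-sampling (SGNS) update with an extra update that directly pulls the embeddings of co-hyponym pairs together. The class name CohyponymSGNS, the weight lam, and the assumption that co-hyponym pairs have already been identified within the context window are all hypothetical.

```python
import numpy as np

# Minimal sketch (not the paper's released code): SGNS plus an extra
# update that pulls the embeddings of detected co-hyponym pairs together.
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CohyponymSGNS:
    def __init__(self, vocab_size, dim=100, lr=0.025, neg=5, lam=0.1):
        self.W_in = (rng.random((vocab_size, dim)) - 0.5) / dim  # target vectors
        self.W_out = np.zeros((vocab_size, dim))                 # context vectors
        self.vocab_size, self.lr, self.neg, self.lam = vocab_size, lr, neg, lam

    def _sgns_update(self, target, context):
        # Standard SGNS step for one (target, context) pair:
        # one positive sample plus `neg` uniformly drawn negatives.
        samples = [(context, 1.0)] + [
            (int(rng.integers(self.vocab_size)), 0.0) for _ in range(self.neg)
        ]
        grad_in = np.zeros_like(self.W_in[target])
        for ctx, label in samples:
            score = sigmoid(self.W_in[target] @ self.W_out[ctx])
            g = self.lr * (label - score)
            grad_in += g * self.W_out[ctx]
            self.W_out[ctx] += g * self.W_in[target]
        self.W_in[target] += grad_in

    def _cohyponym_update(self, w1, w2):
        # Extra objective: move the target vectors of a co-hyponym pair
        # toward each other (gradient step on 0.5 * lam * ||v1 - v2||^2).
        diff = self.W_in[w1] - self.W_in[w2]
        self.W_in[w1] -= self.lr * self.lam * diff
        self.W_in[w2] += self.lr * self.lam * diff

    def train_window(self, window, center, cohyponym_pairs):
        # `window` is a list of word ids, `center` indexes the target word,
        # `cohyponym_pairs` are word-id pairs flagged as co-hyponyms here.
        target = window[center]
        for i, ctx in enumerate(window):
            if i != center:
                self._sgns_update(target, ctx)
        for w1, w2 in cohyponym_pairs:
            self._cohyponym_update(w1, w2)

# Toy usage: word ids 2 and 4 are assumed to have been flagged as co-hyponyms.
model = CohyponymSGNS(vocab_size=10, dim=8)
model.train_window(window=[1, 2, 3, 4, 5], center=2, cohyponym_pairs=[(2, 4)])
```

The co-hyponym term here is simply a squared-distance penalty between target vectors; the paper's actual objective and its heuristic for identifying co-hyponyms within the window may differ.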

Acknowledgements

This research is supported by the National Natural Science Foundation of China (No. 61772289), the Natural Science Foundation of Tianjin (No. 16JCQNJC00500), and the Fundamental Research Funds for the Central Universities.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xiangrui Cai (1)
  • Yonghong Luo (1)
  • Ying Zhang (1), corresponding author
  • Xiaojie Yuan (1)
  1. College of Computer Science, Nankai University, Tianjin, China
