Improving Word Embeddings by Emphasizing Co-hyponyms

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11242)

Abstract

Word embeddings are powerful for capturing semantic similarity between words in a vocabulary. They have been shown to benefit various natural language processing tasks such as language modeling, part-of-speech tagging, and machine translation. Existing embedding methods derive word vectors from the co-occurrence statistics of target-context word pairs and treat all context words of a target equally, even though not all contexts are equally informative for a target. Some recent work learns non-uniform weights over the contexts for predicting the target, but none of it takes the semantic relation types of target-context pairs into consideration. This paper observes that co-hyponyms usually share similar contexts and can substitute for one another. Motivated by this observation, it proposes a simple but effective method for improving word embeddings: the method automatically identifies possible co-hyponyms within the context window and directly optimizes their embeddings to be close. Compared with three state-of-the-art neural embedding models, the proposed model performs better on several datasets in different languages on both the human similarity judgement and language modeling tasks.
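The abstract describes the core idea only at a high level: within each context window, likely co-hyponyms of the target word are identified automatically, and their embeddings are pulled together directly, in addition to the usual target-context training signal. The paper's exact objective and its co-hyponym detection criterion are not reproduced on this page, so the following is only a minimal sketch of how such a combined update could look, assuming a standard skip-gram negative-sampling step plus an extra squared-distance attraction term. The hyper-parameters (DIM, LR, LAMBDA) and the form of the extra term are illustrative assumptions, not the authors' settings.

    import numpy as np

    DIM = 100      # embedding dimensionality (assumed)
    LR = 0.025     # learning rate (assumed)
    LAMBDA = 0.1   # weight of the co-hyponym attraction term (assumed)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgns_update(W_in, W_out, target, context, negatives, lr=LR):
        """One skip-gram-with-negative-sampling step for a (target, context)
        pair plus sampled negative word indices."""
        v_t = W_in[target].copy()
        grad_t = np.zeros_like(v_t)
        for idx, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
            g = sigmoid(v_t @ W_out[idx]) - label   # d(loss)/d(score)
            grad_t += g * W_out[idx]
            W_out[idx] -= lr * g * v_t
        W_in[target] -= lr * grad_t

    def cohyponym_update(W_in, target, cohyponym, lr=LR, lam=LAMBDA):
        """Assumed form of the extra term: directly pull the embeddings of a
        target word and a detected co-hyponym together by taking a gradient
        step on lam * ||v_t - v_h||^2."""
        diff = W_in[target] - W_in[cohyponym]
        W_in[target] -= lr * lam * diff
        W_in[cohyponym] += lr * lam * diff

    # Toy usage: 10-word vocabulary; word 4 is assumed to have been detected
    # as a co-hyponym of word 3 inside the current context window (the
    # detection step itself is not shown here).
    rng = np.random.default_rng(0)
    V = 10
    W_in = rng.normal(scale=0.1, size=(V, DIM))
    W_out = np.zeros((V, DIM))
    sgns_update(W_in, W_out, target=3, context=5, negatives=[1, 7])
    cohyponym_update(W_in, target=3, cohyponym=4)

Other forms of the extra term (for example, treating a detected co-hyponym as an additional positive context) would also be consistent with the abstract's description; the squared-distance pull is used here purely for simplicity.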

Notes

  1. https://dumps.wikimedia.org.
  2. http://mattmahoney.net/dc/textdata.html.
  3. https://bothameister.github.io/.
  4. https://github.com/facebookresearch/fastText.

Acknowledgements

This research is supported by the National Natural Science Foundation of China (No. 61772289), the Natural Science Foundation of Tianjin (No. 16JCQNJC00500), and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author: Ying Zhang.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Cai, X., Luo, Y., Zhang, Y., Yuan, X. (2018). Improving Word Embeddings by Emphasizing Co-hyponyms. In: Meng, X., Li, R., Wang, K., Niu, B., Wang, X., Zhao, G. (eds.) Web Information Systems and Applications. WISA 2018. Lecture Notes in Computer Science, vol. 11242. Springer, Cham. https://doi.org/10.1007/978-3-030-02934-0_20

  • DOI: https://doi.org/10.1007/978-3-030-02934-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02933-3

  • Online ISBN: 978-3-030-02934-0

  • eBook Packages: Computer Science, Computer Science (R0)
