Constructing Semantic Hierarchies via Fusion Learning Architecture

  • Tianwen Jiang
  • Ming Liu
  • Bing Qin
  • Ting Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10390)


Semantic hierarchy construction aims to build a structure of concepts linked by hypernym-hyponym ("is-a") relations. A major challenge for this task is the automatic discovery of such relations. We propose a fusion learning architecture based on word embeddings for constructing semantic hierarchies, combining a discriminative-generative fusion model with a simple lexical-structure rule as an auxiliary signal. It achieves an F1-score of 74.20% with a precision of 91.60% on a manually labeled test dataset, outperforming state-of-the-art methods. Combining our method with manually built hierarchies further improves the F1-score to 82.01%. Moreover, the fusion learning architecture is language-independent.
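The abstract does not detail the embedding-based component, but a common approach to hypernym discovery with word embeddings (in the spirit of earlier work on learning semantic hierarchies via embeddings) is to learn a linear projection Φ such that Φ·x(hyponym) ≈ x(hypernym). The sketch below illustrates that general idea on synthetic data; all names, dimensions, and the threshold are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: 50 hyponym/hypernym embedding pairs in 4 dimensions,
# generated from a hidden "ground-truth" projection plus small noise.
dim = 4
true_phi = rng.normal(size=(dim, dim))
hyponyms = rng.normal(size=(50, dim))
hypernyms = hyponyms @ true_phi.T + 0.01 * rng.normal(size=(50, dim))

# Learn the projection Phi by least squares: minimize ||X @ W - Y||^2,
# where each row x of X should map to the corresponding row y of Y.
w, *_ = np.linalg.lstsq(hyponyms, hypernyms, rcond=None)
phi = w.T  # so that phi @ x approximates the hypernym embedding of x


def is_hypernym(x, y, threshold=0.5):
    """Predict whether y is a hypernym of x: project x and compare to y."""
    return np.linalg.norm(phi @ x - y) < threshold
```

In a real system the pairs would come from annotated hypernym-hyponym data and pretrained word embeddings, and the decision threshold would be tuned on held-out data; a discriminative-generative fusion, as the paper proposes, would combine such a projection signal with other evidence rather than use it alone.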


Keywords: Semantic hierarchies · Hypernym-hyponym relation · Fusion learning architecture



The research in this paper is supported by the National Natural Science Foundation of China (Nos. 61632011 and 61772156) and the National High-tech R&D Program (863 Program) (No. 2015AA015407).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
