Information Retrieval Journal

Volume 22, Issue 6, pp 525–542

Beyond word embeddings: learning entity and concept representations from large scale knowledge bases

  • Walid Shalaby
  • Wlodek Zadrozny
  • Hongxia Jin


Text representations using neural word embeddings have proven effective in many NLP applications. Recent research adapts traditional word embedding models to learn vectors of multiword expressions (concepts/entities). However, these methods are limited to textual knowledge bases (e.g., Wikipedia). In this paper, we propose a novel and simple technique for integrating the knowledge about concepts from two large-scale knowledge bases of different structure (Wikipedia and Probase) in order to learn concept representations. We adapt the efficient skip-gram model to seamlessly learn from the knowledge in Wikipedia text and the Probase concept graph. We evaluate our concept embedding models on two tasks: (1) analogical reasoning, where we achieve state-of-the-art performance of 91% on semantic analogies, and (2) concept categorization, where we achieve state-of-the-art performance on two benchmark datasets, with categorization accuracy of 100% on one and 98% on the other. Additionally, we present a case study to evaluate our model on unsupervised argument type identification for neural semantic parsing. We demonstrate the competitive accuracy of our unsupervised method and its ability to generalize better to out-of-vocabulary entity mentions than the tedious and error-prone methods that depend on gazetteers and regular expressions.
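For readers unfamiliar with the analogical reasoning task, a common way to score it is the vector-offset method: given "a is to b as c is to ?", pick the concept whose vector is closest in cosine similarity to b - a + c. The following is a minimal sketch assuming that scoring scheme; the abstract does not state the exact procedure used. The function names and the three-dimensional toy vectors are hypothetical and are not taken from the paper's models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by the vector-offset method.

    The three query concepts are excluded from the candidates,
    as is customary for this kind of evaluation.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (name for name in vectors if name not in (a, b, c))
    return max(candidates, key=lambda name: cosine(vectors[name], target))


# Hand-made toy vectors (hypothetical, for illustration only).
# Axes: [France-ness, Japan-ness, capital-ness]
toy_vectors = {
    "France": [1.0, 0.0, 0.0],
    "Paris":  [1.0, 0.0, 1.0],
    "Japan":  [0.0, 1.0, 0.0],
    "Tokyo":  [0.0, 1.0, 1.0],
    "Kyoto":  [0.0, 1.0, 0.3],
}

answer = solve_analogy(toy_vectors, "France", "Paris", "Japan")
print(answer)
```

With real concept embeddings, each multiword concept (for example a Wikipedia page title) would have its own vector, so an analogy over entities can be answered with a single lookup per concept rather than by composing word vectors.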


Entity and concept embeddings, Entity identification, Concept categorization, Skip-gram, Probase, Knowledge graph representations



This work was partially supported by the National Science Foundation under Grant Number 1624035. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The authors would like to thank Avik Ray and Yilin Shen from Samsung Research America for their constructive feedback and discussions while developing the case study on the argument type identification task. The authors also appreciate the reviewers' valuable and insightful comments.



Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  1. Department of Computer Science, University of North Carolina at Charlotte, Charlotte, USA
  2. Samsung Research America, Mountain View, USA
