k-NN Embedding Stability for word2vec Hyper-Parametrisation in Scientific Text

  • Amna Dridi
  • Mohamed Medhat Gaber
  • R. Muhammad Atif Azad
  • Jagdev Bhogal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11198)


Word embeddings are increasingly attracting the attention of researchers dealing with semantic similarity and analogy tasks. However, finding the optimal hyper-parameters remains an important challenge because of their impact on the revealed analogies, particularly for domain-specific corpora. Since analogies are widely used for hypothesis synthesis, it is crucial to optimise word embedding hyper-parameters for precise hypothesis synthesis. We therefore propose, in this paper, a methodological approach for tuning word embedding hyper-parameters that uses the stability of the k-nearest neighbours of word vectors within scientific corpora, specifically Computer Science corpora, with Machine Learning adopted as a case study. The approach is tested on a dataset created from NIPS (Conference on Neural Information Processing Systems) publications, and evaluated against a curated ACM hierarchy and the Wikipedia Machine Learning outline as the gold standard. Our quantitative and qualitative analyses indicate that the approach not only reliably captures interesting patterns such as “unsupervised_learning is to kmeans as supervised_learning is to knn”, but also captures the analogical hierarchy structure of Machine Learning, and consistently outperforms the state-of-the-art embeddings, achieving \(68\%\) syntactic accuracy against their \(61\%\).
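The k-NN stability criterion can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: given two embedding matrices trained on the same vocabulary (rows aligned by word index) under two candidate hyper-parameter settings, we score how much each word's k nearest neighbours agree, using mean Jaccard overlap; the setting whose neighbourhoods are most stable across runs would be preferred.

```python
import numpy as np

def knn_sets(embeddings, k):
    """For each row vector, return the set of indices of its k nearest
    neighbours under cosine similarity (excluding the word itself)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # a word is not its own neighbour
    order = np.argsort(-sims, axis=1)[:, :k]  # top-k by descending similarity
    return [set(row) for row in order]

def knn_stability(emb_a, emb_b, k=10):
    """Mean Jaccard overlap between the k-NN sets of two embedding
    matrices over the same vocabulary (rows aligned by word index)."""
    sets_a = knn_sets(emb_a, k)
    sets_b = knn_sets(emb_b, k)
    overlaps = [len(a & b) / len(a | b) for a, b in zip(sets_a, sets_b)]
    return float(np.mean(overlaps))
```

In use, one would train word2vec twice per hyper-parameter configuration (e.g. varying window size or vector dimensionality), extract the aligned embedding matrices, and keep the configuration with the highest `knn_stability` score; the function names and the Jaccard choice here are illustrative assumptions.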


Word embedding · Word2vec · Skip-gram · Hyper-parameters · k-NN stability · ACM hierarchy · Wikipedia outline · NIPS



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Amna Dridi (1)
  • Mohamed Medhat Gaber (1)
  • R. Muhammad Atif Azad (1)
  • Jagdev Bhogal (1)

  1. School of Computing and Digital Technology, Birmingham City University, Birmingham, United Kingdom
