Semantic word shifts in a scientific domain
Understanding semantic word shifts in scientific domains is essential for facilitating interdisciplinary communication. Using a data set of papers published in the field of information retrieval (IR), this paper studies the semantic shifts of words in IR by mining per-word topic distributions over time. We propose that semantic word shifts occur not only over time but also across topics. The shifts are examined from two perspectives: the topic level and the context level. Based on the word-topic distributions over time, words are classified as stable or unstable. The diverging and converging trends among unstable words reveal characteristics of the topic evolution process. Context-level shifts are further detected through similarities between word vectors. Our work associates semantic word shifts with the evolution of topics, which facilitates a better understanding of semantic word shifts from the perspectives of both topics and contexts.
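The abstract does not spell out how stable and unstable words are separated or how diverging and converging trends are defined, so the following is only an illustrative sketch, not the authors' method. It assumes each word already has one topic distribution per time period (for example, from a topic model fitted per time slice). Topic-level shift is measured here with Jensen-Shannon divergence between consecutive periods; a word is called stable if no shift reaches a hypothetical threshold of 0.1. For unstable words, we assume "diverging" means the distribution spreads over more topics (entropy rises) and "converging" means it concentrates (entropy falls). Context-level shift is taken as one minus the cosine similarity between a word's vectors from two periods, which is only meaningful if those vectors share, or have been aligned into, a common space. All function names, the threshold, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

# A word's distribution over K topics in one time period (hypothetical input format).
TopicDist = Sequence[float]


def entropy(p: TopicDist) -> float:
    """Shannon entropy, in bits, of a topic distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p: TopicDist, q: TopicDist) -> float:
    """KL(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p: TopicDist, q: TopicDist) -> float:
    """Jensen-Shannon divergence in bits (lies in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def classify_word(series: List[TopicDist], threshold: float = 0.1) -> str:
    """Label a word from its per-period topic distributions (oldest first).

    Stable: no consecutive-period JS divergence reaches the (hypothetical)
    threshold. Otherwise the word is unstable, and its trend is read from the
    change in entropy between the first and last period (an assumption)."""
    shifts = [js_divergence(a, b) for a, b in zip(series, series[1:])]
    if max(shifts, default=0.0) < threshold:
        return "stable"
    delta = entropy(series[-1]) - entropy(series[0])
    if delta > 0:
        return "diverging"   # spread over more topics
    if delta < 0:
        return "converging"  # concentrated in fewer topics
    return "unstable"        # shifted, but with no net change in spread


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def context_shift(vec_t1: Sequence[float], vec_t2: Sequence[float]) -> float:
    """1 - cosine similarity between a word's vectors from two periods.

    Only meaningful if both vectors share (or were aligned into) one space."""
    return 1.0 - cosine(vec_t1, vec_t2)


if __name__ == "__main__":
    # Toy distributions over 3 topics, invented for illustration only.
    print(classify_word([[0.8, 0.1, 0.1], [0.79, 0.11, 0.1], [0.8, 0.1, 0.1]]))  # stable
    print(classify_word([[0.9, 0.05, 0.05], [0.34, 0.33, 0.33]]))  # diverging
    print(round(context_shift([1.0, 0.0], [0.0, 1.0]), 2))  # 1.0
```

Jensen-Shannon divergence is chosen in this sketch because it is symmetric and bounded, which makes a single fixed threshold easier to interpret; the paper may rely on a different measure or criterion.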
Keywords: Word-topic distribution; Semantic shifts; Semantic analysis
This work is funded by the National Natural Science Foundation of China (Grant Nos. 71420107026 and 71704138). The present study is an extended version of an article presented at the 16th International Conference on Scientometrics and Informetrics, Wuhan (China), 16–20 October 2017 (Chen et al. 2017a).
- Bouma, G. (2009). Normalized (pointwise) mutual information in collocation extraction. In Proceedings of GSCL (pp. 31–40).
- Chen, B., Ding, Y., & Ma, F. (2017a). Mapping the semantic word shifts in topics in the field of information retrieval. In Proceedings of ISSI 2017 – The 16th international conference on scientometrics and informetrics (pp. 1335–1341). Wuhan University, China.
- Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on machine learning (pp. 160–167). New York, NY, USA: ACM.
- Griffiths, T. L., & Steyvers, M. (2003). Prediction and semantic association. In Advances in Neural Information Processing Systems (pp. 11–18). Cambridge, MA, USA: MIT Press.
- Gulordava, K., & Baroni, M. (2011). A distributional similarity approach to the detection of semantic change in the Google Books Ngram Corpus. In Proceedings of the GEMS 2011 workshop on geometrical models of natural language semantics (pp. 67–71). Stroudsburg, PA, USA: Association for Computational Linguistics.
- Hamilton, W. L., Leskovec, J., & Jurafsky, D. (2016). Diachronic word embeddings reveal statistical laws of semantic change. arXiv:1605.09096 [Cs].
- Hoffman, M., Bach, F. R., & Blei, D. M. (2010). Online learning for latent Dirichlet allocation. In Advances in neural information processing systems (pp. 856–864). Cambridge, MA, USA: MIT Press.
- Kenter, T., Wevers, M., Huijnen, P., & de Rijke, M. (2015). Ad hoc monitoring of vocabulary shifts over time. In Proceedings of the 24th ACM international on conference on information and knowledge management (pp. 1191–1200). New York, NY, USA: ACM.
- Kim, Y., Chiu, Y.-I., Hanaki, K., Hegde, D., & Petrov, S. (2014). Temporal analysis of language through neural language models. arXiv:1405.3515 [Cs].
- Lehmann, W. P. (1993). Historical linguistics: An introduction (3rd edition). London; New York: Routledge.
- Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient estimation of word representations in vector space. arXiv:1301.3781 [Cs].
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013b). Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information processing systems 26 (pp. 3111–3119). New York: Curran Associates Inc.
- Rehurek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks (pp. 45–50).
- Tang, J., Liu, J., Zhang, M., & Mei, Q. (2016). Visualizing large-scale and high-dimensional data. In Proceedings of the 25th international conference on world wide web (pp. 287–297). Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee.
- Wijaya, D. T., & Yeniterzi, R. (2011). Understanding semantic change of words over centuries. In Proceedings of the 2011 international workshop on detecting and exploiting cultural diversity on the social web (pp. 35–40). New York, NY, USA: ACM.