Abstract
Understanding semantic word shifts in scientific domains is essential for facilitating interdisciplinary communication. Using a data set of published papers in the field of information retrieval (IR), this paper studies the semantic shifts of words in IR by mining per-word topic distributions over time. We propose that semantic word shifts occur not only over time but also across topics. The shifts are examined at two levels: the topic level and the context level. Based on how each word's topic distribution changes over time, we distinguish stable words from unstable words. The diverging and converging trends among unstable words reveal characteristics of the topic evolution process. Context-level shifts are further detected by similarities between word vectors. Our work associates semantic word shifts with topic evolution, which facilitates a better understanding of semantic word shifts in terms of both topics and contexts.
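The two levels of analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy counts, the toy vectors, and the use of entropy as a proxy for a diverging or converging trend are all assumptions made for illustration. Only the use of similarity between word vectors for context-level shifts comes from the abstract.

```python
import math


def normalize(counts):
    """Turn raw per-topic counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def entropy(dist):
    """Shannon entropy (natural log) of a topic distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def cosine(u, v):
    """Cosine similarity between two word vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical data: how often one word is assigned to each of three
# topics in an early and a late period (not taken from the paper).
topic_counts_by_period = {
    "early": [8, 1, 1],
    "late": [4, 3, 3],
}

# Topic level (assumed proxy): rising entropy means the word spreads
# over more topics (diverging); falling entropy means it concentrates
# on fewer topics (converging).
entropies = {
    period: entropy(normalize(counts))
    for period, counts in topic_counts_by_period.items()
}
trend = "diverging" if entropies["late"] > entropies["early"] else "converging"

# Context level: compare the word's vector across periods. The vectors
# are hypothetical and assumed to live in a shared, aligned space.
vec_early = [1.0, 0.0, 0.5]
vec_late = [0.2, 0.9, 0.5]
context_similarity = cosine(vec_early, vec_late)

print(trend, round(context_similarity, 3))
```

In this sketch, a low cosine similarity between periods would suggest that the word's typical contexts have changed, while the entropy comparison gives a rough indication of whether its topic distribution is spreading out or narrowing.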
References
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.
Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.
Bouma, G. (2009). Normalized (pointwise) mutual information in collocation extraction. In Proceedings of GSCL (pp. 31–40).
Chen, B., Ding, Y., & Ma, F. (2017a). Mapping the semantic word shifts in topics in the field of information retrieval. In Proceedings of ISSI 2017—The 16th international conference on scientometrics and informetrics (pp. 1335–1341). Wuhan University, China.
Chen, B., Tsutsui, S., Ding, Y., & Ma, F. (2017b). Understanding the topic evolution in a scientific domain: an exploratory study for the field of information retrieval. Journal of Informetrics, 11(4), 1175–1189.
Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on machine learning (pp. 160–167). New York, NY, USA: ACM.
Ding, Y., & Stirling, K. (2016). Data-driven discovery: a new era of exploiting the literature and data. Journal of Data and Information Science, 1(4), 1–9.
Griffiths, T. L., & Steyvers, M. (2003). Prediction and semantic association. In Advances in Neural Information Processing Systems (pp. 11–18). Cambridge, MA, USA: MIT Press.
Gulordava, K., & Baroni, M. (2011). A distributional similarity approach to the detection of semantic change in the Google Books Ngram Corpus. In Proceedings of the GEMS 2011 workshop on geometrical models of natural language semantics (pp. 67–71). Stroudsburg, PA, USA: Association for Computational Linguistics.
Hamilton, W. L., Leskovec, J., & Jurafsky, D. (2016). Diachronic word embeddings reveal statistical laws of semantic change. arXiv:1605.09096 [Cs].
Harris, Z. S. (1954). Distributional structure. Word, 10, 146–162.
Hoffman, M., Bach, F. R., & Blei, D. M. (2010). Online learning for latent Dirichlet allocation. In Advances in neural information processing systems (pp. 856–864). Cambridge, MA, USA: MIT Press.
Kenter, T., Wevers, M., Huijnen, P., & de Rijke, M. (2015). Ad hoc monitoring of vocabulary shifts over time. In Proceedings of the 24th ACM international on conference on information and knowledge management (pp. 1191–1200). New York, NY, USA: ACM.
Kim, Y., Chiu, Y.-I., Hanaki, K., Hegde, D., & Petrov, S. (2014). Temporal analysis of language through neural language models. arXiv:1405.3515 [Cs].
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211.
Lehmann, W. P. (1993). Historical linguistics: An introduction (3rd edition). London; New York: Routledge.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient estimation of word representations in vector space. arXiv:1301.3781 [Cs].
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013b). Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information processing systems 26 (pp. 3111–3119). New York: Curran Associates Inc.
Rehurek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks (pp. 45–50).
Tang, J., Liu, J., Zhang, M., & Mei, Q. (2016). Visualizing large-scale and high-dimensional data. In Proceedings of the 25th international conference on world wide web (pp. 287–297). Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee.
Wang, S., Schlobach, S., & Klein, M. (2011). Concept drift and how to identify it. Web Semantics: Science, Services and Agents on the World Wide Web, 9(3), 247–265.
Wijaya, D. T., & Yeniterzi, R. (2011). Understanding semantic change of words over centuries. In Proceedings of the 2011 international workshop on detecting and exploiting cultural diversity on the social web (pp. 35–40). New York, NY, USA: ACM.
Xu, J., Ding, Y., & Malic, V. (2015). Author credit for transdisciplinary collaboration. PLoS ONE, 10(9), e0137968.
Yan, E., Ding, Y., Milojević, S., & Sugimoto, C. R. (2012). Topics in dynamic research communities: an exploratory study for the field of information retrieval. Journal of Informetrics, 6(1), 140–153.
Acknowledgements
This work is funded by the National Natural Science Foundation of China (Grant Nos. 71420107026 and 71704138). The present study is an extended version of an article presented at the 16th International Conference on Scientometrics and Informetrics, Wuhan (China), 16–20 October 2017 (Chen et al. 2017a).
Cite this article
Chen, B., Ding, Y. & Ma, F. Semantic word shifts in a scientific domain. Scientometrics 117, 211–226 (2018). https://doi.org/10.1007/s11192-018-2843-2
DOI: https://doi.org/10.1007/s11192-018-2843-2