Exploring Semantic Change of Chinese Word Using Crawled Web Data

  • Conference paper
Web Engineering (ICWE 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11496)

Abstract

Changes in word meaning over time reflect shifts in socio-cultural attitudes and conceptual structures. Understanding how the semantics of words change over time is therefore important for studying models of language and cultural evolution. Word embedding methods such as PPMI, SVD and word2vec have been evaluated in recent years. These representations, sometimes referred to as semantic maps of words, facilitate the whole process of language processing. The Chinese language is no exception: the development of technology gradually influences how people communicate and the language they use. In this paper, a large dataset (300 GB) provided by Sogou, a Chinese web search engine provider, is used, and a Chinese language corpus is obtained from it after pre-processing. Three word representation methods are extended to include temporal information, then trained and tested on this corpus. A thorough qualitative and quantitative analysis is conducted with different thresholds to capture the semantic accuracy and alignment quality of the shifted words. The three methods are compared, and possible reasons behind the experimental results are discussed.
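
As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch builds PPMI vectors from co-occurrence counts, reduces them with a truncated SVD, and aligns the embeddings of two time periods with an orthogonal Procrustes rotation so that a word's vectors can be compared across periods. The alignment step, the function names, and the toy count matrices are assumptions made for illustration; the abstract does not say how the paper adds temporal information or measures alignment quality.

```python
import numpy as np


def ppmi(counts):
    """Positive PMI from a word-by-context co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf or nan
    return np.maximum(pmi, 0.0)


def svd_embed(matrix, dim):
    """Dense embeddings from the top `dim` singular directions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * s[:dim]


def procrustes_align(source, target):
    """Rotate `source` onto `target` with the orthogonal matrix R
    that minimises ||source @ R - target|| (Frobenius norm)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 1.0 if denom == 0 else 1.0 - float(a @ b) / denom


if __name__ == "__main__":
    # Hypothetical toy counts over the same 4-word vocabulary
    # in two time periods (rows and columns share the vocabulary).
    period_1 = [[0, 5, 1, 0], [5, 0, 1, 1], [1, 1, 0, 4], [0, 1, 4, 0]]
    period_2 = [[0, 1, 4, 1], [1, 0, 1, 5], [4, 1, 0, 1], [1, 5, 1, 0]]

    emb_1 = svd_embed(ppmi(period_1), dim=2)
    emb_2 = svd_embed(ppmi(period_2), dim=2)
    emb_1_aligned = procrustes_align(emb_1, emb_2)

    # A larger distance suggests a larger shift in usage for that word.
    for i in range(len(period_1)):
        print(i, cosine_distance(emb_1_aligned[i], emb_2[i]))
```

The rotation is restricted to an orthogonal matrix so that distances within each period are preserved; only the arbitrary orientation of the two embedding spaces is removed before words are compared.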

Notes

  1. http://word2vec.googlecode.com/svn/trunk/

  2. http://github.com/fxsjy/jieba

Acknowledgement

This work was supported by the National Undergraduate Training Program for Innovation and Entrepreneurship (No. 201810635003), the National Natural Science Foundation of China (No. 61877051) and CSTC funding (No. cstc2017zdcy-zdyf0366).

Corresponding author

Correspondence to Li Li.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, X., Cao, Y., Li, L. (2019). Exploring Semantic Change of Chinese Word Using Crawled Web Data. In: Bakaev, M., Frasincar, F., Ko, IY. (eds) Web Engineering. ICWE 2019. Lecture Notes in Computer Science, vol 11496. Springer, Cham. https://doi.org/10.1007/978-3-030-19274-7_7

  • DOI: https://doi.org/10.1007/978-3-030-19274-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-19273-0

  • Online ISBN: 978-3-030-19274-7

  • eBook Packages: Computer Science, Computer Science (R0)
