Normalization of Homophonic Words in Chinese Microblogs

Zhang, Xin; Song, Jiaying; He, Yu; Fu, Guohong

doi:10.1007/978-3-662-46248-5_22

Part of the book series: Communications in Computer and Information Science (CCIS, volume 503)

Included in the following conference series:

International Conference of Young Computer Scientists, Engineers and Educators

Abstract

Homophonic words are widespread in Chinese microblogs and pose a new challenge for Chinese microblog text analysis. To date, however, very little research has addressed the normalization of Chinese homophonic words. In this paper, we treat Chinese homophonic word normalization as a language decoding process and propose an n-gram based approach. We first use homophonic-original word or character mapping tables to generate normalization candidates for a given sentence containing homophonic words, and then use n-gram language models to decode the best normalization from the candidate set. Our experimental results show that using the homophonic-original character mapping table and n-grams trained on the microblog corpus helps improve performance in homophonic word recognition and restoration.
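The two-step idea in the abstract (generate candidates from a mapping table, then pick the best one with an n-gram language model) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-entry character mapping table `CHAR_MAP`, the four-sentence toy corpus, the character-level bigram model with add-one smoothing, and the exhaustive enumeration of candidates.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical homophonic-to-original character mapping table (illustrative only).
CHAR_MAP = {"杯": ["悲"], "具": ["剧"]}

# Toy corpus standing in for a microblog training corpus (an assumption).
CORPUS = ["真是悲剧", "这是悲剧", "一个杯子", "一个工具"]

BOS, EOS = "<s>", "</s>"


def train_bigrams(sentences):
    """Count character bigrams and their left contexts, with sentence boundaries."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = {BOS, EOS}
    for sentence in sentences:
        tokens = [BOS, *sentence, EOS]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[prev, cur] += 1
    return context_counts, bigram_counts, len(vocab)


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a character sequence."""
    context_counts, bigram_counts, vocab_size = model
    tokens = [BOS, *sentence, EOS]
    return sum(
        math.log((bigram_counts[prev, cur] + 1) / (context_counts[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def candidates(sentence, char_map):
    """Each character may stay as written or be replaced by a mapped original."""
    options = [[ch, *char_map.get(ch, [])] for ch in sentence]
    return ["".join(chars) for chars in product(*options)]


def normalize(sentence, char_map, model):
    """Return the candidate with the highest language-model score."""
    return max(candidates(sentence, char_map), key=lambda c: log_prob(c, model))


model = train_bigrams(CORPUS)
print(normalize("真是杯具", CHAR_MAP, model))  # prints 真是悲剧
```

Enumerating every candidate grows exponentially with the number of mappable characters, so this only suits short inputs; a practical decoder would more likely search the candidate lattice with dynamic programming.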

Author information

Authors and Affiliations

School of Computer Science and Technology, Heilongjiang University, Harbin, 150080, China
Xin Zhang, Jiaying Song, Yu He & Guohong Fu

Editor information

Editors and Affiliations

Harbin Institute of Technology, Harbin, China
Hongzhi Wang & Wanxiang Che
School of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
Haoliang Qi & Zhongyuan Han
Northeast Forestry University, Harbin, China
Zhaowen Qiu
Heilongjiang Institute of Technology, Harbin, China
Leilei Kong
Harbin Engineering University, China
Junyu Lin
Zhongkeyunhai Company, Harbin, China
Zeguang Lu

Cite this paper

Zhang, X., Song, J., He, Y., Fu, G. (2015). Normalization of Homophonic Words in Chinese Microblogs. In: Wang, H., et al. Intelligent Computation in Big Data Era. ICYCSEE 2015. Communications in Computer and Information Science, vol 503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46248-5_22

DOI: https://doi.org/10.1007/978-3-662-46248-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46247-8
Online ISBN: 978-3-662-46248-5
eBook Packages: Computer Science, Computer Science (R0)
