Automatic Proofreading in Chinese: Detect and Correct Spelling Errors in Character-Level with Deep Neural Networks

Wang, Qiufeng; Liu, Minghuan; Zhang, Weijia; Guo, Yuhang; Li, Tianrui

doi:10.1007/978-3-030-32236-6_31

Conference paper
First Online: 30 September 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Abstract

The rapid growth in the volume of text makes manual proofreading increasingly costly. By comparison, automatic proofreading offers clear savings in time and human resources, and it is attracting growing research interest. In this paper, we propose two attention-based deep neural network models, combined with confusion sets, to detect and correct possible Chinese spelling errors at the character level. Our approaches first model the context of Chinese character embeddings using Long Short-Term Memory (LSTM) networks, then use an attention mechanism to score the probability of each candidate in the character's confusion set, choosing the highest-scoring candidate as the prediction. We also define a new methodology for obtaining (preceding text, following text, candidates, target) quads and provide a supervised dataset for training and testing (our data has been released to the public at https://github.com/ccit-proofread). Performance evaluation indicates that our models achieve state-of-the-art performance and outperform a set of baselines.
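
The quad structure and the candidate-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Quad`, `make_quads`, and `pick_candidate` are hypothetical, the way quads are built here is only one plausible reading of the abstract, and a fixed toy vector stands in for the LSTM-encoded context. The dot-product-plus-softmax scoring is a generic attention-style scorer, assumed for illustration only.

```python
import math
from typing import Dict, List, NamedTuple


class Quad(NamedTuple):
    """Hypothetical container for one (preceding, following, candidates, target) sample."""
    preceding: str
    following: str
    candidates: List[str]
    target: str


def make_quads(sentence: str, confusion_sets: Dict[str, List[str]]) -> List[Quad]:
    """Build one quad for every character that has a confusion set (assumed scheme)."""
    quads = []
    for i, ch in enumerate(sentence):
        if ch in confusion_sets:
            # Candidates: the character itself plus its confusable alternatives.
            candidates = [ch] + [c for c in confusion_sets[ch] if c != ch]
            quads.append(Quad(sentence[:i], sentence[i + 1:], candidates, ch))
    return quads


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_candidate(context_vec: List[float],
                   candidate_vecs: Dict[str, List[float]]) -> str:
    """Score each candidate by dot product with the context vector and
    return the candidate with the highest softmax probability."""
    names = list(candidate_vecs)
    scores = [sum(a * b for a, b in zip(context_vec, candidate_vecs[n]))
              for n in names]
    probs = softmax(scores)
    best = max(range(len(names)), key=probs.__getitem__)
    return names[best]


# Toy usage: one confusable character in a three-character sentence.
quads = make_quads("我在家", {"在": ["再"]})
print(quads[0])  # Quad(preceding='我', following='家', candidates=['在', '再'], target='在')

# Toy embeddings (made up for illustration); a real model would learn these.
print(pick_candidate([1.0, 0.0], {"在": [0.9, 0.1], "再": [0.1, 0.9]}))  # 在
```

In a trained model, the context vector would be produced from the preceding and following text rather than fixed by hand, and the candidate embeddings would be learned jointly with it.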

Author information

Authors and Affiliations

School of Information Science and Technology, Southwest Jiaotong University, 999 Xi’an Road, Chengdu, China
Qiufeng Wang, Minghuan Liu, Weijia Zhang, Yuhang Guo & Tianrui Li

Corresponding author

Correspondence to Qiufeng Wang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jie Tang
National University of Singapore, Singapore, Singapore
Min-Yen Kan
Peking University, Beijing, China
Dongyan Zhao
Peking University, Beijing, China
Sujian Li
Zhengzhou University, Zhengzhou, China
Hongying Zan

About this paper

Cite this paper

Wang, Q., Liu, M., Zhang, W., Guo, Y., Li, T. (2019). Automatic Proofreading in Chinese: Detect and Correct Spelling Errors in Character-Level with Deep Neural Networks. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_31

DOI: https://doi.org/10.1007/978-3-030-32236-6_31
Published: 30 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6
eBook Packages: Computer Science, Computer Science (R0)
