Abstract
The rapid growth in the volume of text makes manual proofreading very costly. Automatic proofreading, by comparison, saves considerable time and human effort, and is attracting growing research interest. In this paper, we propose two attention-based deep neural network models, combined with confusion sets, to detect and correct possible Chinese spelling errors at the character level. Our approaches first model the context of Chinese character embeddings using Long Short-Term Memory (LSTM) networks, then score the probability of each candidate from the character's confusion set through an attention mechanism, and choose the highest-scoring candidate as the prediction. We also define a new methodology for obtaining (preceding text, following text, candidates, target) quads and provide a supervised dataset for training and testing (our data has been released to the public at https://github.com/ccit-proofread). Performance evaluation indicates that our models achieve state-of-the-art performance and outperform a set of baselines.
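To make the pipeline in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a quad is built for every character that has an entry in a confusion set, and it replaces the LSTM context encoder with fixed toy vectors. The names `make_quads` and `score_candidates` are hypothetical, and the scoring uses plain dot-product attention, which may differ from the formulation in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (preceding text, following text, candidates, target)
Quad = Tuple[str, str, List[str], str]


def make_quads(sentence: str, confusion_sets: Dict[str, List[str]]) -> List[Quad]:
    """Build one quad per character that has a confusion set (assumed scheme)."""
    quads = []
    for i, ch in enumerate(sentence):
        if ch in confusion_sets:
            # The target itself is always listed first among the candidates.
            candidates = [ch] + [c for c in confusion_sets[ch] if c != ch]
            quads.append((sentence[:i], sentence[i + 1:], candidates, ch))
    return quads


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def score_candidates(
    context_states: Sequence[Sequence[float]],
    candidate_vecs: Sequence[Sequence[float]],
) -> List[float]:
    """Score each candidate by attending over the context states.

    context_states stands in for LSTM hidden states (toy vectors here).
    Each candidate embedding acts as the attention query.
    """
    scores = []
    for cand in candidate_vecs:
        weights = softmax([dot(cand, h) for h in context_states])
        # Attention-weighted summary of the context for this candidate.
        summary = [
            sum(w * h[d] for w, h in zip(weights, context_states))
            for d in range(len(cand))
        ]
        scores.append(dot(cand, summary))
    return softmax(scores)


# Toy usage: the confusion set and the vectors below are made up for illustration.
quads = make_quads("我在家", {"在": ["再"]})
probs = score_candidates(
    context_states=[[1.0, 0.0], [0.9, 0.1]],
    candidate_vecs=[[1.0, 0.0], [0.0, 1.0]],
)
best = max(range(len(probs)), key=probs.__getitem__)
```

In a trained model, the toy vectors would be replaced by learned character embeddings and LSTM outputs over the preceding and following text, and the candidate at index `best` would be compared with the original character to decide whether a correction is needed.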
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Q., Liu, M., Zhang, W., Guo, Y., Li, T. (2019). Automatic Proofreading in Chinese: Detect and Correct Spelling Errors in Character-Level with Deep Neural Networks. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_31
DOI: https://doi.org/10.1007/978-3-030-32236-6_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6
eBook Packages: Computer Science (R0)