Automatic Proofreading in Chinese: Detect and Correct Spelling Errors in Character-Level with Deep Neural Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Abstract

The rapid growth in the volume of text makes manual proofreading very costly. Automatic proofreading, by comparison, offers clear savings in time and human resources, and has drawn increasing research attention. In this paper, we propose two attention-based deep neural network models, combined with confusion sets, to detect and correct possible Chinese spelling errors at the character level. Our approaches first model the context of Chinese character embeddings using Long Short-Term Memory (LSTM) networks, then score the candidates from a character's confusion set through an attention mechanism and choose the highest-scoring one as the prediction. We also define a new methodology for obtaining (preceding text, following text, candidates, target) quads and provide a supervised dataset for training and testing (our data has been released to the public at https://github.com/ccit-proofread.). Performance evaluation indicates that our models achieve state-of-the-art performance and outperform a set of baselines.
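As a rough illustration of the pipeline the abstract outlines, the sketch below builds (preceding text, following text, candidates, target) quads from a sentence and a confusion set, then picks the candidate with the highest softmax-normalized score. It is not the authors' implementation: the names `Quad`, `build_quads`, and `pick_candidate` are hypothetical, the rule for forming quads is an assumption, and a plain dot product against a given context vector stands in for the LSTM encoder and attention mechanism.

```python
import math
from typing import Dict, List, NamedTuple, Sequence


class Quad(NamedTuple):
    """Hypothetical container for one training example."""
    preceding: str
    following: str
    candidates: List[str]
    target: str


def build_quads(sentence: str, confusion_sets: Dict[str, List[str]]) -> List[Quad]:
    """Emit one quad for every character that has a confusion set.

    Assumption: the candidates are the correct character followed by
    its confusable characters; the paper's exact procedure may differ.
    """
    quads = []
    for i, ch in enumerate(sentence):
        if ch not in confusion_sets:
            continue
        candidates = [ch] + [c for c in confusion_sets[ch] if c != ch]
        quads.append(Quad(sentence[:i], sentence[i + 1:], candidates, ch))
    return quads


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_candidate(context_vec: Sequence[float],
                   candidate_vecs: Dict[str, Sequence[float]]) -> str:
    """Return the candidate whose embedding best matches the context.

    Stand-in for the attention scoring step: each candidate is scored
    by a dot product with the context vector, the scores are normalized
    with softmax, and the most probable candidate is returned.
    """
    names = list(candidate_vecs)
    scores = [sum(a * b for a, b in zip(context_vec, candidate_vecs[n]))
              for n in names]
    probs = softmax(scores)
    best = max(range(len(names)), key=probs.__getitem__)
    return names[best]


# Usage with a toy confusion set and made-up two-dimensional embeddings.
quads = build_quads("我在公园里玩", {"在": ["再"]})
choice = pick_candidate([1.0, 0.0], {"在": [0.9, 0.1], "再": [0.1, 0.9]})
```

In a trained model the context vector would come from LSTM encodings of the preceding and following text rather than being supplied by hand, and a prediction that differs from the character actually written would flag a likely spelling error.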

Author information

Corresponding author

Correspondence to Qiufeng Wang.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 311 KB)

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, Q., Liu, M., Zhang, W., Guo, Y., Li, T. (2019). Automatic Proofreading in Chinese: Detect and Correct Spelling Errors in Character-Level with Deep Neural Networks. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_31

  • DOI: https://doi.org/10.1007/978-3-030-32236-6_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32235-9

  • Online ISBN: 978-3-030-32236-6

  • eBook Packages: Computer Science, Computer Science (R0)
