
Hierarchical Transformer Encoders for Vietnamese Spelling Correction

  • Conference paper
  • Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices (IEA/AIE 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12798)

Abstract

In this paper, we propose a Hierarchical Transformer model for the Vietnamese spelling correction problem. The model consists of multiple Transformer encoders and utilizes both character-level and word-level representations to detect errors and make corrections. In addition, to facilitate future work on Vietnamese spelling correction tasks, we introduce a realistic dataset collected from real-life texts. We compare our method with other methods and publicly available systems; the proposed method outperforms all of them in terms of recall, precision, and F1-score. A demo version (https://nlp.laban.vn/wiki/spelling_checker/) is publicly available.
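The two-level design described in the abstract (a character-level encoder whose pooled outputs feed a word-level encoder) can be illustrated with a toy sketch. Everything below is illustrative only: the embedding dimension, random embeddings, single-head attention, and mean pooling are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding size (assumption, not from the paper)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head scaled dot-product self-attention over a sequence of vectors.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    return softmax(scores) @ x

# Toy character vocabulary with random embeddings; unknown characters map to zeros.
char_emb = {c: rng.standard_normal(DIM) for c in "aàábcdđeêghiklmnoôơpqrstuưvxy "}

def encode_word(word):
    # Character-level encoder: attend over the word's characters, then mean-pool
    # into a single word vector.
    chars = np.stack([char_emb.get(c, np.zeros(DIM)) for c in word])
    return self_attention(chars).mean(axis=0)

def encode_sentence(words):
    # Word-level encoder: attend over the pooled word vectors, producing one
    # contextual vector per word (which a correction head could then score).
    word_vecs = np.stack([encode_word(w) for w in words])
    return self_attention(word_vecs)

sentence = "xin chào việt nam".split()
enc = encode_sentence(sentence)
print(enc.shape)  # one contextual vector per word: (4, 8)
```

A real model would stack several Transformer encoder layers at each level and train the embeddings; the point here is only the hierarchy: characters are encoded and pooled per word, and the resulting word sequence is encoded again.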


Notes

  1. https://vi.wikipedia.org/wiki/Wikipedia.

  2. https://github.com/heraclex12/Viwiki-spelling.

  3. http://www.hieuthi.com/blog/2017/04/03/vietnamese-syllables-usage.html.

  4. https://viettelgroup.ai/en/service/nlp.


Author information

Correspondence to Son T. Nguyen.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Tran, H., Dinh, C.V., Phan, L., Nguyen, S.T. (2021). Hierarchical Transformer Encoders for Vietnamese Spelling Correction. In: Fujita, H., Selamat, A., Lin, J.C.-W., Ali, M. (eds.) Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. IEA/AIE 2021. Lecture Notes in Computer Science, vol. 12798. Springer, Cham. https://doi.org/10.1007/978-3-030-79457-6_46


  • DOI: https://doi.org/10.1007/978-3-030-79457-6_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79456-9

  • Online ISBN: 978-3-030-79457-6

  • eBook Packages: Computer Science (R0)
