
Hierarchical Transformer Encoders for Vietnamese Spelling Correction

  • Conference paper
  • Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices (IEA/AIE 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12798)

Abstract

In this paper, we propose a Hierarchical Transformer model for the Vietnamese spelling correction problem. The model consists of multiple Transformer encoders and utilizes both character-level and word-level representations to detect errors and make corrections. In addition, to facilitate future work on Vietnamese spelling correction tasks, we introduce a realistic dataset collected from real-life texts. We compare our method with other methods and publicly available systems; the proposed method outperforms all of them in terms of recall, precision, and F1-score. A demo version (https://nlp.laban.vn/wiki/spelling_checker/) is publicly available.
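The two-level design described in the abstract (a character-level encoder whose pooled outputs feed a word-level encoder) can be illustrated with a toy sketch. Everything below is illustrative only: the embedding dimension, random embeddings, single-head attention, and mean pooling are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding size (assumption, not from the paper)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head scaled dot-product self-attention over a sequence of vectors.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    return softmax(scores) @ x

# Toy character vocabulary with random embeddings; unknown characters map to zeros.
char_emb = {c: rng.standard_normal(DIM) for c in "aàábcdđeêghiklmnoôơpqrstuưvxy "}

def encode_word(word):
    # Character-level encoder: attend over the word's characters, then mean-pool
    # into a single word vector.
    chars = np.stack([char_emb.get(c, np.zeros(DIM)) for c in word])
    return self_attention(chars).mean(axis=0)

def encode_sentence(words):
    # Word-level encoder: attend over the pooled word vectors, producing one
    # contextual vector per word (which a correction head could then score).
    word_vecs = np.stack([encode_word(w) for w in words])
    return self_attention(word_vecs)

sentence = "xin chào việt nam".split()
enc = encode_sentence(sentence)
print(enc.shape)  # one contextual vector per word: (4, 8)
```

A real model would stack several Transformer encoder layers at each level and train the embeddings; the point here is only the hierarchy: characters are encoded and pooled per word, and the resulting word sequence is encoded again.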


Notes

  1. https://vi.wikipedia.org/wiki/Wikipedia.

  2. https://github.com/heraclex12/Viwiki-spelling.

  3. http://www.hieuthi.com/blog/2017/04/03/vietnamese-syllables-usage.html.

  4. https://viettelgroup.ai/en/service/nlp.


Author information

Correspondence to Son T. Nguyen.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Tran, H., Dinh, C.V., Phan, L., Nguyen, S.T. (2021). Hierarchical Transformer Encoders for Vietnamese Spelling Correction. In: Fujita, H., Selamat, A., Lin, J.C.-W., Ali, M. (eds.) Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. IEA/AIE 2021. Lecture Notes in Computer Science, vol. 12798. Springer, Cham. https://doi.org/10.1007/978-3-030-79457-6_46


  • DOI: https://doi.org/10.1007/978-3-030-79457-6_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79456-9

  • Online ISBN: 978-3-030-79457-6

  • eBook Packages: Computer Science (R0)
