Abstract
In this paper, we propose a Hierarchical Transformer model for the Vietnamese spelling correction problem. The model consists of multiple Transformer encoders and utilizes both character-level and word-level representations to detect errors and make corrections. In addition, to facilitate future work on Vietnamese spelling correction tasks, we propose a realistic dataset collected from real-life texts. We compare our method with other methods and publicly available systems. The proposed method outperforms all of the contemporary methods in terms of recall, precision, and F1-score. A demo version (https://nlp.laban.vn/wiki/spelling_checker/) is publicly available.
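The hierarchical design described above can be illustrated with a minimal structural sketch. This is not the authors' implementation: the function names are hypothetical, and the Transformer encoder stages are replaced by simple averaging so the example stays self-contained. It only shows the two-level flow, where a character-level stage summarizes each word and a word-level stage then operates on one vector per word.

```python
# Structural sketch (hypothetical, NOT the paper's model) of a
# hierarchical character/word encoder. Real character-level and
# word-level Transformer encoders are stubbed out with averaging.

from typing import List

EMB_DIM = 4  # toy embedding size


def char_embedding(ch: str) -> List[float]:
    """Deterministic toy embedding for a single character."""
    code = ord(ch)
    return [float((code >> i) & 1) for i in range(EMB_DIM)]


def encode_word_from_chars(word: str) -> List[float]:
    """Character-level stage: average the character embeddings of a word.
    In the real architecture this would be a character-level Transformer
    encoder producing a word representation."""
    vecs = [char_embedding(c) for c in word]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def encode_sentence(sentence: str) -> List[List[float]]:
    """Word-level stage: one vector per word. A real model would pass
    these through word-level Transformer layers before error-detection
    and correction heads."""
    return [encode_word_from_chars(w) for w in sentence.split()]


vectors = encode_sentence("sua loi chinh ta")
assert len(vectors) == 4                      # one vector per word
assert all(len(v) == EMB_DIM for v in vectors)
```

The key design point the sketch mirrors is that character-level information (useful for catching typos inside a word) and word-level context (useful for choosing the right correction) are computed at separate stages and then combined.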
© 2021 Springer Nature Switzerland AG
Cite this paper
Tran, H., Dinh, C.V., Phan, L., Nguyen, S.T. (2021). Hierarchical Transformer Encoders for Vietnamese Spelling Correction. In: Fujita, H., Selamat, A., Lin, J.C.-W., Ali, M. (eds.) Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. IEA/AIE 2021. Lecture Notes in Computer Science, vol. 12798. Springer, Cham. https://doi.org/10.1007/978-3-030-79457-6_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79456-9
Online ISBN: 978-3-030-79457-6
eBook Packages: Computer Science (R0)