Deep Learning Approach for Vietnamese Consonant Misspell Correction
Vietnamese words are combinations of consonants, vowels, and diacritics. Previous studies on Vietnamese spelling correction have mostly focused on mistyped errors, yet misspelled errors are more common and more difficult to detect. Based on our literature review, no study addresses this issue directly. A misspelled Vietnamese word can become another word that exists in the vocabulary but changes the meaning of the sentence or renders it meaningless. Whereas mistyped errors are typographical, misspelled errors can appear in any kind of text, including typed documents and handwriting. Misspelled errors are also harder to detect, especially for the writers themselves, because they stem from a wrong understanding of how the word is spelled. For that reason, checking a sentence against a vocabulary filter does not guarantee that the sentence is spelled correctly. Checking Vietnamese spelling is therefore a difficult problem. Many articles have tried to solve it with different approaches, but each has its own limitations. In this paper, we propose a deep learning approach that focuses on consonant misspelling errors and achieves higher accuracy than existing methods, outperforming the current state-of-the-art model by a significant margin.
Keywords: Vietnamese consonant misspell correction, Misspell direction encoding, Deep learning