Deep Learning Approach for Vietnamese Consonant Misspell Correction

  • Ha Thanh Nguyen
  • Tran Binh Dang
  • Le Minh Nguyen
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1215)

Abstract

Vietnamese words are combinations of consonants, vowels, and diacritics. Previous studies on Vietnamese spelling correction have mostly focused on mistyped errors, which are typographical. Misspelled errors are more common and harder to detect, and based on our literature review, no study addresses them directly. Unlike mistyped errors, misspelled errors can appear in any kind of text, including typed documents and handwritten text, because they come from a wrong understanding of how the word is spelled. This makes them especially hard for the person who wrote the text to notice. A misspelled Vietnamese word can turn into another word that does exist in the vocabulary but changes the meaning of the sentence or makes it meaningless. For that reason, checking a sentence against a vocabulary filter does not guarantee that the sentence is spelled correctly. Many articles have tried to solve Vietnamese spelling correction with different approaches, but each has its own limitations. In this paper, we propose a deep learning approach that focuses on consonant misspelled errors and achieves higher accuracy than existing methods, outperforming the current state-of-the-art model by a significant margin.
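The point about vocabulary filters can be illustrated with a small sketch. It is not the method proposed in the paper. The toy vocabulary, the function name `passes_vocabulary_filter`, and the example sentences are assumptions chosen for illustration. The sketch shows that a consonant misspelling ("tr" written as "ch") can produce a valid syllable and pass a dictionary check, while a typographical error is caught.

```python
import unicodedata

# Hypothetical toy vocabulary of Vietnamese syllables (illustration only,
# not taken from the paper).
VOCAB = {
    unicodedata.normalize("NFC", w)
    for w in ["bức", "tranh", "chanh", "rất", "đẹp"]
}


def passes_vocabulary_filter(sentence: str) -> bool:
    """Return True if every whitespace-separated syllable is in VOCAB."""
    tokens = unicodedata.normalize("NFC", sentence.lower()).split()
    return all(token in VOCAB for token in tokens)


correct = "bức tranh rất đẹp"     # "the painting is very beautiful"
misspelled = "bức chanh rất đẹp"  # "tranh" written as "chanh" ("lemon")
mistyped = "bức trnah rất đẹp"    # typo: "trnah" is not a syllable

for label, text in [("correct", correct),
                    ("misspelled", misspelled),
                    ("mistyped", mistyped)]:
    print(label, passes_vocabulary_filter(text))
```

The misspelled sentence passes the filter because "chanh" is itself a valid word, so detecting the error requires the sentence context rather than a word list.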

Keywords

Vietnamese consonant misspell correction, Misspell direction encoding, Deep learning

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Ha Thanh Nguyen (1)
  • Tran Binh Dang (1)
  • Le Minh Nguyen (1)
  1. Japan Advanced Institute of Science and Technology, Nomi, Japan
