One Novel Word Segmentation Method Based on N-Shortest Path in Vietnamese

  • Conference paper
Advances in Computer Communication and Computational Sciences

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 924)

Abstract

Automatic word segmentation is the primary step in Vietnamese text information processing and an important support for cross-language information processing tasks between China and Vietnam. Because Vietnamese is an isolating language with tones, each syllable can form a word on its own and can also combine with the syllables to its left and/or right to form a new word. Automatic word segmentation of Vietnamese therefore cannot rely on spaces alone. This paper takes automatic word segmentation of Vietnamese as its research object. First, it makes a rough segmentation of Vietnamese sentences with the N-shortest path model. Then, the syllables in each sentence are abstracted into a directed acyclic graph. Finally, the word segmentation is obtained by calculating the shortest path with the help of the BEMS marking system. The results show that the proposed algorithm achieves satisfactory performance in Vietnamese word segmentation.
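
The abstract names the building blocks (a syllable graph, N-shortest paths, BEMS tags) but gives no implementation details, so the following is only a minimal sketch of how such a pipeline might look, not the authors' method. It assumes a tiny hypothetical lexicon, unit edge weights (so the shortest path is the one with the fewest words), a simplified N-shortest search that keeps the k cheapest partial paths at each node, and the usual reading of BEMS as Begin / End / Middle / Single. All function names are illustrative.

```python
import heapq

# Hypothetical toy lexicon of multi-syllable words (an assumption, not from the paper).
LEXICON = {"học sinh", "sinh học"}


def build_dag(syllables, lexicon, max_len=4):
    """Return edges[i] = list of j such that syllables[i:j] is a candidate word."""
    n = len(syllables)
    edges = {i: [] for i in range(n)}
    for i in range(n):
        edges[i].append(i + 1)  # every single syllable is a candidate word
        for j in range(i + 2, min(n, i + max_len) + 1):
            if " ".join(syllables[i:j]).lower() in lexicon:
                edges[i].append(j)
    return edges


def n_shortest_paths(edges, n, k):
    """Keep the k cheapest paths into each node; every edge costs 1 (assumed)."""
    best = {i: [] for i in range(n + 1)}
    best[0] = [(0, (0,))]
    for i in range(n):  # nodes are already in topological order
        best[i] = heapq.nsmallest(k, best[i])
        for cost, path in best[i]:
            for j in edges[i]:
                best[j].append((cost + 1, path + (j,)))
    return heapq.nsmallest(k, best[n])


def words_from_path(syllables, path):
    """Turn a node path such as (0, 2, 4, 5) into a list of words."""
    return [" ".join(syllables[i:j]) for i, j in zip(path, path[1:])]


def bems_tags(path):
    """Tag each syllable: B(egin), M(iddle), E(nd) of a word, or S(ingle)."""
    tags = []
    for i, j in zip(path, path[1:]):
        length = j - i
        tags += ["S"] if length == 1 else ["B"] + ["M"] * (length - 2) + ["E"]
    return tags


if __name__ == "__main__":
    syllables = "học sinh học sinh học".split()
    dag = build_dag(syllables, LEXICON)
    for cost, path in n_shortest_paths(dag, len(syllables), k=3):
        print(cost, words_from_path(syllables, path), bems_tags(path))
```

The example sentence is deliberately ambiguous: "học sinh" (student) and "sinh học" (biology) overlap, so several segmentations tie on word count. A real system would need weighted edges or a later disambiguation pass to choose among the candidates that this rough segmentation returns.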

Corresponding author

Correspondence to Xiaohua Ke.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

Cite this paper

Ke, X., Luo, H., Chen, J., Huang, R., Lai, J. (2019). One Novel Word Segmentation Method Based on N-Shortest Path in Vietnamese. In: Bhatia, S., Tiwari, S., Mishra, K., Trivedi, M. (eds) Advances in Computer Communication and Computational Sciences. Advances in Intelligent Systems and Computing, vol 924. Springer, Singapore. https://doi.org/10.1007/978-981-13-6861-5_47