Abstract
Automatic word segmentation is the first step in Vietnamese text processing and an important foundation for cross-language information processing between China and Vietnam. Vietnamese is an isolating, tonal language in which each syllable can form a word on its own or combine with the syllables to its left and/or right to form a new word. Word boundaries therefore cannot be determined from spaces alone. This paper studies automatic word segmentation of Vietnamese. First, Vietnamese sentences are roughly segmented with the N-shortest path model. Then, the syllables in each sentence are abstracted into a directed acyclic graph. Finally, the word segmentation is obtained by computing the shortest path with the help of the BEMS marking system. The results show that the proposed algorithm achieves satisfactory performance on Vietnamese word segmentation.
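The pipeline described above can be pictured with a small sketch. This is not the authors' implementation. It assumes a tiny hypothetical lexicon, unit edge weights (the paper may weight edges differently, for example by word probability), and the common reading of BEMS as Begin/End/Middle/Single tags. All function names are illustrative.

```python
import heapq

# Hypothetical toy lexicon: each entry is a tuple of syllables forming a known word.
LEXICON = {("học", "sinh"), ("sinh", "học")}


def build_dag(syllables, lexicon, max_len=4):
    """Node i is the gap before syllable i; edge i -> j means syllables[i:j] is one word.

    Single syllables are always allowed as words (assumption), so the end node
    is always reachable.
    """
    n = len(syllables)
    edges = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            if j == i + 1 or tuple(syllables[i:j]) in lexicon:
                edges[i].append(j)
    return edges


def n_shortest_paths(edges, end, n_best):
    """Return up to n_best paths from node 0 to `end`, fewest edges first.

    Best-first enumeration of partial paths with unit edge cost; adequate for a
    short sentence, not tuned for long inputs.
    """
    heap = [(0, [0])]
    results = []
    while heap and len(results) < n_best:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == end:
            results.append(path)
            continue
        for nxt in edges[node]:
            heapq.heappush(heap, (cost + 1, path + [nxt]))
    return results


def path_to_words(syllables, path):
    """Turn a node path into a list of words (syllables joined by spaces)."""
    return [" ".join(syllables[i:j]) for i, j in zip(path, path[1:])]


def bems_tags(words):
    """Tag each syllable: S = single-syllable word, B/M/E = begin/middle/end of a word."""
    tags = []
    for word in words:
        k = len(word.split())
        if k == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (k - 2) + ["E"])
    return tags


if __name__ == "__main__":
    sentence = "học sinh học sinh học".split()
    dag = build_dag(sentence, LEXICON)
    for path in n_shortest_paths(dag, len(sentence), n_best=3):
        words = path_to_words(sentence, path)
        print(words, bems_tags(words))
```

The example sentence is ambiguous because "học sinh" (student) and "sinh học" (biology) overlap, so the rough segmentation keeps several equally short candidates instead of committing to one. Choosing among them would need additional evidence, such as the tag-based scoring the abstract alludes to.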
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ke, X., Luo, H., Chen, J., Huang, R., Lai, J. (2019). One Novel Word Segmentation Method Based on N-Shortest Path in Vietnamese. In: Bhatia, S., Tiwari, S., Mishra, K., Trivedi, M. (eds) Advances in Computer Communication and Computational Sciences. Advances in Intelligent Systems and Computing, vol 924. Springer, Singapore. https://doi.org/10.1007/978-981-13-6861-5_47
DOI: https://doi.org/10.1007/978-981-13-6861-5_47
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6860-8
Online ISBN: 978-981-13-6861-5
eBook Packages: Intelligent Technologies and Robotics (R0)