One Novel Word Segmentation Method Based on N-Shortest Path in Vietnamese

Ke, Xiaohua; Luo, Haijiao; Chen, JiHua; Huang, Ruibin; Lai, Jinwen

doi:10.1007/978-981-13-6861-5_47


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 924)


Abstract

Automatic word segmentation is the first step in Vietnamese text processing and an important foundation for Chinese-Vietnamese cross-language information processing tasks. Because Vietnamese is a tonal, isolating language, each syllable can form a word on its own or combine with the syllables to its left and/or right to form a new word, so Vietnamese text cannot be segmented into words simply at spaces. This paper studies automatic Vietnamese word segmentation. First, Vietnamese sentences are roughly segmented with the N-shortest-path model. Next, the syllables in each sentence are abstracted into a directed acyclic graph. Finally, the word segmentation is obtained by computing the shortest path with the help of the BEMS tagging scheme. The experimental results show that the proposed algorithm achieves satisfactory performance on Vietnamese word segmentation.
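The abstract names three ingredients: a directed acyclic graph over syllables, an N-shortest-path search through it, and B/E/M/S tags. The Python below is a minimal sketch of those ideas under stated assumptions, not the authors' implementation. The lexicon, the `MAX_WORD_LEN` limit, the example sentence ("học sinh học sinh học", roughly "students study biology") and all function names are hypothetical. Every edge is assumed to have unit weight, so a path's length is its word count, as in the simple non-statistical form of N-shortest-path rough segmentation; a statistical variant would use a cost such as -log P(word). B/E/M/S is read as the common begin/end/middle/single convention. The abstract does not say how the paper weights edges or combines the tags with the shortest path, so the sketch only shows how a segmentation maps to tags.

```python
from collections import defaultdict

# Hypothetical toy lexicon for illustration only; a real system would load
# a full Vietnamese dictionary. Multi-syllable entries are space-separated.
LEXICON = {"học", "sinh", "học sinh", "sinh học"}
MAX_WORD_LEN = 4  # assumed upper bound on syllables per dictionary word


def build_dag(syllables, lexicon=LEXICON, max_len=MAX_WORD_LEN):
    """Map each node i to the nodes j such that syllables[i:j] is a candidate word.

    Nodes 0..n are the positions between syllables. A single syllable is
    always allowed as a fallback word, so a path from 0 to n always exists.
    """
    n = len(syllables)
    edges = defaultdict(list)
    for i in range(n):
        edges[i].append(i + 1)
        for j in range(i + 2, min(n, i + max_len) + 1):
            if " ".join(syllables[i:j]) in lexicon:
                edges[i].append(j)
    return edges


def _prune(table, n_best):
    """Keep only the n_best smallest distinct path lengths."""
    return {length: table[length] for length in sorted(table)[:n_best]}


def n_shortest_paths(syllables, n_best=2, lexicon=LEXICON):
    """Return every segmentation whose word count is among the n_best smallest.

    Each edge has unit weight (assumption), so a path's length is its number of words.
    """
    n = len(syllables)
    edges = build_dag(syllables, lexicon)
    # best[i]: distinct path length -> list of paths (node sequences) from 0 to i
    best = [defaultdict(list) for _ in range(n + 1)]
    best[0][0].append([0])
    for i in range(n):  # node ids 0..n are already in topological order
        # Every predecessor of i has been relaxed, so pruning here is safe.
        best[i] = _prune(best[i], n_best)
        for length, paths in best[i].items():
            for j in edges[i]:
                for path in paths:
                    best[j][length + 1].append(path + [j])
    final = _prune(best[n], n_best)
    return [
        [" ".join(syllables[a:b]) for a, b in zip(path, path[1:])]
        for length in sorted(final)
        for path in final[length]
    ]


def to_bems(words):
    """Tag each syllable B (begin), M (middle), E (end) or S (single-syllable word)."""
    tags = []
    for word in words:
        k = len(word.split())
        tags.extend(["S"] if k == 1 else ["B"] + ["M"] * (k - 2) + ["E"])
    return tags


if __name__ == "__main__":
    sentence = "học sinh học sinh học".split()
    for words in n_shortest_paths(sentence, n_best=1):
        print(" | ".join(words), to_bems(words))
    # học | sinh học | sinh học ['S', 'B', 'E', 'B', 'E']
    # học sinh | học | sinh học ['B', 'E', 'S', 'B', 'E']
    # học sinh | học sinh | học ['B', 'E', 'B', 'E', 'S']
```

Keeping every path for the N smallest distinct lengths, rather than only the single best one, is what lets a rough segmentation preserve ambiguous alternatives (here, three equally short segmentations) for a later stage to choose between. Pruning each node to its N smallest lengths is safe: a prefix that is not among the N best at some node cannot extend into a path that is among the N best further along. Storing whole paths grows quickly on long sentences, so a practical version would more likely keep per-node back-pointers instead.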


Author information

Authors and Affiliations

School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, Guangdong, 510006, People’s Republic of China
Xiaohua Ke, Haijiao Luo, Ruibin Huang & Jinwen Lai
Faculty of Asian Language and Cultures, Guangdong University of Foreign Studies, Guangzhou, Guangdong, 510420, People’s Republic of China
JiHua Chen

Corresponding author

Correspondence to Xiaohua Ke.

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, University of Missouri–St. Louis, St. Louis, MO, USA
Sanjiv K. Bhatia
CSED, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
Shailesh Tiwari
Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology, Allahabad, Uttar Pradesh, India
Krishn K. Mishra
Department of Information Technology, Rajkiya Engineering College, Azamgarh, Uttar Pradesh, India
Munesh C. Trivedi

About this paper

Cite this paper

Ke, X., Luo, H., Chen, J., Huang, R., Lai, J. (2019). One Novel Word Segmentation Method Based on N-Shortest Path in Vietnamese. In: Bhatia, S., Tiwari, S., Mishra, K., Trivedi, M. (eds) Advances in Computer Communication and Computational Sciences. Advances in Intelligent Systems and Computing, vol 924. Springer, Singapore. https://doi.org/10.1007/978-981-13-6861-5_47

DOI: https://doi.org/10.1007/978-981-13-6861-5_47
Published: 22 May 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6860-8
Online ISBN: 978-981-13-6861-5
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
