Abstract
Word segmentation is a key problem in Chinese text analysis. This paper builds on the classic Bi-Directed Maximum Match (BDMM) segmentation method and, taking both word-coverage rate and sentence-coverage rate into account, designs a character Directed Graph with ambiguity marks for searching multiple possible segmentation sequences. The method is compared with the classic Maximum Match algorithm and the Omni-segmentation algorithm. The experimental results show that the Directed Graph based BDMM algorithm achieves a higher coverage rate and lower complexity.
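For readers unfamiliar with the baseline, the following is a minimal sketch of classic bidirectional maximum matching, not the paper's Directed Graph construction, whose details the abstract does not give. It runs a greedy longest-match scan from the left and from the right and flags the sentence as ambiguous when the two results differ. The function names, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy longest-match scan from left to right."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len=4):
    """Greedy longest-match scan from right to left."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the sentence
    return words


def bidirectional_candidates(text, lexicon, max_len=4):
    """Return the distinct segmentations and whether the two scans disagree."""
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    ambiguous = fwd != bwd
    candidates = [fwd, bwd] if ambiguous else [fwd]
    return candidates, ambiguous


# Hypothetical toy lexicon, chosen to trigger an overlap-type ambiguity.
LEXICON = {"研究", "研究生", "生命", "起源"}

candidates, ambiguous = bidirectional_candidates("研究生命起源", LEXICON)
print(candidates)  # [['研究生', '命', '起源'], ['研究', '生命', '起源']]
print(ambiguous)   # True
```

In this sketch a disagreement between the two scans is the only signal of ambiguity, so at most two candidate sequences are returned. The abstract describes the Directed Graph with ambiguity marks as a way to search multiple possible segmentation sequences, which is where the paper's method goes beyond this baseline.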
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, Y., Wang, T., Chen, H. (2005). Using Directed Graph Based BDMM Algorithm for Chinese Word Segmentation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_21
DOI: https://doi.org/10.1007/978-3-540-30586-6_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)