Abstract
This paper describes the model we designed for the Chinese word segmentation task of NLPCC 2015. We first apply a word-based perceptron algorithm to build the base segmenter. We then apply bagging (bootstrap aggregating), which consistently improves the segmentation results on all three tracks: closed, semi-open, and open. Considering the characteristics of Weibo text, we also perform rule-based adaptation before decoding. Our final model achieves an F-score of 95.12% on the closed track, 95.3% on the semi-open track, and 96.09% on the open track.
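The bagging step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the ensemble's outputs are combined, so majority voting over word boundaries is an assumption here, and a toy dictionary-based maximum-matching segmenter stands in for the word-based perceptron. All function names (`train_max_match`, `train_bagged`, `vote`, `bagged_segment`) are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Segmenter = Callable[[str], List[str]]
Trainer = Callable[[Sequence[List[str]]], Segmenter]


def train_max_match(data: Sequence[List[str]]) -> Segmenter:
    """Toy base learner (stand-in for the perceptron): forward maximum matching
    over the vocabulary seen in the training sample."""
    vocab = {w for sent in data for w in sent}
    max_len = max((len(w) for w in vocab), default=1)

    def segment(sentence: str) -> List[str]:
        words, i = [], 0
        while i < len(sentence):
            # Try the longest candidate first; fall back to a single character.
            for length in range(min(max_len, len(sentence) - i), 0, -1):
                if length == 1 or sentence[i:i + length] in vocab:
                    words.append(sentence[i:i + length])
                    i += length
                    break
        return words

    return segment


def train_bagged(train: Trainer, data: Sequence[List[str]],
                 n_models: int, seed: int = 0) -> List[Segmenter]:
    """Train one segmenter per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [train([rng.choice(data) for _ in data]) for _ in range(n_models)]


def boundaries(words: List[str]) -> set:
    """Character offsets at which a word ends."""
    ends, pos = set(), 0
    for w in words:
        pos += len(w)
        ends.add(pos)
    return ends


def vote(sentence: str, segmentations: List[List[str]]) -> List[str]:
    """Assumed combination rule: keep a boundary if more than half the models place it."""
    counts = Counter()
    for words in segmentations:
        counts.update(boundaries(words))
    keep = sorted(p for p, c in counts.items() if 2 * c > len(segmentations))
    words, start = [], 0
    for p in keep:
        words.append(sentence[start:p])
        start = p
    return words


def bagged_segment(sentence: str, models: List[Segmenter]) -> List[str]:
    return vote(sentence, [m(sentence) for m in models])
```

Because every model places a boundary at the end of the sentence, the voted output always covers the full input. Swapping `train_max_match` for a real perceptron trainer leaves the sampling and voting code unchanged.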
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yu, Z., Dai, XY., Shen, S., Huang, S., Chen, J. (2015). Word Segmentation of Micro Blogs with Bagging. In: Li, J., Ji, H., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2015. Lecture Notes in Computer Science, vol 9362. Springer, Cham. https://doi.org/10.1007/978-3-319-25207-0_54
DOI: https://doi.org/10.1007/978-3-319-25207-0_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25206-3
Online ISBN: 978-3-319-25207-0
eBook Packages: Computer Science, Computer Science (R0)