Abstract
Chinese dependency parsing lacks large manually annotated dependency treebanks. Several methods that exploit large-scale unannotated data through unsupervised features have been proposed, but they inevitably introduce considerable noise from automatic annotation. To address this problem, this paper proposes an approach that iteratively integrates unsupervised features when training a Chinese dependency parsing model. Because more errors occur when parsing longer sentences, the raw data is divided according to sentence length and the model is trained iteratively: the model trained on shorter sentences is used in the next iteration to analyze longer sentences. The paper adopts a character-based dependency model for joint word segmentation, POS tagging, and dependency parsing in Chinese. The advantage of the joint model is that each task can benefit from the intermediate results of the other tasks during processing. Higher accuracy on the three tasks for shorter sentences can in turn raise the accuracy of the whole model. The approach was evaluated on the Penn Chinese Treebank and two raw corpora. The experimental results show that the F1-scores of all three tasks improved at each iteration, and that the F1-score of dependency parsing increased by 0.33% compared with the conventional method.
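The length-based iteration described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the callables `train`, `parse`, and `extract_features`, the `boundaries` thresholds, and the use of character count as sentence length are all assumptions introduced here for illustration only.

```python
from typing import Any, Callable, List, Sequence


def bucket_by_length(sentences: Sequence[str],
                     boundaries: Sequence[int]) -> List[List[str]]:
    """Split raw sentences into buckets of increasing length.

    `boundaries` is assumed to be sorted in ascending order. With
    boundaries [10, 20] the buckets hold sentences of length <= 10,
    11..20, and > 20. Length is counted in characters (an assumption).
    """
    buckets: List[List[str]] = [[] for _ in range(len(boundaries) + 1)]
    for sentence in sentences:
        # Count how many thresholds this sentence exceeds.
        index = sum(len(sentence) > b for b in boundaries)
        buckets[index].append(sentence)
    return buckets


def iterative_train(treebank: Any,
                    raw_sentences: Sequence[str],
                    boundaries: Sequence[int],
                    train: Callable[[Any, List[Any]], Any],
                    parse: Callable[[Any, str], Any],
                    extract_features: Callable[[List[Any]], List[Any]]) -> Any:
    """Retrain once per length bucket, shortest bucket first.

    `train`, `parse`, and `extract_features` are hypothetical placeholders
    for a real joint parser; they are not part of any library.
    """
    features: List[Any] = []
    model = train(treebank, features)  # baseline: annotated data only
    for bucket in bucket_by_length(raw_sentences, boundaries):
        if not bucket:
            continue
        # Parse the current bucket with the model from the previous round.
        auto_parsed = [parse(model, sentence) for sentence in bucket]
        # Add features derived from the new automatic analyses, then retrain.
        features = features + extract_features(auto_parsed)
        model = train(treebank, features)
    return model
```

Processing the shortest bucket first means that each longer bucket is analyzed by a model that has already been retrained with features from sentences it is expected to parse more accurately.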
Acknowledgments
The authors are supported by the National Natural Science Foundation of China (Contracts 61370130 and 61473294) and the Fundamental Research Funds for the Central Universities (2014RC040).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Luo, T., Zhang, Y., Xu, J., Chen, Y. (2016). Iterative Integration of Unsupervised Features for Chinese Dependency Parsing. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_46
DOI: https://doi.org/10.1007/978-3-319-50496-4_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science (R0)