
Improving Syntactic Parsing of Chinese with Empty Element Recovery

  • Regular Paper
Journal of Computer Science and Technology

Abstract

This paper puts forward and explores the problem of empty element (EE) recovery in Chinese from the syntactic parsing perspective, a problem that has been largely ignored in the literature. First, we demonstrate why EEs play a critical role in syntactic parsing of Chinese and how re-categorizing EEs from the syntactic perspective lets them better benefit parsing. Then, we propose two ways to automatically recover EEs: a joint constituent parsing approach and a chunk-based dependency parsing approach. Evaluation on the Chinese TreeBank (CTB) 5.1 corpus shows that integrating EE recovery into the Charniak parser achieves a significant performance improvement of 1.29 in F1-measure. To the best of our knowledge, this is the first close examination of EEs in syntactic parsing of Chinese, a topic that deserves more attention in the future given its importance.
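
For readers unfamiliar with the metric, the F1-measure in constituent parsing is usually the harmonic mean of precision and recall over labeled brackets. The abstract does not say which scorer or settings were used, so the following is only a minimal sketch of that standard definition. The tree encoding (nested tuples), the function names `brackets` and `bracket_f1`, and the toy sentence are assumptions made for illustration and do not come from the paper.

```python
from collections import Counter


def brackets(tree, start=0):
    """Collect labeled spans from a nested-tuple tree (hypothetical encoding).

    A tree is (label, child, ...); a leaf child is a plain string token.
    Returns (end_position, Counter mapping (label, start, end) to its count).
    """
    label, *children = tree
    spans = Counter()
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1  # each token advances the position by one
        else:
            pos, child_spans = brackets(child, pos)
            spans += child_spans
    spans[(label, start, pos)] += 1
    return pos, spans


def bracket_f1(gold, pred):
    """Return (precision, recall, f1) over labeled brackets."""
    _, gold_spans = brackets(gold)
    _, pred_spans = brackets(pred)
    # Multiset intersection: a bracket matches only as often as it occurs in both.
    matched = sum((gold_spans & pred_spans).values())
    if matched == 0:
        return 0.0, 0.0, 0.0
    precision = matched / sum(pred_spans.values())
    recall = matched / sum(gold_spans.values())
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example (assumed, not from the paper): the prediction attaches the
# object NP to IP instead of VP, so one of its four brackets is wrong.
gold = ("IP", ("NP", "他"), ("VP", "喜欢", ("NP", "苹果")))
pred = ("IP", ("NP", "他"), ("VP", "喜欢"), ("NP", "苹果"))
precision, recall, f1 = bracket_f1(gold, pred)
```

Real scorers typically apply extra conventions, such as ignoring part-of-speech brackets and punctuation. This sketch omits them, and it makes no claim about how empty elements were counted in the reported evaluation.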



Additional information

Supported by the National Natural Science Foundation of China under Grant Nos. 61273320, 61331011, 61070123, and the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011102.


About this article

Cite this article

Zhou, GD., Li, PF. Improving Syntactic Parsing of Chinese with Empty Element Recovery. J. Comput. Sci. Technol. 28, 1106–1116 (2013). https://doi.org/10.1007/s11390-013-1401-x


  • DOI: https://doi.org/10.1007/s11390-013-1401-x
