A CDT-Styled End-to-End Chinese Discourse Parser

  • Conference paper
Natural Language Understanding and Intelligent Applications (ICCPOL 2016, NLPCC 2016)

Abstract

Discourse parsing is a challenging task that plays a critical role in discourse analysis. Since the release of the Rhetorical Structure Theory Discourse Treebank (RST-DT) and the Penn Discourse Treebank (PDTB), research on English discourse parsing has attracted increasing attention and achieved considerable success. Meanwhile, only preliminary research has been conducted on certain subtasks of discourse parsing for other languages, such as Chinese. This paper first introduces the Connective-driven Dependency Treebank (CDTB) corpus. It then presents an end-to-end Chinese discourse parser that parses free text into the Connective-driven Dependency Tree (CDT) style. The parser consists of four components: an elementary discourse unit detector, a discourse relation recognizer, a discourse parse tree generator and an attribution labeler. The attribution labeler determines two attributes (sense and centering) for every non-terminal node in the discourse parse tree. An effective feature set is proposed for each component. Comprehensive experiments on the CDTB corpus yield an overall F1 score of 20.0%.
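The four components named in the abstract could be chained roughly as in the sketch below. This is only an illustration of the data flow, not the parser described in the paper: every function name, the comma-based splitting rule, the three-word connective lexicon, the left-to-right merging strategy and the placeholder labels are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class DiscourseNode:
    """One node of a discourse tree; a node without children is an EDU."""
    text: str
    children: Tuple["DiscourseNode", ...] = ()
    connective: Optional[str] = None  # None when no connective was found
    sense: Optional[str] = None       # filled in for non-terminal nodes only
    centering: Optional[str] = None   # filled in for non-terminal nodes only


# Hypothetical three-word lexicon, for illustration only.
CONNECTIVES = ("因为", "所以", "但是")

Labeler = Callable[[DiscourseNode], Tuple[str, str]]


def detect_edus(text: str) -> List[str]:
    """Stage 1 (toy rule): split the text at Chinese commas and full stops."""
    for mark in "，。":
        text = text.replace(mark, "\n")
    return [seg.strip() for seg in text.split("\n") if seg.strip()]


def recognize_relation(left: DiscourseNode, right: DiscourseNode) -> Optional[str]:
    """Stage 2 (toy rule): return a connective that opens the right unit, if any."""
    for conn in CONNECTIVES:
        if right.text.startswith(conn):
            return conn
    return None


def generate_tree(edus: List[str]) -> DiscourseNode:
    """Stage 3 (toy rule): merge units left to right into a binary tree."""
    if not edus:
        raise ValueError("no EDUs to parse")
    root = DiscourseNode(edus[0])
    for edu in edus[1:]:
        leaf = DiscourseNode(edu)
        root = DiscourseNode(
            text=root.text + "，" + edu,
            children=(root, leaf),
            connective=recognize_relation(root, leaf),
        )
    return root


def label_attributes(node: DiscourseNode, labeler: Labeler) -> None:
    """Stage 4: assign sense and centering to every non-terminal node."""
    if node.children:
        node.sense, node.centering = labeler(node)
        for child in node.children:
            label_attributes(child, labeler)


def parse(text: str, labeler: Labeler) -> DiscourseNode:
    """Run the four stages in order on free text."""
    tree = generate_tree(detect_edus(text))
    label_attributes(tree, labeler)
    return tree


# Placeholder labeler; a real one would be a trained classifier.
tree = parse("因为下雨，所以他没来。", lambda n: ("placeholder-sense", "placeholder-center"))
print(tree.connective, [child.text for child in tree.children])
```

In this sketch each stage depends only on the output of the previous one, so a rule-based stub can be replaced by a trained classifier without touching the rest of the pipeline.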

Notes

  1. We use the POS combination of the parent, left sibling and right sibling of the given comma to represent the context.

  2. We use the POS combination of the parent, left sibling and right sibling of the dominating node to represent the context. When there is no parent or sibling, it is marked NULL.

  3. http://maxent.sourceforge.net/.
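The context feature in notes 1 and 2 can be illustrated with a small sketch. It assumes a hypothetical `Node` class for a constituent tree and joins the three labels with "+"; the class, the function name and the separator are assumptions for this example, and only the choice of parent, left sibling and right sibling and the NULL marker come from the notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, which avoids parent/child cycles
class Node:
    """A constituent-tree node; `label` is a POS or phrase tag."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach `child` under this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def context_feature(node: Node) -> str:
    """Join the labels of the parent, left sibling and right sibling.

    A missing parent or sibling is written as NULL. The "+" separator
    is an assumption made for this sketch.
    """
    parent = node.parent
    left = right = None
    if parent is not None:
        # Find the node's position among its siblings by identity.
        idx = next(i for i, c in enumerate(parent.children) if c is node)
        if idx > 0:
            left = parent.children[idx - 1]
        if idx + 1 < len(parent.children):
            right = parent.children[idx + 1]
    labels = [n.label if n is not None else "NULL" for n in (parent, left, right)]
    return "+".join(labels)


# A comma (tagged PU) between an NP and a VP under an IP node.
root = Node("IP")
root.add(Node("NP"))
comma = root.add(Node("PU"))
root.add(Node("VP"))
print(context_feature(comma))  # IP+NP+VP
print(context_feature(root))   # NULL+NULL+NULL
```

The resulting string can be passed as a single categorical feature to a classifier such as the maximum entropy toolkit linked in note 3.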

Acknowledgements

This research is supported by Key Project 61333018 and Projects 61472264 and 61402314 of the National Natural Science Foundation of China.

Author information

Corresponding author

Correspondence to Fang Kong.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Kong, F., Wang, H., Zhou, G. (2016). A CDT-Styled End-to-End Chinese Discourse Parser. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_32

  • DOI: https://doi.org/10.1007/978-3-319-50496-4_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50495-7

  • Online ISBN: 978-3-319-50496-4

  • eBook Packages: Computer Science (R0)
