Building A Parallel Corpus with Bilingual Discourse Alignment

  • Conference paper
Chinese Lexical Semantics (CLSW 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10709)

Abstract

This paper describes a discourse resource, namely a Chinese-English parallel corpus, built on the idea of bilingual discourse alignment. We introduce a bilingual collaborative annotation approach that first annotates English discourse units based on the Chinese ones and then annotates Chinese discourse structure based on the English ones. Such an approach ensures full discourse structure alignment between parallel texts and also reduces the cost of annotating texts in two languages. An annotation evaluation of the parallel corpus supports the appropriateness of the discourse alignment framework for parallel texts.

Author information

Corresponding author

Correspondence to Han Ren.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Feng, W., Ren, H., Li, X., Guo, H. (2018). Building A Parallel Corpus with Bilingual Discourse Alignment. In: Wu, Y., Hong, JF., Su, Q. (eds) Chinese Lexical Semantics. CLSW 2017. Lecture Notes in Computer Science, vol 10709. Springer, Cham. https://doi.org/10.1007/978-3-319-73573-3_34

  • DOI: https://doi.org/10.1007/978-3-319-73573-3_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73572-6

  • Online ISBN: 978-3-319-73573-3

  • eBook Packages: Computer Science, Computer Science (R0)
