Abstract
This paper describes a discourse resource, namely a Chinese-English parallel corpus, built on the idea of bilingual discourse alignment. We introduce a bilingual collaborative annotation approach that first annotates English discourse units based on the Chinese ones, and then annotates the Chinese discourse structure based on the English one. Such an approach can ensure full alignment of discourse structure between parallel texts while also reducing the cost of annotating texts in two languages. Annotation evaluation on the parallel corpus justifies the appropriateness of applying the discourse alignment framework to parallel texts.
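The abstract gives only the outline of the collaborative procedure. The sketch below illustrates, under stated assumptions, why segmenting English units against the Chinese ones makes full structural alignment possible: once units correspond one-to-one, a relation tree annotated on one side can be carried over to the other simply by rewriting unit indices. The `Node` schema, the `en_to_zh` mapping, the helper names, and the sentence pair are all hypothetical illustrations, not the authors' annotation format or data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# A tree child is either a nested relation node or a leaf (a discourse-unit index).
Child = Union["Node", int]


@dataclass
class Node:
    """A discourse relation over child nodes or unit indices (hypothetical schema)."""
    relation: str
    children: List[Child] = field(default_factory=list)


def leaves(tree: Child) -> List[int]:
    """Return the unit indices covered by a (sub)tree, left to right."""
    if isinstance(tree, int):
        return [tree]
    out: List[int] = []
    for child in tree.children:
        out.extend(leaves(child))
    return out


def is_fully_aligned(zh_units: List[str], en_units: List[str],
                     en_to_zh: Dict[int, int]) -> bool:
    """True if every English unit maps to exactly one Chinese unit and vice versa."""
    return (len(zh_units) == len(en_units)
            and sorted(en_to_zh) == list(range(len(en_units)))
            and sorted(en_to_zh.values()) == list(range(len(zh_units))))


def project(tree: Child, en_to_zh: Dict[int, int]) -> Child:
    """Carry an English-side structure over to the Chinese units.

    Relation labels are copied unchanged; children are reordered so the
    Chinese tree follows Chinese text order (an assumption of this sketch).
    """
    if isinstance(tree, int):
        return en_to_zh[tree]
    kids = [project(child, en_to_zh) for child in tree.children]
    kids.sort(key=lambda k: min(leaves(k)))
    return Node(tree.relation, kids)


# Invented example: the English units were segmented to match the Chinese ones,
# but the clause order differs between the two languages.
zh = ["虽然下雨了，", "比赛照常进行。"]
en = ["The match went ahead as planned,", "although it rained."]
en_to_zh = {0: 1, 1: 0}                    # English unit index -> Chinese unit index
en_tree = Node("Concession", [0, 1])       # structure annotated on the English side

assert is_fully_aligned(zh, en, en_to_zh)
zh_tree = project(en_tree, en_to_zh)
print(zh_tree)  # Node(relation='Concession', children=[0, 1])
```

The one-to-one check is what "full alignment" amounts to in this toy model: if segmentation produced a many-to-one mapping, `is_fully_aligned` would fail and the projected tree could not cover each Chinese unit exactly once.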
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Feng, W., Ren, H., Li, X., Guo, H. (2018). Building A Parallel Corpus with Bilingual Discourse Alignment. In: Wu, Y., Hong, JF., Su, Q. (eds) Chinese Lexical Semantics. CLSW 2017. Lecture Notes in Computer Science, vol 10709. Springer, Cham. https://doi.org/10.1007/978-3-319-73573-3_34
DOI: https://doi.org/10.1007/978-3-319-73573-3_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73572-6
Online ISBN: 978-3-319-73573-3
eBook Packages: Computer Science, Computer Science (R0)