Building A Parallel Corpus with Bilingual Discourse Alignment

Feng, Wenhe; Ren, Han; Li, Xia; Guo, Haifang

doi:10.1007/978-3-319-73573-3_34

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10709)

Included in the following conference series:

Workshop on Chinese Lexical Semantics

Abstract

This paper describes a discourse resource, a Chinese-English parallel corpus built on the idea of bilingual discourse alignment. We introduce a bilingual collaborative annotation approach that first annotates English discourse units based on the Chinese ones and then annotates Chinese discourse structure based on the English ones. This approach ensures full discourse-structure alignment between the parallel texts and also reduces the cost of annotating texts in two languages. An annotation evaluation of the parallel corpus supports the appropriateness of the discourse alignment framework for parallel texts.

Author information

Authors and Affiliations

School of Computer, Wuhan University, Wuhan, 430072, China
Wenhe Feng
Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou, 510420, China
Han Ren & Xia Li
Henan Institute of Science and Technology, Xinxiang, 453003, Henan, China
Haifang Guo

Corresponding author

Correspondence to Han Ren.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Yunfang Wu
National Taiwan Normal University, Taipei, Taiwan
Jia-Fei Hong
Peking University, Beijing, China
Qi Su

About this paper

Cite this paper

Feng, W., Ren, H., Li, X., Guo, H. (2018). Building A Parallel Corpus with Bilingual Discourse Alignment. In: Wu, Y., Hong, JF., Su, Q. (eds) Chinese Lexical Semantics. CLSW 2017. Lecture Notes in Computer Science, vol 10709. Springer, Cham. https://doi.org/10.1007/978-3-319-73573-3_34

DOI: https://doi.org/10.1007/978-3-319-73573-3_34
Published: 20 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73572-6
Online ISBN: 978-3-319-73573-3
eBook Packages: Computer Science, Computer Science (R0)
