Chinese Sentence Compression: Corpus and Evaluation

Zhang, Chunliang; Hu, Minghan; Xiao, Tong; Jiang, Xue; Shi, Lixin; Zhu, Jingbo

doi:10.1007/978-3-642-41491-6_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8202)

Abstract

In this paper we present the first manually built Chinese sentence compression corpus. Based on this corpus, we develop a Chinese sentence compression system and study various measures for evaluating Chinese sentence compression. We find that 1) using multiple references is very helpful for automatic evaluation in Chinese sentence compression; and 2) besides relational F1, some machine translation evaluation measures correlate well with human judgments and are thus very promising for future use in this task.
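
The two findings in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the token-overlap F1 below is a simple stand-in for relational F1 and the machine translation measures studied in the paper, sentences are assumed to be already word-segmented into token lists, and the names `token_f1`, `multi_reference_score` and `pearson` are hypothetical. The sketch shows one common way to use multiple references (score against each and keep the best) and how a metric's agreement with human judgments might be measured with a Pearson correlation.

```python
from collections import Counter
from math import sqrt


def token_f1(candidate, reference):
    """Token-overlap F1 between a candidate compression and one reference.

    Assumption: a simple stand-in metric, not the relational F1 of the paper.
    Both arguments are lists of already-segmented tokens.
    """
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(candidate, references):
    """Score against several references by keeping the best single match."""
    return max(token_f1(candidate, ref) for ref in references)


def pearson(xs, ys):
    """Pearson correlation between metric scores and human ratings."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant input; report 0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy, invented data for illustration only (not taken from the corpus).
    candidate = ["我们", "提出", "语料库"]
    references = [["我们", "提出", "了", "语料库"], ["提出", "语料库"]]
    print(multi_reference_score(candidate, references))

    metric_scores = [0.9, 0.4, 0.7]   # hypothetical automatic scores
    human_ratings = [4.5, 2.0, 3.5]   # hypothetical human judgments
    print(pearson(metric_scores, human_ratings))
```

Taking the maximum over references is only one possible design; measures such as BLEU instead pool n-gram counts across all references, so the aggregation would differ per metric.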

Author information

Authors and Affiliations

Natural Language Lab, Northeastern University, Shenyang, China, 110819
Chunliang Zhang, Minghan Hu, Tong Xiao, Xue Jiang, Lixin Shi & Jingbo Zhu

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Maosong Sun
Horizon Doctoral Training Centre, School of Computer Science, University of Nottingham, NG8 1BB, Nottingham, UK
Min Zhang
Google Inc., Mountain View, CA, USA
Dekang Lin
Baidu Inc., Beijing, China
Haifeng Wang

About this paper

Cite this paper

Zhang, C., Hu, M., Xiao, T., Jiang, X., Shi, L., Zhu, J. (2013). Chinese Sentence Compression: Corpus and Evaluation. In: Sun, M., Zhang, M., Lin, D., Wang, H. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2013. Lecture Notes in Computer Science, vol 8202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41491-6_24

DOI: https://doi.org/10.1007/978-3-642-41491-6_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41490-9
Online ISBN: 978-3-642-41491-6
eBook Packages: Computer Science (R0)