Using Parallel Corpora for Multilingual (Multi-document) Summarisation Evaluation

Turchi, Marco; Steinberger, Josef; Kabadjov, Mijail; Steinberger, Ralf

doi:10.1007/978-3-642-15998-5_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6360)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

We present a method for evaluating multilingual multi-document summarisation that saves annotation time and makes evaluation results directly comparable across languages. The approach relies on manually selecting the most important sentences in a cluster of documents from a sentence-aligned parallel corpus and projecting that selection to the various target languages. We also present two ways of exploiting inter-annotator agreement levels, apply both to a baseline sentence-extraction summariser in seven languages, and discuss the differences in results between the two evaluation versions, together with a preliminary comparison across languages. The same method can in principle be used to evaluate single-document summarisers or information extraction tools.
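
The projection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each language version of a cluster is stored as a list of sentences aligned by position, so that a sentence index chosen in one language identifies its translation in every other language. The abstract does not say how the two agreement-based evaluation versions are defined, so the two scores below (a vote threshold and a vote-weighted average) are hypothetical stand-ins, and all names and data are invented for illustration.

```python
from collections import Counter

# Toy sentence-aligned cluster: index i is the same sentence in every language.
# The sentences are invented for illustration only.
corpus = {
    "en": ["Floods hit the city.", "Rain is expected.", "Roads are closed.", "Aid arrived."],
    "fr": ["Des inondations ont frappé la ville.", "De la pluie est attendue.",
           "Les routes sont fermées.", "L'aide est arrivée."],
}

def project_selection(selected_ids, corpus, lang):
    """Return the sentences of `lang` at the selected aligned positions."""
    return [corpus[lang][i] for i in sorted(set(selected_ids))]

def vote_counts(annotations):
    """Count how many annotators selected each sentence index."""
    return Counter(i for chosen in annotations for i in set(chosen))

def threshold_score(system_ids, votes, min_votes):
    """Hypothetical variant 1: share of system sentences that at least
    `min_votes` annotators selected."""
    ids = set(system_ids)
    if not ids:
        return 0.0
    gold = {i for i, v in votes.items() if v >= min_votes}
    return len(gold & ids) / len(ids)

def weighted_score(system_ids, votes, n_annotators):
    """Hypothetical variant 2: mean fraction of annotators who selected
    each system sentence."""
    ids = set(system_ids)
    if not ids or n_annotators == 0:
        return 0.0
    return sum(votes[i] for i in ids) / (len(ids) * n_annotators)

# Three hypothetical annotators pick sentence indices once, in any one language.
annotations = [[0, 2], [0, 3], [0, 2]]
votes = vote_counts(annotations)

# A hypothetical system summary, also expressed as aligned sentence indices.
system_ids = [0, 1]

print(project_selection([0, 2], corpus, "fr"))
print(threshold_score(system_ids, votes, min_votes=2))
print(weighted_score(system_ids, votes, n_annotators=len(annotations)))
```

Because both the gold selection and the system output are expressed as aligned indices, the same annotations score a summariser in any language of the corpus without further annotation, which is the property the abstract highlights.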

Author information

Authors and Affiliations

European Commission - Joint Research Centre (JRC), IPSC - GlobSec, Via Fermi 2749, 21027, Ispra (VA), Italy
Marco Turchi, Josef Steinberger, Mijail Kabadjov & Ralf Steinberger

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Via Gradenigo 6/a, 35131, Padova, Italy
Maristella Agosti
University of Padua, Padua, Italy
Nicola Ferro
ISTI-CNR, Area Ricerca CNR, Via Moruzzi, 1, 56124, Pisa, Italy
Carol Peters
ISLA, University of Amsterdam, Amsterdam, The Netherlands
Maarten de Rijke
Dublin City University, Dublin, Ireland
Alan Smeaton

About this paper

Cite this paper

Turchi, M., Steinberger, J., Kabadjov, M., Steinberger, R. (2010). Using Parallel Corpora for Multilingual (Multi-document) Summarisation Evaluation. In: Agosti, M., Ferro, N., Peters, C., de Rijke, M., Smeaton, A. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2010. Lecture Notes in Computer Science, vol 6360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15998-5_7

DOI: https://doi.org/10.1007/978-3-642-15998-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15997-8
Online ISBN: 978-3-642-15998-5
eBook Packages: Computer Science, Computer Science (R0)
