Using Parallel Corpora for Multilingual (Multi-document) Summarisation Evaluation

  • Conference paper
Multilingual and Multimodal Information Access Evaluation (CLEF 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6360)

Abstract

We present a method for evaluating multilingual multi-document summarisation that saves annotation time and makes evaluation results directly comparable across languages. The approach is based on manually selecting the most important sentences in a cluster of documents from a sentence-aligned parallel corpus and projecting that selection to the various target languages. We also present two ways of exploiting inter-annotator agreement levels, apply both to a baseline sentence-extraction summariser in seven languages, and discuss the differences in results between the two evaluation versions, together with a preliminary cross-language analysis. In principle, the same method can be used to evaluate single-document summarisers or information extraction tools.
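The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names `project_selection` and `selection_recall`, and the one-to-one sentence alignment are assumptions made for illustration, and the abstract does not say which scoring measure the paper uses, so the recall shown here is only a stand-in.

```python
from typing import Dict, List, Set

# Hypothetical toy corpus: one sentence list per language, aligned by index.
# Real parallel corpora may contain one-to-many alignments; this sketch
# assumes a strict one-to-one alignment.
ALIGNED: Dict[str, List[str]] = {
    "en": ["Floods hit the region.", "Officials met on Monday.", "Aid is arriving."],
    "fr": ["Des inondations ont touché la région.",
           "Les responsables se sont réunis lundi.",
           "L'aide arrive."],
}


def project_selection(selected: Set[int],
                      aligned: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map sentence positions chosen in one language to every aligned language."""
    lengths = {len(sentences) for sentences in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all languages must have the same number of aligned sentences")
    return {lang: [sentences[i] for i in sorted(selected)]
            for lang, sentences in aligned.items()}


def selection_recall(system: Set[int], gold: Set[int]) -> float:
    """Share of gold sentence positions that the system also selected (illustrative measure)."""
    return len(system & gold) / len(gold) if gold else 0.0


# An annotator marks sentences 0 and 2 once; the selection carries over to French.
gold_positions = {0, 2}
gold_by_language = project_selection(gold_positions, ALIGNED)

# A summariser that extracted positions 0 and 1 is scored the same way in any language.
score = selection_recall({0, 1}, gold_positions)
```

Because the gold standard is stored as sentence positions rather than text, a single round of annotation serves every language in the corpus, which is the source of the time saving and the cross-language comparability claimed in the abstract.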

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Turchi, M., Steinberger, J., Kabadjov, M., Steinberger, R. (2010). Using Parallel Corpora for Multilingual (Multi-document) Summarisation Evaluation. In: Agosti, M., Ferro, N., Peters, C., de Rijke, M., Smeaton, A. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2010. Lecture Notes in Computer Science, vol 6360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15998-5_7

  • DOI: https://doi.org/10.1007/978-3-642-15998-5_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15997-8

  • Online ISBN: 978-3-642-15998-5

  • eBook Packages: Computer Science, Computer Science (R0)
