Summary Evaluation: Together We Stand NPowER-ed

  • George Giannakopoulos
  • Vangelis Karkaletsis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7817)

Abstract

Summary evaluation has been a distinct domain of research for several years. Human summary evaluation appears to be a high-level cognitive process and is therefore difficult to reproduce automatically. Even though several automatic evaluation methods correlate well with human judgements when ranking systems, they fail to achieve equivalent results when judging individual summaries. In this work, we propose NPowER, an evaluation method that uses machine learning over a set of methods from the family of “n-gram graph”-based summary evaluation measures. First, we show that the combined, optimized use of these evaluation methods outperforms each individual one. Second, we compare the proposed method to a combination of ROUGE metrics. Third, building on the results of feature selection, we study and discuss what can make future evaluation measures better. We show that we can easily provide per-summary evaluations that are far superior to the existing performance of evaluation systems, and that different measures can be treated under a unified view.
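As a rough illustration of the idea described above, the sketch below builds character n-gram graphs for a summary and a set of model summaries, computes a graph similarity per n-gram rank, and merges the resulting features with fixed weights. The graph construction, the similarity function, and the weights are simplifying assumptions for illustration only; NPowER itself combines the AutoSummENG-family measures and learns the combination from human judgements.

```python
# Illustrative sketch only: the graph form, similarity, and weights below are
# assumptions, not the authors' exact AutoSummENG/MeMoG/NPowER implementation.
from collections import Counter

def ngram_graph(text, n=3, window=3):
    """Character n-gram graph: edge (a, b) weighted by how often
    n-grams a and b co-occur within `window` positions of each other."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, a in enumerate(grams):
        for b in grams[i + 1:i + 1 + window]:
            edges[(a, b)] += 1
    return edges

def value_similarity(g1, g2):
    """Edge-overlap similarity in [0, 1]; identical graphs score 1.0."""
    if not g1 or not g2:
        return 0.0
    overlap = sum(min(g1[e], g2[e]) / max(g1[e], g2[e])
                  for e in set(g1) & set(g2))
    return overlap / max(len(g1), len(g2))

def combined_score(summary, model_summaries, weights=(0.6, 0.4)):
    """Average graph similarity to the model summaries at two n-gram ranks,
    merged with fixed (made-up) weights; NPowER learns such weights instead."""
    feats = []
    for n in (3, 4):
        sims = [value_similarity(ngram_graph(summary, n), ngram_graph(m, n))
                for m in model_summaries]
        feats.append(sum(sims) / len(sims))
    return weights[0] * feats[0] + weights[1] * feats[1]

models = ["the cat sat on the mat", "a cat was sitting on the mat"]
print(round(combined_score("the cat sat on a mat", models), 3))
```

Replacing the fixed weights with a regressor trained on human responsiveness scores gives the kind of optimized combination the paper evaluates.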

Keywords

Natural Language Processing, Information Gain, Machine Translation, Computational Linguistics, Automatic Evaluation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Lin, C.Y.: ROUGE: A package for automatic evaluation of summaries. In: Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), pp. 25–26 (2004)
  2. Papineni, K., Roukos, S., Ward, T., Zhu, W.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics (2002)
  3. Dang, H.T.: Overview of DUC 2005. In: Proceedings of the Document Understanding Conference Workshop (DUC 2005) at HLT/EMNLP 2005 (2005)
  4. Dang, H.T., Owczarzak, K.: Overview of the TAC 2008 update summarization task. In: TAC 2008 Workshop – Notebook Papers and Results, Maryland, USA, pp. 10–23 (2008)
  5. Conroy, J.M., Dang, H.T.: Mind the gap: Dangers of divorcing evaluations of summary content from linguistic quality. In: Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 145–152 (2008)
  6. Rankel, P., Conroy, J., Schlesinger, J.: Better metrics to automatically predict the quality of a text summary. Algorithms 5, 398–420 (2012)
  7. Giannakopoulos, G., El-Haj, M., Favre, B., Litvak, M., Steinberger, J., Varma, V.: TAC 2011 MultiLing pilot overview. In: TAC 2011 Workshop, Maryland, USA (2011)
  8. Owczarzak, K., Conroy, J., Dang, H., Nenkova, A.: An assessment of the accuracy of automatic evaluation in summarization. In: NAACL-HLT 2012, p. 1 (2012)
  9. Mani, I., Bloedorn, E.: Multi-document summarization by graph search and matching. In: Proceedings of AAAI 1997, pp. 622–628. AAAI (1997)
  10. Allan, J., Carbonell, J., Doddington, G., Yamron, J., Yang, Y.: Topic detection and tracking pilot study: Final report. In: Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop (1998)
  11. Van Halteren, H., Teufel, S.: Examining the consensus between human summaries: Initial experiments with factoid analysis. In: Proceedings of the HLT-NAACL 2003 Workshop on Text Summarization, vol. 5, pp. 57–64. Association for Computational Linguistics, Morristown (2003)
  12. Lin, C.Y., Hovy, E.: Manual and automatic evaluation of summaries. In: Proceedings of the ACL 2002 Workshop on Automatic Summarization, vol. 4, pp. 45–51. Association for Computational Linguistics, Morristown (2002)
  13. Jones, K.S.: Automatic summarising: The state of the art. Information Processing & Management 43, 1449–1481 (2007)
  14. Baldwin, B., Donaway, R., Hovy, E., Liddy, E., Mani, I., Marcu, D., McKeown, K., Mittal, V., Moens, M., Radev, D., et al.: An evaluation roadmap for summarization research. Technical report (2000)
  15. Nenkova, A.: Understanding the Process of Multi-Document Summarization: Content Selection, Rewriting and Evaluation. PhD thesis (2006)
  16. Radev, D.R., Jing, H., Budzikowska, M.: Centroid-based summarization of multiple documents: Sentence extraction, utility-based evaluation, and user studies. In: ANLP/NAACL Workshop on Summarization (2000)
  17. Marcu, D.: The Theory and Practice of Discourse Parsing and Summarization. The MIT Press (2000)
  18. Saggion, H., Lapalme, G.: Generating indicative-informative summaries with SumUM. Computational Linguistics 28, 497–526 (2002)
  19. Passonneau, R.J., McKeown, K., Sigelman, S., Goodkind, A.: Applying the pyramid method in the 2006 Document Understanding Conference. In: Proceedings of the Document Understanding Conference (DUC) Workshop 2006 (2006)
  20. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318 (2001)
  21. Hovy, E., Lin, C.Y., Zhou, L., Fukumoto, J.: Basic elements (2005)
  22. Hovy, E., Lin, C.Y., Zhou, L., Fukumoto, J.: Automated summarization evaluation with basic elements. In: Proceedings of the Fifth Conference on Language Resources and Evaluation (LREC) (2006)
  23. Owczarzak, K.: DEPEVAL(summ): dependency-based evaluation for automatic summaries. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, vol. 1, pp. 190–198. Association for Computational Linguistics (2009)
  24. Giannakopoulos, G., Karkaletsis, V., Vouros, G., Stamatopoulos, P.: Summarization system evaluation revisited: N-gram graphs. ACM Trans. Speech Lang. Process. 5, 1–39 (2008)
  25. Giannakopoulos, G., Karkaletsis, V.: Summarization system evaluation variations based on n-gram graphs. In: TAC 2010 Workshop, Maryland, USA (2010)
  26. Schilder, F., Kondadadi, R.: A metric for automatically evaluating coherent summaries via context chains. In: IEEE International Conference on Semantic Computing (ICSC 2009), pp. 65–70 (2009)
  27. Conroy, J., Schlesinger, J., O’Leary, D.: Nouveau-ROUGE: A novelty metric for update summarization. Computational Linguistics 37, 1–8 (2011)
  28. Amigó, E., Gonzalo, J., Verdejo, F.: The heterogeneity principle in evaluation measures for automatic summarization. In: Proceedings of the Workshop on Evaluation Metrics and System Comparison for Automatic Summarization, pp. 36–43. Association for Computational Linguistics, Stroudsburg (2012)
  29. Louis, A., Nenkova, A.: Automatically evaluating content selection in summarization without human models. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 1, pp. 306–314. Association for Computational Linguistics (2009)
  30. Saggion, H., Torres-Moreno, J., Cunha, I., SanJuan, E.: Multilingual summarization evaluation without human models. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 1059–1067. Association for Computational Linguistics (2010)
  31. Vadlapudi, R., Katragadda, R.: Quantitative evaluation of grammaticality of summaries. In: Gelbukh, A. (ed.) CICLing 2010. LNCS, vol. 6008, pp. 736–747. Springer, Heidelberg (2010)
  32. Lloret, E., Palomar, M.: Text summarisation in progress: a literature review. Artificial Intelligence Review (2011)
  33. Pitler, E., Louis, A., Nenkova, A.: Automatic evaluation of linguistic quality in multi-document summarization. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 544–554. Association for Computational Linguistics (2010)
  34. Menard, S.: Applied Logistic Regression Analysis, vol. 106. Sage Publications (2001)
  35. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. Software available at http://www.Csie.Ntu.Edu.Tw/cjlin/libsvm (2001)
  36. Akaike, H.: Likelihood of a model and information criteria. Journal of Econometrics 16, 3–14 (1981)
  37. Witten, I.H., Frank, E., Trigg, L., Hall, M., Holmes, G., Cunningham, S.J.: Weka: Practical machine learning tools and techniques with Java implementations. In: ICONIP/ANZIIS/ANNES, pp. 192–196 (1999)
  38. Spearman, C.: Footrule for measuring correlation. British Journal of Psychology 2, 89–108 (1906)
  39. Kendall, M.G.: Rank Correlation Methods. Hafner, New York (1962)
  40. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2012), ISBN 3-900051-07-0

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • George Giannakopoulos (1)
  • Vangelis Karkaletsis (1)
  1. Institute of Informatics and Telecommunications, NCSR Demokritos, Aghia Paraskevi, Greece
