Coreference Applications to Summarization

  • Chapter in: Anaphora Resolution

Abstract

In this chapter we discuss the connection between anaphora/coreference resolution and summarization. The discussion follows the summarization framework based on Latent Semantic Analysis (LSA); however, the ideas can be applied to any sentence-scoring approach. After describing ways of combining the summarizer's basic (lexical) features with those obtained from a coreference resolution system, we address the question of whether coreference resolution improves the quality of the selected content even though coreference resolution systems are still far from perfect. Both single-document and multi-document summarization are discussed. We then focus on post-processing techniques that improve the referential clarity and coherence of extracted summaries.
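To make the LSA-based sentence scoring mentioned in the abstract more concrete, here is a minimal sketch, assuming a plain term-frequency matrix and one common LSA sentence score: the length of each sentence's vector in the top `r` latent dimensions, weighted by the singular values. Coreference information is folded in under an assumed scheme in which each coreference chain becomes an extra row of the matrix, with a 1 for every sentence that mentions the chain. The chains are supplied by hand here rather than by a resolver, and the names `build_matrix` and `lsa_scores` are hypothetical; this is an illustration, not the chapter's exact implementation.

```python
import numpy as np


def build_matrix(sentences, chains):
    """Stack a term-by-sentence matrix on top of an entity-by-sentence matrix.

    sentences: list of token lists, one per sentence.
    chains: list of sets; each set holds the indices of the sentences that
            mention one (hypothetical, hand-supplied) coreference chain.
    """
    vocab = sorted({tok for sent in sentences for tok in sent})
    row_of = {tok: i for i, tok in enumerate(vocab)}
    n = len(sentences)

    # Lexical rows: raw term frequencies per sentence.
    lexical = np.zeros((len(vocab), n))
    for j, sent in enumerate(sentences):
        for tok in sent:
            lexical[row_of[tok], j] += 1.0

    # Coreference rows (assumed scheme): 1.0 if the sentence mentions the chain.
    entity = np.zeros((len(chains), n))
    for c, sent_ids in enumerate(chains):
        for j in sent_ids:
            entity[c, j] = 1.0

    return np.vstack([lexical, entity])


def lsa_scores(matrix, r):
    """Length of each sentence vector in the top-r latent dimensions."""
    # Thin SVD: matrix = u @ np.diag(sigma) @ vt
    _, sigma, vt = np.linalg.svd(matrix, full_matrices=False)
    r = min(r, len(sigma))
    # score_k = sqrt(sum_i (sigma_i * v_{k,i})^2) over the r largest singular values
    weighted = (sigma[:r, None] * vt[:r, :]) ** 2
    return np.sqrt(weighted.sum(axis=0))


if __name__ == "__main__":
    sentences = [
        ["the", "minister", "resigned"],
        ["she", "cited", "health", "reasons"],
        ["markets", "rose", "slightly"],
    ]
    chains = [{0, 1}]  # toy chain linking "minister" and "she"
    scores = lsa_scores(build_matrix(sentences, chains), r=1)
    print("sentence ranking:", np.argsort(-scores).tolist())
```

When `r` covers all singular values, each score reduces to the Euclidean norm of the sentence's column, so it is the truncation to the strongest latent dimensions that rewards sentences sharing the document's dominant terms and entities.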

Notes

  1.

    We use the term ‘coreference resolution’ in preference to ‘anaphora resolution’, as it is the more general term. However, when we discuss single-document summarization, the term refers to the task of identifying successive mentions of the same discourse entity (intra-document coreference resolution/anaphora resolution), as opposed to ‘inter-document coreference resolution’, which is appropriate for multi-document summarization and involves collecting all information about an entity, including information expressed by appositions and other predicative constructions.

  2.

    The National Institute of Standards and Technology (NIST) initiated the Document Understanding Conference (DUC) series [11] to evaluate automatic text summarization. Its goal is to further progress in summarization and to enable researchers to participate in large-scale experiments. In 2008 DUC moved to TAC (Text Analysis Conference) [52], which continues the summarization evaluation roadmap with new or upgraded tracks.

  3.

    When producing an update summary of a set of topic-related documents, the summarizer assumes that the reader already knows the content of a set of older documents on the same topic. The update summarizer must therefore solve a novelty vs. redundancy problem.

  4.

    In the aspect summarization scenario, the automatic summaries should address a given list of core information aspects defined for each event type.

  5.

    In many non-educational texts, only an ‘entity-centered’ structure can be clearly identified, as opposed to a ‘relation-centered’ structure of the kind hypothesized in Rhetorical Structure Theory, which serves as the basis for discourse structure-based summarization methods [21, 40].

  6.

    The approach deals with both object coreference and event coreference. It also considers referential relations beyond identity, covering a few domain-specific special cases.

  7.

    Available as open source software at http://guitar-essex.sourceforge.net/.

  8.

    This step includes heuristic methods for guessing agreement features.

  9.

    All system summaries were truncated to 100 words, as traditionally done in DUC. ROUGE version and settings:

        ROUGEeval-1.4.2.pl -c 95 -m -n 2 -l 100 -s -2 4 -a duc.xml

  10.

    The previous sentence in the source is: “Walton continued talking with customers during the concert.”

  11.

    The multilingual named entity disambiguator and geo-tagger developed at the JRC have already been used for cross-lingual linking of multilingual news clusters produced by the EMM system [50].

  12.

    The use of the multilingual tools in higher-level applications can be seen at http://emm.newsexplorer.eu/.
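Note 9 evaluates 100-word system summaries with ROUGE-2 (`-n 2`). As a simplified sketch of what that number measures, the code below computes ROUGE-N recall: the number of reference n-grams also found in the truncated system summary (clipped by count), divided by the total number of reference n-grams. It is not a substitute for the Perl script: it assumes lowercased whitespace tokenization and leaves out stemming (`-m`), stopword removal (`-s`), confidence intervals (`-c 95`) and the script's own aggregation over multiple references; the function names are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(system, references, n=2, word_limit=100):
    """Simplified ROUGE-N recall of one system summary against reference summaries."""
    # Truncate the system summary to its first `word_limit` words (cf. note 9).
    sys_tokens = system.lower().split()[:word_limit]
    sys_grams = ngram_counts(sys_tokens, n)

    matched = total = 0
    for ref in references:
        ref_grams = ngram_counts(ref.lower().split(), n)
        # Clipped overlap: each reference n-gram counts at most as often as it
        # also appears in the system summary.
        matched += sum(min(count, sys_grams[g]) for g, count in ref_grams.items())
        total += sum(ref_grams.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    score = rouge_n_recall("the cat sat on the mat", ["the cat sat on a mat"])
    print(f"ROUGE-2 recall: {score:.2f}")  # prints "ROUGE-2 recall: 0.60" (3 of 5 bigrams)
```

Summing matched and total counts over all references, instead of averaging per-reference recalls, follows the ROUGE-N formula given by Lin [25].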

References

  1. Azzam, S., Humphreys, K., Gaizauskas, R.: Using coreference chains for text summarization. In: Proceedings of the ACL’99 Workshop on Coreference and Its Applications, Baltimore. ACL (1999)

  2. Baldwin, B., Morton, T.S.: Dynamic coreference-based summarization. In: Proceedings of EMNLP’98, Granada. ACL (1998)

  3. Barzilay, R., Elhadad, M.: Using lexical chains for text summarization. In: Mani, I., Maybury, M. (eds.) Advances in Automatic Text Summarization. MIT, Cambridge (1997)

  4. Barzilay, R., Lapata, M.: Modeling local coherence: an entity-based approach. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor. ACL (2005)

  5. Belz, A., Kow, E., Viethen, J.: The GREC named entity generation challenge 2009: overview and evaluation results. In: Proceedings of ACL-IJCNLP’09 Workshop on Language Generation and Summarisation, Singapore. ACL (2009)

  6. Bergler, S., Witte, R., Khalife, M., Li, Z., Rudzicz, F.: Using knowledge-poor coreference resolution for text summarization. In: Proceedings of DUC’03, Edmonton. NIST (2003)

  7. Berry, M.W., Dumais, S.T., O’Brien, G.W.: Using linear algebra for intelligent IR. SIAM Rev. 37 (4), 573–595 (1995)

  8. Boguraev, B., Kennedy, C.: Salience-based content characterisation of text documents. In: Mani, I., Maybury, M. (eds.) Advances in Automatic Text Summarization. MIT, Cambridge (1997)

  9. Charniak, E.: A maximum-entropy-inspired parser. In: Proceedings of NAACL’00, Philadelphia. ACL (2000)

  10. Choi, F.Y.Y., Wiemer-Hastings, P., Moore, J.D.: Latent semantic analysis for text segmentation. In: Proceedings of EMNLP, Pittsburgh. ACL (2001)

  11. Document Understanding Conference: http://duc.nist.gov/

  12. Edmundson, H.: New methods in automatic extracting. J. Assoc. Comput. Mach. 16 (2), 264–285 (1969). ACM

  13. Giannakopoulos, G., El-Haj, M., Favre, B., Litvak, M., Steinberger, J., Varma, V.: TAC 2011 multiling pilot overview. In: Proceedings of the Text Analysis Conference 2011, Gaithersburg. NIST (2011)

  14. Gong, Y., Liu, X.: Generic text summarization using relevance measure and latent semantic analysis. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, pp. 19–25. ACM (2001)

  15. Grosz, B., Joshi, A., Weinstein, S.: Centering: a framework for modelling the local coherence of discourse. Comput. Linguist. 21 (2), 203–225 (1995). ACL

  16. Hasler, L., Orasan, C., Mitkov, R.: Building better corpora for summarization. In: Proceedings of Corpus Linguistics, Lancaster. UCREL, Lancaster University (2003)

  17. Hirschman, L.: MUC-7 coreference task definition, version 3.0. In: Proceedings of the 7th Message Understanding Conference, Fairfax. NIST (1998)

  18. Hovy, E., Lin, C.: Automated text summarization in SUMMARIST. In: Mani, I., Maybury, M. (eds.) Advances in Automatic Text Summarization. MIT, Cambridge (1997)

  19. Kabadjov, M.: A comprehensive evaluation of anaphora resolution and discourse-new recognition. PhD thesis, University of Essex (2007)

  20. Kabadjov, M., Steinberger, J., Pouliquen, B., Steinberger, R., Poesio, M.: Multilingual statistical news summarisation: preliminary experiments with English. In: Proceedings of IAPWNC at the IEEE/WIC/ACM WI-IAT, Milano. IEEE Computer Society (2009)

  21. Knott, A., Oberlander, J., O’Donnell, M., Mellish, C.: Beyond elaboration: the interaction of relations and focus in coherent text. In: Sanders, T., Schilperoord, J., Spooren, W. (eds.) Text Representation: Linguistic and Psycholinguistic Aspects. John Benjamins, Amsterdam/Philadelphia (2001)

  22. Kupiec, J., Pedersen, J., Chen, F.: A trainable document summarizer. In: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, pp. 68–73. ACM (1995)

  23. Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychol. Rev. 104, 211–240 (1997). American Psychological Association

  24. Lin, C., Hovy, E.: Automatic evaluation of summaries using n-gram co-occurrence statistics. In: Proceedings of HLT-NAACL, Edmonton. ACL (2003)

  25. Lin, C.: ROUGE: a package for automatic evaluation of summaries. In: Proceedings of the Workshop on Text Summarization Branches Out, Barcelona. ACL (2004)

  26. Luhn, H.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2 (2), 159–165 (1958). IBM

  27. Mani, I. (ed.): Proceedings of the Workshop on Intelligent and Scalable Text Summarization at the Annual Joint Meeting of the ACL/EACL, Madrid. ACL (1997)

  28. Mani, I., Firmin, T., House, D., Klein, G., Sundheim, B., Hirschman, L.: The TIPSTER SUMMAC text summarization evaluation. In: Proceedings of the 9th Meeting of the European Chapter of the Association for Computational Linguistics, Bergen. ACL (1999)

  29. Mani, I., Maybury, M. (eds.): Advances in Automatic Text Summarization. MIT, Cambridge (1999)

  30. Marcu, D.: From discourse structures to text summaries. In: Mani [27]

  31. Maybury, M.: Generating summaries from event data. In: Mani and Maybury [29]

  32. McKeown, K., Radev, D.: Generating summaries of multiple news articles. In: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, pp. 74–82. ACM (1995)

  33. Morris, A., Kasper, G., Adams, D.: The effects and limitations of automatic text condensing on reading comprehension performance. Inf. Syst. Res. 3 (1), 17–35 (1992). INFORMS

  34. Mueller, C., Strube, M.: MMAX: a tool for the annotation of multi-modal corpora. In: Proceedings of the 2nd IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems, Seattle. Morgan Kaufmann (2001)

  35. Nenkova, A., Passonneau, R., McKeown, K.: The pyramid method: incorporating human content selection variation in summarization evaluation. ACM Trans. Speech Lang. Process. 4 (2), 1–23 (2007). ACM

  36. Orasan, C., Mitkov, R., Hasler, L.: CAST: a computer-aided summarization tool. In: Proceedings of EACL’03, Budapest. ACL (2003)

  37. Over, P., Dang, H., Harman, D.: DUC in context. Inf. Process. Manag. 43 (6), 1506–1520 (2007). Special Issue on Text Summarisation (Donna Harman, ed.). Elsevier

  38. Pitler, E., Louis, A., Nenkova, A.: Automatic evaluation of linguistic quality in multi-document summarization. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, pp. 544–554. ACL (2010)

  39. Poesio, M., Kabadjov, M.: A general-purpose, off-the-shelf anaphora resolution module: implementation and preliminary evaluation. In: Proceedings of LREC, Lisbon. ELRA (2004)

  40. Poesio, M., Stevenson, R., Di Eugenio, B., Hitzeman, J.: Centering: a parametric theory and its instantiations. Comput. Linguist. 30 (3), 309–363 (2004). ACL

  41. Pouliquen, B., Kimler, M., Steinberger, R., Ignat, C., Oellinger, T., Blackler, K., Fuart, F., Zaghouani, W., Widiger, A., Forslund, A., Best, C.: Geocoding multilingual texts: recognition, disambiguation and visualisation. In: Proceedings of the 5th LREC, Genoa, pp. 53–58. ELRA (2006)

  42. Pouliquen, B., Steinberger, R.: Automatic construction of multilingual name dictionaries. In: Goutte, C., Cancedda, N., Dymetman, M., Foster, G. (eds.) Learning Machine Translation. MIT, Cambridge (2009)

  43. Radev, D., Jing, H., Budzikowska, M.: Centroid-based summarization of multiple documents. In: ANLP/NAACL Workshop on Automatic Summarization, Seattle. ACL (2000)

  44. Radev, D., Teufel, S., Saggion, H., Lam, W., Blitzer, J., Qi, H., Celebi, A., Liu, D., Drabek, E.: Evaluation challenges in large-scale document summarization. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo. ACL (2003)

  45. Sparck-Jones, K.: Automatic summarising: factors and directions. In: Mani and Maybury [29]

  46. Steinberger, J., Ježek, K.: Text summarization and singular value decomposition. Lect. Notes Comput. Sci. 2457, 245–254 (2004). Springer

  47. Steinberger, J., Kabadjov, M., Poesio, M.: Improving LSA-based summarization with anaphora resolution. In: Proceedings of HLT/EMNLP’05, Vancouver, pp. 1–8. ACL (2005)

  48. Steinberger, J., Kabadjov, M., Pouliquen, B., Steinberger, R., Poesio, M.: WB-JRC-UT’s participation in TAC 2009: update summarization and AESOP tasks. In: Proceedings of TAC’09, Gaithersburg. NIST (2009)

  49. Steinberger, J., Poesio, M., Kabadjov, M., Ježek, K.: Two uses of anaphora resolution in summarization. Inf. Process. Manag. 43 (6), 1663–1680 (2007). Elsevier

  50. Steinberger, R., Pouliquen, B., Ignat, C.: Using language-independent rules to achieve high multilinguality in text mining. In: Fogelman-Soulié, F., Perrotta, D., Piskorski, J., Steinberger, R. (eds.) Mining Massive Data Sets for Security. IOS-Press, Amsterdam (2009)

  51. Stuckardt, R.: Coreference-based summarization and question answering: a case for high precision anaphor resolution. In: International Symposium on Reference Resolution and Its Applications to Question Answering and Summarization, Venice (2003)

  52. Text Analysis Conference: http://www.nist.gov/tac

  53. Turchi, M., Steinberger, J., Kabadjov, M., Steinberger, R.: Using parallel corpora for multilingual (multi-document) summarisation evaluation. In: Proceedings of CLEF-10, pp. 52–63. Springer (2010)

  54. Wan, X., Li, H., Xiao, J.: Cross-language document summarization based on machine translation quality prediction. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 917–926. ACL (2010)

Acknowledgements

This work was supported by project “NTIS - New Technologies for Information Society”, European Centre of Excellence, CZ.1.05/1.1.00/02.0090, and by project MediaGist, EU’s FP7 People Programme (Marie Curie Actions), no. 630786.

Author information

Corresponding author

Correspondence to Mijail Kabadjov.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Steinberger, J., Kabadjov, M., Poesio, M. (2016). Coreference Applications to Summarization. In: Poesio, M., Stuckardt, R., Versley, Y. (eds) Anaphora Resolution. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-47909-4_15

  • DOI: https://doi.org/10.1007/978-3-662-47909-4_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-47908-7

  • Online ISBN: 978-3-662-47909-4

  • eBook Packages: Computer Science, Computer Science (R0)
