Deeper Summarisation: The Second Time Around

An Overview and Some Practical Suggestions

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9624)

Abstract

This paper advocates deeper summarisation methods: methods that are closer to text understanding; methods that manipulate intermediate semantic representations. As a field, we are not yet in a position to create these representations perfectly, but I still believe that now is a good time to be a bit more ambitious again in our goals for summarisation. I think that a summariser should be able to provide some form of explanation for the summary it just created; and if we want those types of summarisers, we will have to start manipulating semantic representations.

Considering the state of the art in NLP in 2016, I believe that the field is ready for a second attempt at going deeper in summarisation. We NLP folk have come a long way since the days of early AI research. Twenty-five years of statistical research in NLP have given us more robust, more informative processing of many aspects of semantics – such as semantic similarity and relatedness between words (and maybe larger things), semantic role labelling, co-reference resolution, and sentiment detection. Now, with these new tools under our belt, we can try again to create the right kind of intermediate representations for summarisation, and then do something exciting with them. Of course, exactly how is a very big question. In this opinion paper, I will bring forward some suggestions, by taking a second look at historical summarisation models from the era of Strong AI. These may have been over-ambitious back then, but people still talk about them now because of their explanatory power: they make statements about which meaning units in a text are always important, and why.

I will discuss two 1980s models for text understanding and summarisation: Wendy Lehnert’s Plot Units, and the memory-restricted discourse structure of Kintsch and van Dijk (KvD). Both have recently been revived by their first modern implementations. The implementation of Plot Unit-style affect analysis is by Goyal et al. (2013); the KvD implementation is by my student Yimai Fang, using a new corpus of language learner texts (Fang and Teufel 2014). Looking at those systems, I will argue that even an imperfect deeper summariser is exciting news.

Notes

  1. http://www.cl.cam.ac.uk/~smb89/form.html

References

  • Androutsopoulos, I., Malakasiotis, P.: A survey of paraphrasing and textual entailment methods. J. Artif. Intell. Res. 38, 135–187 (2010)

  • Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley framenet project. In: Proceedings COLING, pp. 86–90 (1998)

  • Baroni, M., Lenci, A.: Distributional memory: a general framework for corpus-based semantics. Comput. Linguist. 36(4), 673–721 (2010)

  • Barzilay, R., Lee, L.: Learning to paraphrase: an unsupervised approach using multiple-sequence alignment. In: Proceedings of HLT, pp. 16–23 (2003)

  • Bauer, S., Teufel, S.: A methodology for evaluating timeline generation algorithms based on deep semantic units. In: Proceedings of ACL, p. 834 (2015)

  • Björkelund, A., Farkas, R.: Data-driven multilingual coreference resolution using resolver stacking. In: Joint Conference on EMNLP and CoNLL-Shared Task, pp. 49–55 (2012)

  • Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of SIGIR, pp. 335–336 (1998)

  • Chambers, N., Jurafsky, D.: Unsupervised learning of narrative schemas and their participants. In: Proceedings of ACL, pp. 602–610 (2009)

  • Chambers, N., Jurafsky, D.: Unsupervised learning of narrative event chains. In: Proceedings of ACL, pp. 789–797 (2008)

  • Cohen, R.: A computational theory of the function of clue words in argument understanding. In: Proceedings of COLING, pp. 251–255 (1984)

  • Copestake, A.: Slacker semantics: why superficiality, dependency and avoidance of commitment can be the right way to go. In: Proceedings of EACL, pp. 1–9 (2009)

  • DeJong, G.F.: An overview of the FRUMP system. In: Lehnert, W.G., Ringle, M.H. (eds.) Strategies for Natural Language Processing, chap. 5. Lawrence Erlbaum, Hillsdale (1982)

  • Dorr, B.J., Zajic, D., Schwartz, R.: Hedge: a parse-and-trim approach to headline generation. In: Proceedings of the HLT Text Summarization Workshop, pp. 1–8 (2003)

  • Elson, D.K., McKeown, K.: Extending and evaluating a platform for story understanding. In: AAAI Spring Symposium: Intelligent Narrative Technologies II, pp. 32–35 (2009)

  • Erkan, G., Radev, D.R.: Lexrank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 22, 457–479 (2004)

  • Fang, Y., Teufel, S.: A summariser based on human memory limitations and lexical competition. In: Proceedings of EACL, pp. 732–741 (2014)

  • Farrow, E., Dickinson, T., Aylett, M.P.: Generating narratives from personal digital data: using sentiment, themes, and named entities to construct stories. In: Abascal, J., Barbosa, S., Fetter, M., Gross, T., Palanque, P., Winckler, M. (eds.) INTERACT 2015. LNCS, vol. 9299, pp. 473–477. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22723-8_41

  • Galley, M., McKeown, K., Fosler-Lussier, E., Jing, H.: Discourse segmentation of multi-party conversation. In: Proceedings of ACL, pp. 562–569 (2003)

  • Gay, L.R., Mills, G.E., Airasian, P.W.: Educational Research: Competencies for Analysis and Application. Merrill, Columbus (1976)

  • Goldberg, A.B., Fillmore, N., Andrzejewski, D., Xu, Z., Gibson, B., Zhu, X.: May all your wishes come true: a study of wishes and how to recognize them. In: Proceedings of HLT/NAACL, pp. 263–271 (2009)

  • Goyal, A., Riloff, E., Daumé III, H.: A computational model for plot units. Comput. Intell. 29(3), 466–488 (2013)

  • Hahn, U., Reimer, U.: Computing text constituency: an algorithmic approach to the generation of text graphs. In: Proceedings of SIGIR, pp. 343–368 (1984)

  • van Halteren, H., Teufel, S.: Examining the consensus between human summaries: initial experiments with factoid analysis. In: Proceedings of the HLT Text Summarization Workshop (2003)

  • Kasch, N., Oates, T.: Mining script-like structures from the web. In: Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, pp. 34–42 (2010)

  • Kintsch, W., van Dijk, T.A.: Toward a model of text comprehension and production. Psychol. Rev. 85(5), 363–394 (1978)

  • Knight, K., Marcu, D.: Statistics-based summarization – step one: sentence compression. In: Proceedings of AAAI-2000, pp. 703–710 (2000)

  • Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211 (1997)

  • Lave, J.: Cognition in Practice: Mind, Mathematics and Culture in Everyday Life. Cambridge University Press, Cambridge (1988)

  • Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D.: Stanford’s multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In: Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task, pp. 28–34 (2011)

  • Lehnert, W.G.: Plot units: a narrative summarisation strategy. In: Lehnert, W.G., Ringle, M.H. (eds.) Strategies for Natural Language Processing, chap. 4, pp. 223–244. Lawrence Erlbaum, Hillsdale (1981a)

  • Lehnert, W.G.: Plot units and narrative summarization. Cogn. Sci. 4, 293–331 (1981)

  • Lehnert, W.G., Dyer, M.G., Johnson, P.N., Yang, C., Harley, S.: BORIS: an experiment in in-depth understanding of narratives. Artif. Intell. 20(1), 15–62 (1983)

  • Lin, D.: Using collocation statistics in information extraction. In: Proceedings of ACL/COLING 1998, Montreal, Canada (1998)

  • Lin, C.Y.: Rouge: a package for automatic evaluation of summaries. In: Proceedings of Workshop “Text Summarization Branches Out” at ACL 2004 (2004)

  • Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958)

  • McKeown, K., Robin, J., Kukich, K.: Generating concise natural language summaries. Inf. Process. Manag. 31(5), 703–733 (1995)

  • Mihalcea, R., Tarau, P.: Textrank: bringing order into texts. In: Proceedings of EMNLP (2004)

  • Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

  • Moens, M.F., Angheluta, R., De Busser, R.: Summarization of texts found on the world wide web. In: Abramowicz, W. (ed.) Knowledge-Based Information Retrieval and Filtering from the Web, vol. 746, pp. 101–120. Springer, Boston (2003). https://doi.org/10.1007/978-1-4757-3739-4_5

  • Nenkova, A., Passonneau, R.J.: Evaluating content selection in summarization: the pyramid method. In: Proceedings of NAACL/HLT 2004, Boston, MA (2004)

  • Nishikawa, H., Arita, K., Tanaka, K., Hirao, T., Makino, T., Matsuo, Y.: Learning to generate coherent summary with discriminative hidden semi-Markov model. In: Proceedings of COLING, pp. 1648–1659 (2014)

  • Paivio, A.: Mental Representations. Oxford University Press, Oxford (1990)

  • Pustejovsky, J., Castano, J.M., Ingria, R., Sauri, R., Gaizauskas, R.J., Setzer, A., Katz, G., Radev, D.R.: TimeML: robust specification of event and temporal expressions in text. In: New Directions in Question Answering, vol. 3, pp. 28–34 (2003)

  • Radev, D., Allison, T., Blair-Goldensohn, S., Blitzer, J., Celebi, A., Dimitrov, S., Drabek, E., Hakim, A., Lam, W., Liu, D., et al.: MEAD - a platform for multidocument multilingual text summarization. In: Proceedings of LREC (2004)

  • Radev, D.R., McKeown, K.R.: Generating natural language summaries from multiple on-line sources. Comput. Linguist. 24(3), 469–500 (1998)

  • Reiter, E., Sripada, S., Hunter, J., Yu, J., Davy, I.: Choosing words in computer-generated weather forecasts. Artif. Intell. 167(1), 137–169 (2005)

  • Rumelhart, D.E.: Understanding and summarizing brief stories. In: Laberge, D., Samuels, S. (eds.) Basic Processes in Reading, Perception and Comprehension. Lawrence Erlbaum, Hillsdale (1977)

  • Saurí, R., Littman, J., Gaizauskas, R., Setzer, A., Pustejovsky, J.: TimeML Annotation Guidelines, Version 1.2.1 (2006)

  • Schank, R.C.: Conceptual Information Processing. North-Holland, Amsterdam (1975)

  • Schank, R.C., Abelson, R.P.: Scripts, Goals, Plans and Understanding. Lawrence Erlbaum, Hillsdale (1977)

  • Stanovsky, G., Ficler, J., Dagan, I., Goldberg, Y.: Intermediary semantic representation through proposition structures. In: Proceedings of ACL, p. 66 (2014)

  • Steyvers, M., Griffiths, T.: Probabilistic topic models. In: Handbook of Latent Semantic Analysis, vol. 427, no. 7, pp. 424–440 (2007)

  • Takamura, H., Inui, T., Okumura, M.: Extracting semantic orientations of phrases from dictionary. In: Proceedings of HLT-NAACL, vol. 2007, pp. 292–299 (2007)

  • Unno, Y., Ninomiya, T., Miyao, Y., Tsujii, J.: Trimming CFG parse trees for sentence compression using machine learning approaches. In: Proceedings of COLING/ACL, pp. 850–857 (2006)

  • Uyttendaele, C., Moens, M.F., Dumortier, J.: Salomon: automatic abstracting of legal cases for effective access to court decisions. Artif. Intell. Law 6(1), 59–79 (1998)

  • Weischedel, R., et al.: OntoNotes Release 4.0. Linguistic Data Consortium (2011)

  • Wierzbicka, A.: English Speech Act Verbs: A Semantic Dictionary. Academic Press, Cambridge (1987)

  • Wilson, T., Hoffmann, P., Somasundaran, S., Kessler, J., Wiebe, J., Choi, Y., Cardie, C., Riloff, E., Patwardhan, S.: OpinionFinder: a system for subjectivity analysis. In: Proceedings of HLT/EMNLP, pp. 34–35 (2005)

  • Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the Conference on HLT and EMNLP, pp. 347–354 (2005b)

Author information

Corresponding author

Correspondence to Simone Teufel.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Teufel, S. (2018). Deeper Summarisation: The Second Time Around. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_44

  • DOI: https://doi.org/10.1007/978-3-319-75487-1_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75486-4

  • Online ISBN: 978-3-319-75487-1

  • eBook Packages: Computer Science, Computer Science (R0)
