Deeper Summarisation: The Second Time Around

Teufel, Simone

doi:10.1007/978-3-319-75487-1_44

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9624)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

This paper advocates deeper summarisation methods: methods that are closer to text understanding; methods that manipulate intermediate semantic representations. As a field, we are not yet in a position to create these representations perfectly, but I still believe that now is a good time to be a bit more ambitious again in our goals for summarisation. I think that a summariser should be able to provide some form of explanation for the summary it just created; and if we want those types of summarisers, we will have to start manipulating semantic representations.

Considering the state of the art in NLP in 2016, I believe that the field is ready for a second attempt at going deeper in summarisation. We NLP folk have come a long way since the days of early AI research. Twenty-five years of statistical research in NLP have given us more robust, more informative processing of many aspects of semantics – such as semantic similarity and relatedness between words (and maybe larger things), semantic role labelling, co-reference resolution, and sentiment detection. Now, with these new tools under our belt, we can try again to create the right kind of intermediate representations for summarisation, and then do something exciting with them. Of course, exactly how is a very big question. In this opinion paper, I will bring forward some suggestions, by taking a second look at historical summarisation models from the era of Strong AI. These may have been over-ambitious back then, but people still talk about them now because of their explanatory power: they make statements about which meaning units in a text are always important, and why.

I will discuss two 1980s models for text understanding and summarisation (Wendy Lehnert’s Plot Units, and Kintsch and van Dijk’s memory-restricted discourse structure, henceforth KvD), both of which have recently been revived by their first modern implementations. The implementation of Plot Unit-style affect analysis is by Goyal et al. (2013); the KvD implementation is by my student Yimai Fang, using a new corpus of language learner texts (Fang and Teufel 2014). Looking at those systems, I will argue that even an imperfect deeper summariser is exciting news.

Notes

1. http://www.cl.cam.ac.uk/~smb89/form.html

References

Androutsopoulos, I., Malakasiotis, P.: A survey of paraphrasing and textual entailment methods. J. Artif. Intell. Res. 38, 135–187 (2010)
Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet project. In: Proceedings of COLING, pp. 86–90 (1998)
Baroni, M., Lenci, A.: Distributional memory: a general framework for corpus-based semantics. Comput. Linguist. 36(4), 673–721 (2010)
Barzilay, R., Lee, L.: Learning to paraphrase: an unsupervised approach using multiple-sequence alignment. In: Proceedings of HLT, pp. 16–23 (2003)
Bauer, S., Teufel, S.: A methodology for evaluating timeline generation algorithms based on deep semantic units. In: Proceedings of ACL, p. 834 (2015)
Björkelund, A., Farkas, R.: Data-driven multilingual coreference resolution using resolver stacking. In: Joint Conference on EMNLP and CoNLL-Shared Task, pp. 49–55 (2012)
Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of SIGIR, pp. 335–336 (1998)
Chambers, N., Jurafsky, D.: Unsupervised learning of narrative schemas and their participants. In: Proceedings of ACL, pp. 602–610 (2009)
Chambers, N., Jurafsky, D.: Unsupervised learning of narrative event chains. In: Proceedings of ACL, pp. 789–797 (2008)
Cohen, R.: A computational theory of the function of clue words in argument understanding. In: Proceedings of COLING, pp. 251–255 (1984)
Copestake, A.: Slacker semantics: why superficiality, dependency and avoidance of commitment can be the right way to go. In: Proceedings of EACL, pp. 1–9 (2009)
DeJong, G.F.: An overview of the FRUMP system. In: Lehnert, W.G., Ringle, M.H. (eds.) Strategies for Natural Language Processing, chap. 5. Lawrence Erlbaum, Hillsdale (1982)
Dorr, B.J., Zajic, D., Schwartz, R.: Hedge: a parse-and-trim approach to headline generation. In: Proceedings of the HLT Text Summarization Workshop, pp. 1–8 (2003)
Elson, D.K., McKeown, K.: Extending and evaluating a platform for story understanding. In: AAAI Spring Symposium: Intelligent Narrative Technologies II, pp. 32–35 (2009)
Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 22, 457–479 (2004)
Fang, Y., Teufel, S.: A summariser based on human memory limitations and lexical competition. In: Proceedings of EACL, pp. 732–741 (2014)
Farrow, E., Dickinson, T., Aylett, M.P.: Generating narratives from personal digital data: using sentiment, themes, and named entities to construct stories. In: Abascal, J., Barbosa, S., Fetter, M., Gross, T., Palanque, P., Winckler, M. (eds.) INTERACT 2015. LNCS, vol. 9299, pp. 473–477. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22723-8_41
Galley, M., McKeown, K., Fosler-Lussier, E., Jing, H.: Discourse segmentation of multi-party conversation. In: Proceedings of ACL, pp. 562–569 (2003)
Gay, L.R., Mills, G.E., Airasian, P.W.: Educational Research: Competencies for Analysis and Application. Merrill, Columbus (1976)
Goldberg, A.B., Fillmore, N., Andrzejewski, D., Xu, Z., Gibson, B., Zhu, X.: May all your wishes come true: a study of wishes and how to recognize them. In: Proceedings of HLT/NAACL, pp. 263–271 (2009)
Goyal, A., Riloff, E., Daume III, H.: A computational model for plot units. Computational Intelligence 29(3), 466–488 (2013)
Hahn, U., Reimer, U.: Computing text constituency: an algorithmic approach to the generation of text graphs. In: Proceedings of SIGIR, pp. 343–368 (1984)
van Halteren, H., Teufel, S.: Examining the consensus between human summaries: initial experiments with factoid analysis. In: Proceedings of the HLT Text Summarization Workshop (2003)
Kasch, N., Oates, T.: Mining script-like structures from the web. In: Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, pp. 34–42 (2010)
Kintsch, W., van Dijk, T.A.: Toward a model of text comprehension and production. Psychol. Rev. 85(5), 363–394 (1978)
Knight, K., Marcu, D.: Statistics-based summarization – step one: sentence compression. In: Proceedings of AAAI-2000, pp. 703–710 (2000)
Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211 (1997)
Lave, J.: Cognition in Practice: Mind, Mathematics and Culture in Everyday Life. Cambridge University Press, Cambridge (1988)
Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D.: Stanford’s multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In: Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task, pp. 28–34 (2011)
Lehnert, W.G.: Plot units: a narrative summarisation strategy. In: Lehnert, W.G., Ringle, M.H. (eds.) Strategies for Natural Language Processing, chap. 4, pp. 223–244. Lawrence Erlbaum, Hillsdale (1981a)
Lehnert, W.G.: Plot units and narrative summarization. Cogn. Sci. 4, 293–331 (1981)
Lehnert, W.G., Dyer, M.G., Johnson, P.N., Yang, C., Harley, S.: BORIS – an experiment in in-depth understanding of narratives. Artif. Intell. 20(1), 15–62 (1983)
Lin, D.: Using collocation statistics in information extraction. In: Proceedings of ACL/COLING 1998, Montreal, Canada (1998)
Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: Proceedings of Workshop “Text Summarization Branches Out” at ACL 2004 (2004)
Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958)
McKeown, K., Robin, J., Kukich, K.: Generating concise natural language summaries. Inf. Process. Manag. 31(5), 703–733 (1995)
Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. In: Proceedings of EMNLP (2004)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Moens, M.F., Angheluta, R., De Busser, R.: Summarization of texts found on the world wide web. In: Abramowicz, W. (ed.) Knowledge-Based Information Retrieval and Filtering from the Web, vol. 746, pp. 101–120. Springer, Boston (2003). https://doi.org/10.1007/978-1-4757-3739-4_5
Nenkova, A., Passonneau, R.J.: Evaluating content selection in summarization: the pyramid method. In: Proceedings of NAACL/HLT 2004, Boston, MA (2004)
Nishikawa, H., Arita, K., Tanaka, K., Hirao, T., Makino, T., Matsuo, Y.: Learning to generate coherent summary with discriminative hidden semi-Markov model. In: Proceedings of COLING, pp. 1648–1659 (2014)
Paivio, A.: Mental Representations. Oxford University Press, Oxford (1990)
Pustejovsky, J., Castano, J.M., Ingria, R., Sauri, R., Gaizauskas, R.J., Setzer, A., Katz, G., Radev, D.R.: TimeML: robust specification of event and temporal expressions in text. In: New Directions in Question Answering, vol. 3, pp. 28–34 (2003)
Radev, D., Allison, T., Blair-Goldensohn, S., Blitzer, J., Celebi, A., Dimitrov, S., Drabek, E., Hakim, A., Lam, W., Liu, D., et al.: MEAD - a platform for multidocument multilingual text summarization. In: Proceedings of LREC (2004)
Radev, D.R., McKeown, K.R.: Generating natural language summaries from multiple on-line sources. Comput. Linguist. 24(3), 469–500 (1998)
Reiter, E., Sripada, S., Hunter, J., Yu, J., Davy, I.: Choosing words in computer-generated weather forecasts. Artif. Intell. 167(1), 137–169 (2005)
Rumelhart, D.E.: Understanding and summarizing brief stories. In: Laberge, D., Samuels, S. (eds.) Basic Processes in Reading, Perception and Comprehension. Lawrence Erlbaum, Hillsdale (1977)
Saurí, R., Littman, J., Gaizauskas, R., Setzer, A., Pustejovsky, J.: TimeML Annotation Guidelines, Version 1.2.1 (2006)
Schank, R.C.: Conceptual Information Processing. North-Holland, Amsterdam (1975)
Schank, R.C., Abelson, R.P.: Scripts, Goals, Plans and Understanding. Lawrence Erlbaum, Hillsdale (1977)
Stanovsky, G., Ficler, J., Dagan, I., Goldberg, Y.: Intermediary semantic representation through proposition structures. In: Proceedings of ACL, p. 66 (2014)
Steyvers, M., Griffiths, T.: Probabilistic topic models. In: Handbook of Latent Semantic Analysis, vol. 427, no. 7, pp. 424–440 (2007)
Takamura, H., Inui, T., Okumura, M.: Extracting semantic orientations of phrases from dictionary. In: Proceedings of HLT-NAACL, vol. 2007, pp. 292–299 (2007)
Unno, Y., Ninomiya, T., Miyao, Y., Tsujii, J.: Trimming CFG parse trees for sentence compression using machine learning approaches. In: Proceedings of COLING/ACL, pp. 850–857 (2006)
Uyttendaele, C., Moens, M.F., Dumortier, J.: Salomon: automatic abstracting of legal cases for effective access to court decisions. Artif. Intell. Law 6(1), 59–79 (1998)
Weischedel, R., Consortium, L.D., et al.: OntoNotes Release 4.0. Linguistic Data Consortium (2011)
Wierzbicka, A.: English Speech Act Verbs: A Semantic Dictionary. Academic Press, Cambridge (1987)
Wilson, T., Hoffmann, P., Somasundaran, S., Kessler, J., Wiebe, J., Choi, Y., Cardie, C., Riloff, E., Patwardhan, S.: OpinionFinder: a system for subjectivity analysis. In: Proceedings of HLT/EMNLP, pp. 34–35 (2005)
Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the Conference on HLT and EMNLP, pp. 347–354 (2005b)

Author information

Authors and Affiliations

Computer Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge, CB3 0FD, UK
Simone Teufel

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Teufel, S. (2018). Deeper Summarisation: The Second Time Around. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_44

DOI: https://doi.org/10.1007/978-3-319-75487-1_44
Published: 21 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75486-4
Online ISBN: 978-3-319-75487-1
eBook Packages: Computer Science, Computer Science (R0)