Text Summarization as Controlled Search

Copeck, Terry; Japkowicz, Nathalie; Szpakowicz, Stan

doi:10.1007/3-540-47922-8_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2338)

Included in the following conference series:

Conference of the Canadian Society for Computational Studies of Intelligence

Abstract

We present a framework for text summarization based on the generate-and-test model. A large set of summaries is generated for all plausible values of six parameters that control a three-stage process that includes segmentation and keyphrase extraction, and a number of features that characterize the document. Quality is assessed by measuring the summaries against the abstract of the summarized document. The large number of summaries produced for our corpus dictates automated validation and fine-tuning of the summary generator. We use supervised machine learning to detect good and bad parameters. In particular, we identify parameters and ranges of their values within which the summary generator might be used with high reliability on documents for which no author’s abstract exists.
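
The generate-and-test loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the six parameters, so the three parameters in `PARAM_GRID`, the toy `summarize` pipeline, the word-overlap measure in `overlap_score`, and the 0.5 threshold are all hypothetical stand-ins rather than the authors' components.

```python
from collections import Counter
from itertools import product
import re

# Hypothetical parameter grid (names and values are assumptions for illustration).
PARAM_GRID = {
    "segment_size": [1, 2],     # sentences per segment
    "num_keyphrases": [2, 4],   # keyphrases extracted from the document
    "num_segments": [1, 2],     # segments kept in the summary
}

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on", "that"}


def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(document, segment_size, num_keyphrases, num_segments):
    """Toy three-stage pipeline: segment, extract keyphrases, select segments."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    # Stage 1: fixed-size windows of sentences stand in for a real segmenter.
    segments = [" ".join(sentences[i:i + segment_size])
                for i in range(0, len(sentences), segment_size)]
    # Stage 2: the most frequent content words stand in for keyphrases.
    keyphrases = {w for w, _ in Counter(content_words(document)).most_common(num_keyphrases)}
    # Stage 3: keep the segments with the most keyphrase hits, in document order.
    hits = [sum(w in keyphrases for w in content_words(seg)) for seg in segments]
    best = sorted(range(len(segments)), key=lambda i: -hits[i])[:num_segments]
    return " ".join(segments[i] for i in sorted(best))


def overlap_score(summary, abstract):
    """Fraction of the abstract's distinct content words found in the summary."""
    target = set(content_words(abstract))
    if not target:
        return 0.0
    return len(target & set(content_words(summary))) / len(target)


def generate_and_test(document, abstract, threshold=0.5):
    """Generate one summary per parameter combination and label each by its score."""
    names = list(PARAM_GRID)
    rows = []
    for values in product(*(PARAM_GRID[n] for n in names)):
        params = dict(zip(names, values))
        score = overlap_score(summarize(document, **params), abstract)
        rows.append({**params, "score": score, "label": "good" if score >= threshold else "bad"})
    return rows
```

Each returned row pairs one parameter setting with a score and a good/bad label, which is the kind of labelled table a supervised learner could use to find parameter ranges that tend to yield good summaries.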

Author information

Authors and Affiliations

School of Information Technology & Engineering, University of Ottawa, Ontario, Canada
Terry Copeck, Nathalie Japkowicz & Stan Szpakowicz

Editor information

Editors and Affiliations

Computer Science, University of Waterloo, 200 University Ave. W., Waterloo, Ontario, Canada, N2L 3G1
Robin Cohen
IIT - e-Business, National Research Council, Incutech Brunswick, 2 Garland Court, Fredericton, New Brunswick, Canada, E3B 6C2
Bruce Spencer

About this paper

Cite this paper

Copeck, T., Japkowicz, N., Szpakowicz, S. (2002). Text Summarization as Controlled Search. In: Cohen, R., Spencer, B. (eds) Advances in Artificial Intelligence. Canadian AI 2002. Lecture Notes in Computer Science, vol 2338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47922-8_22

DOI: https://doi.org/10.1007/3-540-47922-8_22
Published: 28 May 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43724-6
Online ISBN: 978-3-540-47922-2
eBook Packages: Springer Book Archive
