Abstract
We present a framework for text summarization based on the generate-and-test model. A large set of summaries is generated for all plausible values of six parameters that control a three-stage process, which includes segmentation and keyphrase extraction, together with a number of features that characterize the document. Quality is assessed by measuring the summaries against the abstract of the summarized document. The large number of summaries produced for our corpus dictates automated validation and fine-tuning of the summary generator. We use supervised machine learning to detect good and bad parameters. In particular, we identify parameters and ranges of their values within which the summary generator might be used with high reliability on documents for which no author’s abstract exists.
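The generate-and-test loop described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two-parameter grid (`num_keyphrases`, `summary_sentences`), the frequency-based keyphrase extractor, and the word-overlap score are toy stand-ins, not the six parameters, the three-stage process, or the quality measure used in the paper. The supervised learning step is not shown.

```python
from itertools import product

# Hypothetical parameter grid; names and values are illustrative only.
PARAM_GRID = {
    "num_keyphrases": [2, 4],
    "summary_sentences": [1, 2],
}


def _words(text):
    """Lowercase tokens with surrounding punctuation stripped."""
    tokens = (w.strip(".,;:!?") for w in text.lower().split())
    return [w for w in tokens if w]


def keyphrases(text, k):
    """Toy extractor: the k most frequent words longer than three letters."""
    counts = {}
    for word in _words(text):
        if len(word) > 3:
            counts[word] = counts.get(word, 0) + 1
    return sorted(counts, key=lambda w: (-counts[w], w))[:k]


def summarize(text, num_keyphrases, summary_sentences):
    """Toy summarizer: keep the sentences that mention the most keyphrases."""
    keys = set(keyphrases(text, num_keyphrases))
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    ranked = sorted(sentences, key=lambda s: -sum(w in keys for w in _words(s)))
    return ". ".join(ranked[:summary_sentences]) + "."


def overlap_score(summary, abstract):
    """Toy quality measure: fraction of abstract words found in the summary."""
    abstract_words = set(_words(abstract))
    if not abstract_words:
        return 0.0
    return len(abstract_words & set(_words(summary))) / len(abstract_words)


def generate_and_test(text, abstract, grid=PARAM_GRID):
    """Generate one summary per parameter setting and rank them by score."""
    names = list(grid)
    results = []
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        summary = summarize(text, **params)
        results.append((overlap_score(summary, abstract), params, summary))
    return sorted(results, key=lambda r: -r[0])
```

Calling `generate_and_test(document_text, author_abstract)` returns `(score, params, summary)` tuples, best first. In a setup like the one the abstract describes, such scored parameter settings would serve as training data for a classifier that separates good settings from bad ones.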
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Copeck, T., Japkowicz, N., Szpakowicz, S. (2002). Text Summarization as Controlled Search. In: Cohen, R., Spencer, B. (eds) Advances in Artificial Intelligence. Canadian AI 2002. Lecture Notes in Computer Science, vol 2338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47922-8_22
DOI: https://doi.org/10.1007/3-540-47922-8_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43724-6
Online ISBN: 978-3-540-47922-2
eBook Packages: Springer Book Archive