Abstract
Summaries of texts found on the World Wide Web are valuable. They help the user of a search engine to select information and are an aid for processing the vast amount of information found on the Web. This chapter describes the technologies that can be applied for summarizing the texts of Web pages. The focus is on technologies that currently generate the best results and are suited for the specific heterogeneous environment that makes up the World Wide Web. This chapter gives an overview of generic, query-biased and task-specific summarization, as well as single-document and multi-document summarization. Among the technologies that are discussed are semantic frame technologies, rhetorical structure analysis, learning discourse patterns, techniques relying upon lexical cohesion, and text clustering.
Copyright information
© 2003 Springer Science+Business Media New York
About this chapter
Cite this chapter
Moens, MF., Angheluta, R., De Busser, R. (2003). Summarization of Texts Found on the World Wide Web. In: Abramowicz, W. (ed.) Knowledge-Based Information Retrieval and Filtering from the Web. The Springer International Series in Engineering and Computer Science, vol 746. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3739-4_5
DOI: https://doi.org/10.1007/978-1-4757-3739-4_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-5376-6
Online ISBN: 978-1-4757-3739-4
eBook Packages: Springer Book Archive