Summarization of Texts Found on the World Wide Web

Moens, Marie-Francine; Angheluta, Roxana; De Busser, Rik

doi:10.1007/978-1-4757-3739-4_5

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 746)

Abstract

Summaries of texts found on the World Wide Web are valuable. They help the user of a search engine to select information and are an aid for processing the vast amount of information found on the Web. This chapter describes the technologies that can be applied for summarizing the texts of Web pages. The focus is on technologies that currently generate the best results and are suited for the specific heterogeneous environment that makes up the World Wide Web. This chapter gives an overview of generic, query-biased and task-specific summarization, as well as single-document and multi-document summarization. Among the technologies that are discussed are semantic frame technologies, rhetorical structure analysis, learning discourse patterns, techniques relying upon lexical cohesion, and text clustering.
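
The difference between generic and query-biased summarization mentioned above can be illustrated with a small sentence-extraction sketch. This is not the chapter's method: the function names (`split_sentences`, `tokenize`, `summarize`), the naive sentence splitter, and both scoring rules are assumptions chosen only for illustration. A generic summary here favors sentences whose words are frequent in the document; a query-biased summary favors sentences that share words with the query.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(text, query=None, max_sentences=2):
    """Return up to max_sentences extracted sentences in document order.

    With a query, sentences are ranked by how many of their tokens occur
    in the query (query-biased). Without one, they are ranked by the
    average document frequency of their tokens (generic).
    """
    sentences = split_sentences(text)
    doc_freq = Counter(tok for s in sentences for tok in tokenize(s))
    query_terms = set(tokenize(query)) if query else set()

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        if query_terms:
            return float(sum(1 for t in tokens if t in query_terms))
        return sum(doc_freq[t] for t in tokens) / len(tokens)

    # Rank sentence indices by score, keep the best, restore document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]


page = ("Web pages are heterogeneous. "
        "Summaries help users select information. "
        "Clustering groups related pages.")
query_summary = summarize(page, query="clustering", max_sentences=1)
generic_summary = summarize(page, max_sentences=2)
```

Extraction by term overlap is only the simplest of the approaches the abstract lists; techniques such as rhetorical structure analysis or lexical cohesion need linguistic resources that this sketch deliberately leaves out.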

Author information

Authors and Affiliations

Katholieke Universiteit Leuven, Belgium
Marie-Francine Moens, Roxana Angheluta & Rik De Busser

Editor information

Editors and Affiliations

Department of Management Information Systems, The Poznań University of Economics, Poland
Witold Abramowicz

About this chapter

Cite this chapter

Moens, MF., Angheluta, R., De Busser, R. (2003). Summarization of Texts Found on the World Wide Web. In: Abramowicz, W. (eds) Knowledge-Based Information Retrieval and Filtering from the Web. The Springer International Series in Engineering and Computer Science, vol 746. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3739-4_5

DOI: https://doi.org/10.1007/978-1-4757-3739-4_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-5376-6
Online ISBN: 978-1-4757-3739-4
eBook Packages: Springer Book Archive
