
Abstract

Summaries of texts found on the World Wide Web are valuable. They help search-engine users select relevant information and help them process the vast amount of information available on the Web. This chapter describes the technologies that can be applied to summarize the texts of Web pages, focusing on those that currently produce the best results and that suit the heterogeneous environment of the World Wide Web. It gives an overview of generic, query-biased, and task-specific summarization, as well as of single-document and multi-document summarization. The technologies discussed include semantic frame technologies, rhetorical structure analysis, the learning of discourse patterns, techniques that rely on lexical cohesion, and text clustering.
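To make the distinction between generic and query-biased summarization concrete, the following Python sketch extracts sentences by score. It is a minimal illustration under stated assumptions, not one of the methods described in the chapter: the stopword list, the naive regular-expression sentence splitter, the length-averaged term-frequency score, and the hypothetical `query_weight` bonus per matched query term are all choices made only for this example.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def tokenize(text):
    """Return lowercase alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, query=None, k=2, query_weight=2.0):
    """Return the k highest-scoring sentences in their original order.

    Generic score: the average document frequency of a sentence's content
    words. If a query is given, each distinct query term found in the
    sentence adds query_weight, biasing the extract toward the query.
    """
    sentences = split_sentences(text)
    doc_freq = Counter(tokenize(text))
    query_terms = set(tokenize(query)) if query else set()

    scored = []
    for idx, sent in enumerate(sentences):
        words = tokenize(sent)
        if not words:
            continue  # skip sentences made only of stopwords/punctuation
        generic = sum(doc_freq[w] for w in words) / len(words)
        bias = query_weight * len(query_terms & set(words))
        scored.append((generic + bias, idx, sent))

    # Highest score first; ties broken by position in the document.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    # Restore original document order for readability.
    return [sent for _, _, sent in sorted(top, key=lambda t: t[1])]


if __name__ == "__main__":
    page = ("Web pages vary widely in format. "
            "Summaries help users select relevant pages. "
            "Search engines show query-biased summaries of pages. "
            "The weather was pleasant.")
    print(summarize(page, k=1))                   # generic summary
    print(summarize(page, query="weather", k=1))  # query-biased summary
```

Without a query, `summarize` favors the sentence whose content words recur most across the page ("Summaries help users select relevant pages."). With `query="weather"`, the bonus promotes the otherwise low-scoring last sentence. Dividing by sentence length keeps long sentences from winning merely by containing more words; a practical system would add better tokenization and weighting such as tf-idf.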




Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Moens, MF., Angheluta, R., De Busser, R. (2003). Summarization of Texts Found on the World Wide Web. In: Abramowicz, W. (eds) Knowledge-Based Information Retrieval and Filtering from the Web. The Springer International Series in Engineering and Computer Science, vol 746. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3739-4_5


  • DOI: https://doi.org/10.1007/978-1-4757-3739-4_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5376-6

  • Online ISBN: 978-1-4757-3739-4

  • eBook Packages: Springer Book Archive
