Multi-Document Viewpoint Summarization Focused on Facts, Opinion and Knowledge

  • Yohei Seki
  • Koji Eguchi
  • Noriko Kando
Part of The Information Retrieval Series book series (INRE, volume 20)

Abstract

An interactive information retrieval system that provides different types of summaries of retrieved documents according to each user’s information needs, situation, or purpose of search can be effective for understanding document content. The purpose of this study is to build a multi-document summarizer, “Viewpoint Summarizer With Interactive clustering on Multidocuments (v-SWIM)”, which produces summaries according to such user viewpoints. We tested its effectiveness on a new test collection, ViewSumm30, which contains human-made reference summaries of three different summary types for each of 30 document sets. Once a set of documents on a topic (e.g., documents retrieved by a search engine) is provided to v-SWIM, it returns a list of topics discussed in the document set. The user then selects one or more topics of interest and a summary type (fact-reporting, opinion-oriented, or knowledge-focused), and v-SWIM produces a summary from the viewpoint of the selected topics and summary type. We assume that sentence types and document genres are related to the types of information included in the source documents and are useful for selecting appropriate information for each summary type. “Sentence type” defines the type of information in a sentence; “document genre” defines the type of information in a document. The experimental results showed that the proposed system, using automatically identified sentence types and document genres of the source documents, improved the coverage of the system-produced fact-reporting, opinion-oriented, and knowledge-focused summaries by 13.14%, 34.23%, and 15.89%, respectively, compared with our baseline system, which did not differentiate sentence types or document genres.

Keywords

multi-document summarization, viewpoint, opinion, genre classification, sentence type

Copyright information

© Springer 2006

Authors and Affiliations

  • Yohei Seki (1, 2)
  • Koji Eguchi (1, 2)
  • Noriko Kando (1, 2)
  1. Department of Informatics, The Graduate University for Advanced Studies (Sokendai), Tokyo, Japan
  2. National Institute of Informatics, Tokyo, Japan
