Automatic Text Summarization Using Unsupervised and Semi-supervised Learning

Amini, Massih-Reza; Gallinari, Patrick

doi:10.1007/3-540-44794-6_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2168)

Included in the following conference series: European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

This paper investigates a new approach for unsupervised and semi-supervised learning. We show that this method is an instance of the Classification EM algorithm in the case of Gaussian densities. Its originality is that it relies on a discriminant approach, whereas classical methods for unsupervised and semi-supervised learning rely on density estimation. This idea is used to improve a generic document summarization system; it is evaluated on the Reuters news-wire corpus and compared to other strategies.
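
For orientation, the classical density-based Classification EM loop that the abstract contrasts with can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' method: it uses one-dimensional data, one Gaussian per cluster, and a caller-supplied initial partition, and the names `classification_em`, `gaussian_log_density`, and `min_var` are hypothetical. The discriminant variant described in the abstract would, presumably, replace the density estimation step with a classifier trained on the current partition.

```python
import math


def gaussian_log_density(x, mean, var):
    """Log of the 1-D Gaussian density N(x | mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def classification_em(points, init_labels, n_components=2, max_iter=100, min_var=1e-6):
    """Generic Classification EM on 1-D data (illustrative assumption).

    Alternates between estimating one Gaussian per cluster from the
    current hard partition and reassigning every point to its most
    probable cluster, until the partition stops changing.
    """
    labels = list(init_labels)
    params = []
    for _ in range(max_iter):
        # M-step: estimate (weight, mean, variance) from the hard partition.
        params = []
        for k in range(n_components):
            members = [x for x, z in zip(points, labels) if z == k]
            if not members:
                params.append(None)  # an empty cluster can attract no points
                continue
            weight = len(members) / len(points)
            mean = sum(members) / len(members)
            var = sum((x - mean) ** 2 for x in members) / len(members)
            params.append((weight, mean, max(var, min_var)))  # floor the variance

        # E-step and C-step: score each cluster by log(weight * density)
        # and keep only the best one (hard assignment).
        new_labels = []
        for x in points:
            scores = [
                math.log(p[0]) + gaussian_log_density(x, p[1], p[2]) if p else -math.inf
                for p in params
            ]
            new_labels.append(max(range(n_components), key=scores.__getitem__))

        if new_labels == labels:  # partition is stable: converged
            break
        labels = new_labels
    return labels, params
```

Calling `classification_em(points, init_labels)` with a list of floats and any starting partition returns the final hard labels together with the per-cluster parameters. The hard assignment in the C-step is what distinguishes Classification EM from standard EM, which keeps soft posterior weights.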

Author information

Authors and Affiliations

LIP6, University of Paris 6, Case 169, 4 Place Jussieu, F - 75252, Paris cedex 05, France
Massih-Reza Amini & Patrick Gallinari

Editor information

Editors and Affiliations

Department of Computer Science, Albert-Ludwigs University Freiburg, Georges Köhler-Allee, Geb. 079, 79110, Freiburg, Germany
Luc De Raedt
Inst. of Information and Computing Sciences, Dept. of Mathematics and Computer Science, University of Utrecht, Padualaan 14, de Uithof, 3508 TB Utrecht, The Netherlands
Arno Siebes

About this paper

Cite this paper

Amini, MR., Gallinari, P. (2001). Automatic Text Summarization Using Unsupervised and Semi-supervised Learning. In: De Raedt, L., Siebes, A. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2001. Lecture Notes in Computer Science, vol 2168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44794-6_2

DOI: https://doi.org/10.1007/3-540-44794-6_2
Published: 28 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42534-2
Online ISBN: 978-3-540-44794-8
eBook Packages: Springer Book Archive
