Abstract
This paper proposes two novel approaches for extracting important sentences from a document to create its summary. The first is a corpus-based approach using feature analysis. It introduces three ideas: 1) using ranked position to emphasize the significance of sentence position, 2) reshaping word units to estimate keyword importance more accurately, and 3) training a score function with a genetic algorithm to obtain a suitable combination of feature weights. The second approach combines latent semantic analysis and text relationship maps to interpret the conceptual structure of a document. Both approaches are applied to Chinese text summarization. They were evaluated on a corpus of 100 articles about politics from New Taiwan Weekly; at a compression ratio of 30%, the two approaches achieved average recalls of 52.0% and 45.6%, respectively.
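To make the terms "score function", "compression ratio", and "recall" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the score function is a weighted linear sum of per-sentence feature values, and that recall is the fraction of reference-summary sentences that the extract recovers. The feature names (`position`, `keyword`), the weight values, and all sample data are hypothetical; in the paper the weights are trained by a genetic algorithm, whereas here they are simply fixed by hand.

```python
from typing import Dict, List, Sequence


def score_sentence(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values (the linear form is an assumption)."""
    return sum(weights[name] * value for name, value in features.items())


def extract_summary(
    feature_rows: Sequence[Dict[str, float]],
    weights: Dict[str, float],
    compression_ratio: float = 0.3,
) -> List[int]:
    """Return indices of the top-scoring sentences, in document order."""
    n_keep = max(1, round(len(feature_rows) * compression_ratio))
    ranked = sorted(
        range(len(feature_rows)),
        key=lambda i: score_sentence(feature_rows[i], weights),
        reverse=True,
    )
    return sorted(ranked[:n_keep])


def recall(extracted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of reference sentences that also appear in the extract."""
    reference_set = set(reference)
    if not reference_set:
        return 0.0
    return len(reference_set & set(extracted)) / len(reference_set)


# Hypothetical hand-picked weights; the paper trains these with a genetic algorithm.
weights = {"position": 0.6, "keyword": 0.4}

# Hypothetical feature values for a ten-sentence document.
feature_rows = [
    {"position": 1.0, "keyword": 0.2},
    {"position": 0.9, "keyword": 0.8},
    {"position": 0.8, "keyword": 0.1},
    {"position": 0.7, "keyword": 0.9},
    {"position": 0.6, "keyword": 0.3},
    {"position": 0.5, "keyword": 0.5},
    {"position": 0.4, "keyword": 0.7},
    {"position": 0.3, "keyword": 0.2},
    {"position": 0.2, "keyword": 0.4},
    {"position": 0.1, "keyword": 0.6},
]

# Hypothetical human-selected reference summary (sentence indices).
reference = [1, 3, 6]

extracted = extract_summary(feature_rows, weights, compression_ratio=0.3)
print(extracted, recall(extracted, reference))
```

With a 30% compression ratio, three of the ten sentences are kept. Averaging `recall` over every article in a corpus would give a figure comparable in kind to the average recalls reported above, though the exact evaluation protocol is not stated in the abstract.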
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yeh, JY., Ke, HR., Yang, WP. (2002). Chinese Text Summarization Using a Trainable Summarizer and Latent Semantic Analysis. In: Lim, E.P., et al. Digital Libraries: People, Knowledge, and Technology. ICADL 2002. Lecture Notes in Computer Science, vol 2555. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36227-4_8
DOI: https://doi.org/10.1007/3-540-36227-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00261-1
Online ISBN: 978-3-540-36227-2
eBook Packages: Springer Book Archive