Abstract
In this paper, we propose an approach to extracting the significant terms in a document using two quantification methods: singular value decomposition (SVD) and principal component analysis (PCA). The SVD removes the noise caused by variability in term usage from the original sentence-term matrix by means of the singular values obtained from the decomposition. Because the adjusted sentence-term matrix, from which the noisy term usage has been removed, has the same dimensionality as the original, PCA can be applied to it directly. PCA extracts significant terms on the basis of the eigenvalue-eigenvector pairs of the sentence-term matrix, so terms extracted from the revised matrix rather than the original can be regarded as more effective or appropriate. Experimental results on automatic summarization of Korean newspaper articles show that the proposed method outperforms the method that uses PCA alone.
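The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy matrix, the function names `denoise_by_svd` and `significant_terms`, the retained rank `k`, the covariance-based PCA, and the loading threshold are all assumptions made for illustration, since the abstract does not state how these choices were made.

```python
import numpy as np


def denoise_by_svd(A, k):
    """Return a rank-k reconstruction of A with the same shape as A.

    Assumption: "removing noise" is modeled here by zeroing all but the
    k largest singular values; the paper's exact criterion is not given.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_kept = s.copy()
    s_kept[k:] = 0.0  # discard the smallest singular values
    return (U * s_kept) @ Vt


def significant_terms(A, terms, n_components=1, threshold=0.5):
    """Pick terms with large loadings on the leading principal components.

    Rows of A are sentences, columns are terms. The threshold rule
    (a fraction of the largest absolute loading) is hypothetical.
    """
    cov = np.cov(A, rowvar=False)           # term-by-term covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    leading = np.argsort(eigvals)[::-1][:n_components]
    keep = np.zeros(len(terms), dtype=bool)
    for j in leading:
        loadings = np.abs(eigvecs[:, j])
        keep |= loadings >= threshold * loadings.max()
    return [t for t, flag in zip(terms, keep) if flag]


# Toy sentence-term matrix (4 sentences x 4 terms); values are made up.
terms = ["market", "stock", "weather", "price"]
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
])

A_hat = denoise_by_svd(A, k=2)            # same shape as A, lower rank
top_terms = significant_terms(A_hat, terms)
print(top_terms)
```

Because the rank-k reconstruction keeps the shape of the original matrix, the PCA step runs unchanged on either matrix, which makes it straightforward to compare the terms selected from `A` with those selected from `A_hat`.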
This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment).
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, C., Choe, H., Park, H., Ock, C. (2005). Extracting the Significant Terms from a Sentence-Term Matrix by Removal of the Noise in Term Usage. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_9
DOI: https://doi.org/10.1007/11562382_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science (R0)