Abstract
In this chapter we analyze the values used by Latent Semantic Indexing (LSI) for information retrieval. By manipulating the values in the Singular Value Decomposition (SVD) matrices, we find that a significant fraction of the values have little effect on overall performance and can thus be removed (set to zero). Identifying and removing those entries allows us to convert the dense term-by-dimension and document-by-dimension matrices into sparse matrices. We show empirically that these entries are unimportant by presenting retrieval and runtime performance results on seven collections: removing up to 70% of the values in the term-by-dimension matrix yields similar or improved retrieval performance compared to LSI. Removing 90% of the values degrades retrieval performance slightly for the smaller collections, but improves retrieval performance by 60% on the large collection we tested. Our approach has the additional computational benefit of reducing memory requirements and query response time.
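The sparsification idea can be illustrated with a minimal sketch. The abstract does not state how the removable entries are identified, so the code below assumes one plausible rule: zero the given fraction of entries with the smallest absolute value. The function names `truncated_svd` and `zero_smallest`, the toy matrix, and the choice of two dimensions are all hypothetical and are not taken from the chapter.

```python
import numpy as np


def truncated_svd(term_doc, k):
    """Return rank-k factors of a term-by-document matrix.

    Gives the term-by-dimension matrix, the k largest singular values,
    and the document-by-dimension matrix.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :].T


def zero_smallest(matrix, fraction):
    """Return a copy with the smallest-magnitude entries set to zero.

    ASSUMPTION: entries are ranked by absolute value; the chapter's actual
    selection criterion is not given in the abstract.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    n_remove = int(round(fraction * matrix.size))
    result = matrix.copy()
    if n_remove == 0:
        return result
    magnitudes = np.abs(matrix).ravel()
    # Flat indices of the n_remove entries with the smallest magnitude.
    smallest = np.argpartition(magnitudes, n_remove - 1)[:n_remove]
    result.flat[smallest] = 0.0
    return result


# Toy term-by-document counts (5 terms x 4 documents); illustrative data only.
term_doc = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
])

term_dim, sing_vals, doc_dim = truncated_svd(term_doc, k=2)
sparse_term_dim = zero_smallest(term_dim, fraction=0.7)
print(np.count_nonzero(sparse_term_dim), "of", term_dim.size, "entries kept")
```

In practice the zeroed matrix would be stored in a sparse format to realize the memory and query-time savings the abstract describes; the dense array here only shows which entries are dropped.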
About this chapter
Cite this chapter
Kontostathis, A., Pottenger, W.M., Davison, B.D. Identification of Critical Values in Latent Semantic Indexing. In: Lin, T.Y., Ohsuga, S., Liau, CJ., Hu, X., Tsumoto, S. (eds) Foundations of Data Mining and Knowledge Discovery. Studies in Computational Intelligence, vol 6. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11498186_19
DOI: https://doi.org/10.1007/11498186_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26257-2
Online ISBN: 978-3-540-32408-9
eBook Packages: Engineering, Engineering (R0)