Identification of Critical Values in Latent Semantic Indexing

  • Chapter
Foundations of Data Mining and Knowledge Discovery

Part of the book series: Studies in Computational Intelligence ((SCI,volume 6))

Abstract

In this chapter we analyze the values used by Latent Semantic Indexing (LSI) for information retrieval. By manipulating the values in the Singular Value Decomposition (SVD) matrices, we find that a significant fraction of the values have little effect on overall performance and can thus be removed (set to zero). Identifying and removing those entries allows us to convert the dense term-by-dimension and document-by-dimension matrices into sparse matrices. We show empirically that these entries are unimportant by presenting retrieval and runtime performance results on seven collections: removing up to 70% of the values in the term-by-dimension matrix yields retrieval performance similar to or better than that of standard LSI. Removing 90% of the values degrades retrieval performance slightly for the smaller collections, but improves retrieval performance by 60% on the large collection we tested. Our approach has the additional computational benefit of reducing memory requirements and query response time.
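The idea summarized in the abstract can be illustrated with a small sketch. It is not the chapter's actual procedure: the abstract does not say how the removable values are chosen, so the sketch assumes the simplest rule, zeroing the entries with the smallest absolute value. The function names truncated_svd and sparsify, the rank k=3, and the random toy matrix are all hypothetical and chosen only for illustration.

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k SVD factors of a term-by-document matrix A (terms x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Term-by-dimension matrix, singular values, document-by-dimension matrix
    return U[:, :k], s[:k], Vt[:k, :].T

def sparsify(M, fraction):
    """Return a copy of M with `fraction` of its entries set to zero.

    Assumption: the entries removed are those with the smallest
    absolute value; the chapter may use a different selection rule.
    """
    out = M.copy()
    n_remove = int(fraction * out.size)
    if n_remove > 0:
        # Indices (into the flattened matrix) of the n_remove smallest magnitudes
        idx = np.argpartition(np.abs(out).ravel(), n_remove - 1)[:n_remove]
        out.flat[idx] = 0.0
    return out

rng = np.random.default_rng(0)
A = rng.random((8, 6))            # toy weights, not data from any real collection
T_k, s_k, D_k = truncated_svd(A, k=3)
T_sparse = sparsify(T_k, 0.7)     # remove roughly 70% of the term-by-dimension values

density = np.count_nonzero(T_sparse) / T_sparse.size
print(f"nonzero fraction after sparsification: {density:.2f}")
```

In practice the sparsified matrix would be stored in a sparse format (for example a compressed sparse row matrix) so that the zeroed entries actually save memory and query time; the dense array here only keeps the sketch short.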

Editor information

Tsau Young Lin, Setsuo Ohsuga, Churn-Jung Liau, Xiaohua Hu, Shusaku Tsumoto

About this chapter

Cite this chapter

Kontostathis, A., Pottenger, W.M., Davison, B.D. Identification of Critical Values in Latent Semantic Indexing. In: Lin, T.Y., Ohsuga, S., Liau, CJ., Hu, X., Tsumoto, S. (eds) Foundations of Data Mining and Knowledge Discovery. Studies in Computational Intelligence, vol 6. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11498186_19

  • DOI: https://doi.org/10.1007/11498186_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26257-2

  • Online ISBN: 978-3-540-32408-9

  • eBook Packages: Engineering, Engineering (R0)
