
SIGIR ’94, pp. 282–291

Improving Text Retrieval for the Routing Problem using Latent Semantic Indexing

  • Conference paper

Abstract

Latent Semantic Indexing (LSI) is a novel approach to information retrieval that attempts to model the underlying structure of term associations. It transforms the traditional representation of documents as vectors of weighted term frequencies into a new coordinate space in which both documents and terms are represented as linear combinations of underlying semantic factors. In previous research, LSI has produced only a small improvement in retrieval performance. In this paper, we apply LSI to the routing task, which assumes that a sample of relevant and non-relevant documents is available for constructing the query. Once again, LSI slightly improves performance. However, when LSI is used in conjunction with statistical classification, performance improves dramatically.
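The abstract describes the pipeline only in words, so the following is a minimal sketch of its mechanics, not the paper's implementation. It assumes raw term counts (no term weighting), a truncated SVD from NumPy, a Rocchio-style routing query (relevant centroid minus non-relevant centroid in the LSI space), and a two-class linear discriminant as one example of "statistical classification". The toy collection, the function names (`lsi_fit`, `fold_in`, `rocchio_query`, `lda_fit`, and so on), and the `ridge` parameter are all hypothetical.

```python
import numpy as np


def lsi_fit(A, k):
    """Truncated SVD of a term-by-document matrix A (terms x documents)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k]


def fold_in(X, U_k, s_k):
    """Map each column of X (a document or query as a term vector) to
    k LSI coordinates via x^T U_k S_k^{-1}; returns one row per column."""
    return (X.T @ U_k) / s_k


def rocchio_query(Z_rel, Z_non, beta=1.0, gamma=1.0):
    """Hypothetical routing query: weighted relevant centroid minus
    weighted non-relevant centroid, both taken in the LSI space."""
    return beta * Z_rel.mean(axis=0) - gamma * Z_non.mean(axis=0)


def cosine_scores(Z, q):
    """Cosine similarity between each row of Z and the query vector q."""
    Z_unit = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return Z_unit @ (q / np.linalg.norm(q))


def lda_fit(Z_rel, Z_non, ridge=1e-3):
    """Two-class linear discriminant with a pooled covariance estimate.
    The ridge term (an assumption) keeps the covariance invertible."""
    mu_r, mu_n = Z_rel.mean(axis=0), Z_non.mean(axis=0)
    centered = np.vstack([Z_rel - mu_r, Z_non - mu_n])
    cov = centered.T @ centered / (len(centered) - 2)
    cov += ridge * np.eye(len(mu_r))
    w = np.linalg.solve(cov, mu_r - mu_n)
    b = -w @ (mu_r + mu_n) / 2.0
    return w, b


def lda_scores(Z, w, b):
    """Discriminant score per row of Z; positive favours 'relevant'."""
    return Z @ w + b


# Hypothetical toy collection: rows are documents, columns are 6 terms.
D = np.array([[2, 1, 1, 0, 0, 0],
              [1, 2, 1, 0, 0, 0],
              [1, 1, 2, 0, 0, 0],
              [0, 0, 0, 2, 1, 1],
              [0, 0, 0, 1, 2, 1],
              [0, 0, 0, 1, 1, 2]], dtype=float)
A = D.T                                   # term-by-document matrix
relevant = np.array([True, True, True, False, False, False])

U_k, s_k = lsi_fit(A, k=2)
Z = fold_in(A, U_k, s_k)                  # training documents, 6 x 2

# Two incoming documents to route: one uses only term 0, one only term 3.
X_new = np.array([[1, 0, 0, 0, 0, 0],
                  [0, 0, 0, 1, 0, 0]], dtype=float).T
Z_new = fold_in(X_new, U_k, s_k)

q = rocchio_query(Z[relevant], Z[~relevant])
rocchio = cosine_scores(Z_new, q)         # first positive, second negative

w, b = lda_fit(Z[relevant], Z[~relevant])
lda = lda_scores(Z_new, w, b)             # first positive, second negative
```

Both scorers rank the first incoming document above the second here. The difference between them is that the Rocchio-style query uses only class centroids, whereas the discriminant also accounts for the spread of the training samples through the pooled covariance; on this deliberately clean toy data that spread is almost zero, so the ridge term dominates.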




Copyright information

© 1994 Springer-Verlag London Limited

About this paper

Cite this paper

Hull, D. (1994). Improving Text Retrieval for the Routing Problem using Latent Semantic Indexing. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_29


  • DOI: https://doi.org/10.1007/978-1-4471-2099-5_29

  • Publisher Name: Springer, London

  • Print ISBN: 978-3-540-19889-5

  • Online ISBN: 978-1-4471-2099-5

  • eBook Packages: Springer Book Archive
