
Metric Indexing for the Vector Model in Text Retrieval

  • Conference paper
String Processing and Information Retrieval (SPIRE 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3246)

Abstract

In Text Retrieval, query processing in the vector model has been shown to be qualitatively more effective than searching in the boolean model. However, for the classic vector model the current methods of processing many-term queries are inefficient, and for the latent semantic indexing (LSI) model there is no efficient method for processing even few-term queries. In this paper we propose a method of vector query processing based on metric indexing, which is especially efficient for the LSI model. In addition, we propose the concept of approximate semi-metric search, which can further improve the efficiency of the retrieval process. Results of experiments on a moderately sized text collection are included.
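As a rough illustration of why a metric helps here, the sketch below turns cosine similarity into angular distance (which satisfies the triangle inequality) and uses a single precomputed pivot to skip documents during a range query. The abstract does not state which metric or index structure the authors use, so the choice of angular distance, the single-pivot filter, and all names (`angular_distance`, `range_query`, the toy vectors) are assumptions made for this example only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angular_distance(u, v):
    """Angle in radians between u and v (assumed metric for this sketch)."""
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    c = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(c)


def range_query(query, docs, pivot, pivot_dists, radius):
    """Return ([(doc index, distance)], number of distances computed).

    pivot_dists[i] is the precomputed distance from the pivot to docs[i].
    By the triangle inequality, |d(q, p) - d(p, doc)| <= d(q, doc), so a
    document whose lower bound exceeds the radius is skipped without
    computing its distance to the query.
    """
    d_qp = angular_distance(query, pivot)
    hits, computed = [], 0
    for i, doc in enumerate(docs):
        if abs(d_qp - pivot_dists[i]) > radius:
            continue  # pruned by the lower bound
        computed += 1
        d = angular_distance(query, doc)
        if d <= radius:
            hits.append((i, d))
    return hits, computed


# Toy document vectors (hypothetical data, not from the paper).
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pivot = [0.0, 0.0, 1.0]
pivot_dists = [angular_distance(pivot, d) for d in docs]

query = [1.0, 0.05, 0.0]
hits, computed = range_query(query, docs, pivot, pivot_dists, radius=0.2)
print([i for i, _ in hits], computed)
```

In this toy run the last document is discarded from the pivot distance alone. Metric access methods such as the M-tree apply the same kind of bound hierarchically, over many routing objects rather than one pivot.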

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Skopal, T., Moravec, P., Pokorný, J., Snášel, V. (2004). Metric Indexing for the Vector Model in Text Retrieval. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_26

  • DOI: https://doi.org/10.1007/978-3-540-30213-1_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23210-0

  • Online ISBN: 978-3-540-30213-1

  • eBook Packages: Springer Book Archive
