Document Search

  • Rafael E. Banchs


This chapter focuses on document search, a central problem in information management that is also relevant to text mining applications. The area of study that deals with this problem in detail is known as information retrieval. Here we present and discuss several methods and applications closely related to that field; it is important to note, however, that information retrieval is much broader and more extensive than what we explore here.

This chapter is organized as follows. First, in Sect. 11.1, we introduce the basic evaluation metrics of precision and recall and present some examples of binary search. Then, in Sect. 11.2, we turn to vector search, which is based on the vector space model presented in Chap. 8. In that section, we also discuss the problems of keyword extraction, relevance estimation and relevance feedback. Finally, in Sect. 11.3, we address the problem of cross-language document search, for which we introduce some basic concepts and present some related examples.


Keywords: Binary Search, Query Expansion, Vector Space Model, Rank Position, Target Category



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Barcelona
