This chapter focuses on document search, a central problem in information management that is also relevant to text mining applications. The area of study that deals with this problem in detail is known as information retrieval. Accordingly, we present and discuss several methods and applications closely related to information retrieval; note, however, that the field is much broader than what we explore here.
This chapter is organized as follows. First, Sect. 11.1 introduces the basic evaluation metrics of precision and recall and presents some examples of binary search. Then, Sect. 11.2 turns to vector search, which is based on the vector space model presented in Chap. 8; this section also discusses the problems of keyword extraction, relevance estimation, and relevance feedback. Finally, Sect. 11.3 addresses cross-language document search, introducing some basic concepts and presenting related examples.