Abstract
This chapter focuses on document search, a central problem in information management that is also relevant to text mining applications. The area of study that deals with this problem in detail is known as information retrieval. We present and discuss several methods and applications closely related to that field; note, however, that information retrieval is much broader than what we explore here.
This chapter is organized as follows. First, in Sect. 11.1, we introduce the basic evaluation metrics of precision and recall and present some examples of binary search. Then, in Sect. 11.2, we turn to vector search, which is based on the vector space model presented in Chap. 8. In that section, we also discuss the problems of keyword extraction, relevance estimation and relevance feedback. Finally, in Sect. 11.3, we address the problem of cross-language document search, for which we introduce some basic concepts and present some related examples.
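As a brief illustration of the two evaluation metrics named above, the following is a minimal sketch rather than the chapter's own implementation. It assumes the standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The function name `precision_recall` and the document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    Both arguments are collections of document ids (hypothetical here).
    Empty inputs yield 0.0 rather than raising a division error.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant overall, 2 in common.
retrieved_ids = ["d1", "d2", "d3", "d4"]
relevant_ids = ["d2", "d4", "d7"]
p, r = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Returning 0.0 for an empty result set is one possible convention; some evaluations leave precision undefined in that case.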
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Banchs, R.E. (2013). Document Search. In: Text Mining with MATLAB®. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4151-9_11
DOI: https://doi.org/10.1007/978-1-4614-4151-9_11
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4150-2
Online ISBN: 978-1-4614-4151-9
eBook Packages: Computer Science, Computer Science (R0)