Document Search

Banchs, Rafael E.

doi:10.1007/978-1-4614-4151-9_11

Abstract

This chapter focuses on document search, an important problem in information management that is also relevant to text mining applications. The area of study that deals with this problem in detail is known as information retrieval. We present and discuss several methods and applications closely related to that field; note, however, that information retrieval is much broader than what is explored here.

This chapter is organized as follows. First, in Sect. 11.1, we introduce the basic evaluation metrics of precision and recall and present some examples of binary search. Then, in Sect. 11.2, we turn to vector search, which is based on the vector space model presented in Chap. 8. In that section, we also discuss the problems of keyword extraction, relevance estimation and relevance feedback. Finally, in Sect. 11.3, we address cross-language document search, introducing some basic concepts and presenting related examples.

Author information

Authors and Affiliations

Barcelona
Rafael E. Banchs

Corresponding author

Correspondence to Rafael E. Banchs.

About this chapter

Cite this chapter

Banchs, R.E. (2013). Document Search. In: Text Mining with MATLAB®. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4151-9_11

DOI: https://doi.org/10.1007/978-1-4614-4151-9_11
Published: 14 August 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4150-2
Online ISBN: 978-1-4614-4151-9
eBook Packages: Computer Science, Computer Science (R0)
