
Document Search

  • Chapter
Text Mining with MATLAB®

Abstract

This chapter focuses on document search, a very important problem in information management that is also relevant to text mining applications. The area of study that deals with this problem in detail is known as information retrieval. Accordingly, here we present and discuss several methods and applications that are closely related to the field of information retrieval; it is important to mention, however, that this field is much broader and more extensive than what we explore here.

This chapter is organized as follows. First, in Sect. 11.1, we introduce the basic evaluation metrics of precision and recall and present some examples of binary search. Then, in Sect. 11.2, we turn to vector search, which is based on the vector space model presented in Chap. 8; this section also discusses the problems of keyword extraction, relevance estimation and relevance feedback. Finally, in Sect. 11.3, we address the problem of cross-language document search, for which we introduce some basic concepts and present some related examples.
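
As a rough illustration of the precision and recall metrics mentioned above, the following Python sketch runs a Boolean AND search over a tiny collection and scores the result against a set of relevant documents. Reading "binary search" as Boolean (binary) retrieval is an assumption on our part. The collection, the relevance judgments, and the helper names `boolean_and_search` and `precision_recall` are hypothetical and do not come from the chapter.

```python
# Minimal sketch: Boolean (binary) AND retrieval plus precision/recall scoring.
# The collection and relevance judgments below are hypothetical toy data.


def boolean_and_search(docs, query_terms):
    """Return the ids of documents that contain every query term (Boolean AND)."""
    terms = {t.lower() for t in query_terms}
    return {doc_id for doc_id, text in docs.items()
            if terms <= set(text.lower().split())}


def precision_recall(retrieved, relevant):
    """Precision = |retrieved & relevant| / |retrieved|; recall = |retrieved & relevant| / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


docs = {
    1: "text mining with matlab",
    2: "document search and information retrieval",
    3: "vector search for document retrieval",
    4: "cooking with matlab",
    5: "search engine advertising",
    6: "evaluation of information retrieval systems",
}
relevant = {2, 3, 6}  # hypothetical relevance judgments for the query

retrieved = boolean_and_search(docs, ["search"])
p, r = precision_recall(retrieved, relevant)
print(sorted(retrieved), round(p, 3), round(r, 3))  # [2, 3, 5] 0.667 0.667
```

Here the query `search` retrieves documents 2, 3 and 5. Two of the three retrieved documents are relevant, so precision is 2/3. Two of the three relevant documents are found, so recall is also 2/3. Document 6 is missed because it never uses the exact query term.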
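
To make the idea of vector search more concrete, here is a minimal Python sketch that represents documents and a query as TF-IDF vectors and ranks the documents by cosine similarity. Several choices here are assumptions made for brevity: the raw-count times log(N/df) weighting, whitespace tokenization, and the helper names `tfidf_vectors`, `cosine` and `vector_search`. The chapter's own weighting scheme, built on the vector space model of Chap. 8, may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors: weight(t, d) = count(t, d) * log(N / df(t))."""
    counts = {d: Counter(text.lower().split()) for d, text in docs.items()}
    n_docs = len(docs)
    # Iterating a Counter yields its distinct terms, so this counts documents per term.
    df = Counter(t for c in counts.values() for t in c)
    idf = {t: math.log(n_docs / df_t) for t, df_t in df.items()}
    vectors = {d: {t: tf * idf[t] for t, tf in c.items()} for d, c in counts.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def vector_search(docs, query):
    """Rank documents by cosine similarity between their TF-IDF vectors and the query's."""
    vectors, idf = tfidf_vectors(docs)
    q_counts = Counter(query.lower().split())
    # Query terms that never occur in the collection get zero weight.
    q_vec = {t: c * idf.get(t, 0.0) for t, c in q_counts.items()}
    scores = {d: cosine(q_vec, vec) for d, vec in vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    1: "text mining with matlab",
    2: "document search and information retrieval",
    3: "vector search for document retrieval",
    4: "cooking with matlab",
}
ranking = vector_search(docs, "vector retrieval")
print([doc_id for doc_id, _ in ranking])  # [3, 2, 1, 4]
```

With this toy collection, the query `vector retrieval` ranks document 3 first because it contains both query terms. Document 2 comes second because it contains only `retrieval`. Documents 1 and 4 share no query terms and score zero. Unlike the Boolean search, this approach returns a graded ranking rather than a yes/no match.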
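
Relevance feedback is listed among the topics of Sect. 11.2, but this page does not show which formulation the chapter uses. As one hedged illustration, the sketch below applies the classic Rocchio update. The update moves the query vector toward the centroid of documents the user marked relevant and away from the centroid of those marked non-relevant. The default weights `alpha=1.0`, `beta=0.75` and `gamma=0.15` are commonly quoted values chosen here as assumptions. The term weights in the example are hypothetical.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback on sparse term-weight dicts.

    q_new = alpha * q + beta * centroid(relevant) - gamma * centroid(nonrelevant),
    with negative weights clipped to zero.
    """
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for group, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in group:
            for term, w in doc.items():
                # Dividing by len(group) accumulates the group centroid.
                new_q[term] += coef * w / len(group)
    return {term: w for term, w in new_q.items() if w > 0}


# Hypothetical query and user-judged documents (term -> weight).
query = {"retrieval": 1.0}
relevant = [{"retrieval": 1.0, "vector": 1.0}, {"retrieval": 1.0, "ranking": 1.0}]
nonrelevant = [{"retrieval": 1.0, "pets": 1.0}]

new_q = rocchio(query, relevant, nonrelevant)
print(new_q)
```

In this example, the expanded query gains the terms `vector` and `ranking` from the relevant documents. The weight of `pets` becomes negative and is dropped. Clipping negative weights is a common design choice because negative query terms are hard to interpret in a term-weight vector.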



Author information


Correspondence to Rafael E. Banchs.


Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Banchs, R.E. (2013). Document Search. In: Text Mining with MATLAB®. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4151-9_11


  • DOI: https://doi.org/10.1007/978-1-4614-4151-9_11


  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-4150-2

  • Online ISBN: 978-1-4614-4151-9

  • eBook Packages: Computer Science, Computer Science (R0)
