Information retrieval is based on the assumption that the occurrences of indexing features in a document tell us something about the relevance of that document. This assumption implies the cluster hypothesis: closely associated documents tend to be relevant to the same requests (van Rijsbergen 1979, p. 45). In this section, we describe how texts can be characterized by the distribution of occurrences of textual indexing features. For consistency, we use the more general terminology of multimedia information retrieval. In particular, we speak of indexing features rather than indexing terms, and of feature frequency rather than term frequency. Throughout this chapter on text retrieval, however, every indexing feature ϕ_i denotes a term, and the feature frequency ff(ϕ_i, d_j) denotes the corresponding term frequency, that is, the number of occurrences of ϕ_i in document d_j.
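As a rough illustration of the feature frequency ff(ϕ_i, d_j) in the text-retrieval case, the following minimal sketch counts term occurrences per document. It assumes that terms are lowercased, whitespace-separated tokens, which is a simplification and not necessarily the indexing vocabulary used in the chapter. The function names `feature_frequencies` and `document_frequency` are hypothetical, and the document frequency shown here is the usual count of documents containing a feature.

```python
from collections import Counter

def feature_frequencies(text):
    """Return a mapping phi -> ff(phi, d) for one document d.

    Assumption: an indexing feature is a lowercased,
    whitespace-separated token.
    """
    return Counter(text.lower().split())

def document_frequency(feature, docs):
    """Count the documents in which the feature occurs at least once."""
    return sum(1 for d in docs if feature_frequencies(d)[feature] > 0)

# Toy collection, invented purely for illustration.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang",
]

ff = [feature_frequencies(d) for d in docs]
print(ff[0]["the"])                     # ff("the", d_0)
print(document_frequency("cat", docs))  # number of documents containing "cat"
```

A `Counter` returns 0 for features that do not occur in a document, which matches the convention that ff(ϕ_i, d_j) = 0 when ϕ_i is absent from d_j.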
Keywords: Retrieval Method, Document Frequency, Stop Word, Indexing Feature, Document Length
- 1. As usual, iff means if and only if.