Inverse Document Frequency
The inverse document frequency (IDF) is a statistical weight that measures the importance of a term in a collection of text documents. The document frequency (DF) of a term is defined as the number of documents in which the term appears.
Karen Spärck Jones first proposed that, for retrieval, terms with low document frequency are more valuable than terms with high document frequency. In other words, the underlying idea of IDF is that the more documents in the collection a term appears in, the less informative the term is.
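The following is a minimal sketch of how DF and IDF might be computed for a toy collection. It assumes the common formulation idf(t) = log(N / df(t)), where N is the number of documents; this entry does not fix a particular formula, and smoothed variants are also widely used. The function names, the whitespace tokenization, and the sample documents are illustrative assumptions only.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Return, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        # Assumption: naive lowercase whitespace tokenization.
        # set() ensures a term is counted at most once per document.
        df.update(set(doc.lower().split()))
    return df


def inverse_document_frequency(docs):
    """Return idf(t) = log(N / df(t)) for every term in the collection.

    This is one common unsmoothed variant, chosen here for illustration.
    """
    n = len(docs)
    df = document_frequency(docs)
    return {term: math.log(n / count) for term, count in df.items()}


# Hypothetical three-document collection.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]

df = document_frequency(docs)
idf = inverse_document_frequency(docs)

# Print terms from most to least informative under this weighting.
for term in sorted(idf, key=idf.get, reverse=True):
    print(f"{term:8s} df={df[term]} idf={idf[term]:.3f}")
```

In this toy collection, "the" occurs in every document, so its IDF is zero, while "bird" occurs in only one document and receives the highest weight. This matches the intuition that terms spread across many documents are less informative.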
1. Robertson SE, Walker S. On relevance weights with little relevance information. In: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1997. p. 16–24.