# Inverse Document Frequency

Reference work entry

**DOI:** https://doi.org/10.1007/978-1-4614-8265-9_933

## Synonyms

IDF

## Definition

The inverse document frequency (*IDF*) is a statistical weight that measures the importance of a term in a collection of text documents. The document frequency (*DF*) of a term is the number of documents in the collection in which the term appears.
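As a minimal sketch of the document frequency defined above, the following Python snippet counts, for each term, how many documents contain it. The function name `document_frequency`, the toy documents, and the lowercase whitespace tokenization are illustrative assumptions, not part of the original entry.

```python
from collections import Counter


def document_frequency(docs):
    """Map each term to the number of documents that contain it.

    Tokenization (lowercasing and splitting on whitespace) is an
    assumption made for illustration only.
    """
    df = Counter()
    for doc in docs:
        # set() ensures a term repeated within one document counts once
        df.update(set(doc.lower().split()))
    return df


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
df = document_frequency(docs)
print(df["the"], df["cat"], df["bird"])  # 3 2 1
```

Note that "the" occurs twice in the first document but contributes only once to its document frequency, which is what distinguishes *DF* from a raw term count.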

## Key Points

Karen Sparck-Jones first proposed that, for retrieval, terms with low document frequency are more valuable than terms with high document frequency [2]. The underlying idea of *IDF* is that the more documents in the collection a term appears in, the less informative that term is.

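In its simplest form, the *IDF* weight of a term is assigned as follows [3]:
$$ \mathrm{IDF} = \log_2 \frac{\mathrm{N}}{\mathrm{DF}} $$

where N is the total number of documents in the collection and DF is the document frequency of the term.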
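The formula can be evaluated directly. The sketch below assumes N is the total number of documents in the collection and DF is the document frequency of the term; the function name `idf` and the input check are illustrative choices rather than anything prescribed by the entry.

```python
import math


def idf(n_docs, df):
    """Simplest-form IDF: log2(N / DF).

    n_docs: total number of documents in the collection (N)
    df:     number of documents containing the term (DF)
    """
    if not 0 < df <= n_docs:
        # The formula is undefined for DF = 0, and DF cannot exceed N
        raise ValueError("df must satisfy 0 < df <= n_docs")
    return math.log2(n_docs / df)


print(idf(8, 2))     # 2.0  (term appears in a quarter of the documents)
print(idf(8, 8))     # 0.0  (term appears in every document)
print(idf(1024, 1))  # 10.0 (term appears in a single document)
```

The three calls illustrate the key point: a term present in every document receives a weight of zero, and the weight grows as the term becomes rarer in the collection.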


## Recommended Reading

- 1. Robertson SE, Walker S. On relevance weights with little relevance information. In: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1997. p. 16–24.
- 2. Sparck-Jones K. A statistical interpretation of term specificity and its application in retrieval. J Doc. 1972;28(1):11–20.
- 3. Sparck-Jones K. Index term weighting. Inf Storage Retr. 1973;9(11):619–33.

## Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018