Inverse Document Frequency
The inverse document frequency (IDF) is a statistical weight that measures the importance of a term in a collection of text documents. The document frequency (DF) of a term is defined as the number of documents in the collection in which the term appears.
Karen Spärck Jones first proposed that, during retrieval, terms with low document frequency are more valuable than terms with high document frequency. In other words, the underlying idea of IDF is that the more documents in the collection a term appears in, the less informative the term is.
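The entry does not state a formula, so the following is only a minimal sketch under one common assumption: that the weight is computed as idf(t) = log(N / df(t)), where N is the number of documents in the collection and df(t) is the document frequency of term t. The natural logarithm, the whitespace tokenization, and the function names are illustrative choices, not part of the original definition.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        # set() ensures a term is counted at most once per document
        df.update(set(doc.lower().split()))
    return df


def inverse_document_frequency(docs):
    """Return idf(t) = log(N / df(t)) for every term in the collection.

    Assumption: this is one common IDF variant; others add smoothing.
    """
    n_docs = len(docs)
    df = document_frequency(docs)
    return {term: math.log(n_docs / count) for term, count in df.items()}


# Toy collection (hypothetical data for illustration only)
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
idf = inverse_document_frequency(docs)
```

In this toy collection "the" appears in every document and so receives a weight of zero, while "bird" appears in only one document and receives the highest weight, which matches the intuition that rarer terms are more informative.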
Keywords: Data Mining; Data Storage; Statistical Weight; Knowledge Discovery; Database Management
- 1. Robertson SE, Walker S. On relevance weights with little relevance information. In: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1997. p. 16–24.