TF*IDF

  • Reference work entry
Encyclopedia of Database Systems

Synonyms

Term frequency by inverse document frequency

Definition

A weighting function that combines the frequency of a term in a given document (term frequency, TF) with the term's relative frequency across the collection (inverse document frequency, IDF). The weight is calculated as follows [1]. Assuming that term j occurs in at least one document (dj ≠ 0), the inverse document frequency (IDF) of term j is

$$ {\mathrm{log}}_2\left(N/{d}_j\right)+1={\mathrm{log}}_2N-{\mathrm{log}}_2{d}_j+1 $$

The ratio dj/N is the fraction of documents in the collection that contain the term. The term frequency-inverse document frequency weight (TF*IDF) of term j in document i is defined by multiplying the term frequency by the inverse document frequency:

$$ {W}_{ij}={f}_{ij}\times \left[{\mathrm{log}}_2N-{\mathrm{log}}_2{d}_j+1\right] $$

where

  • N: number of documents in the collection

  • dj: number of documents containing term j

  • fij: frequency of term j in document i

  • Wij: weight of term j in document i
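The definitions above can be illustrated with a minimal Python sketch. It assumes the IDF form log2(N/dj) + 1 given in the definition, raw term counts for fij, and documents that are already tokenized into lists of terms. The function name `tf_idf` and the toy collection are hypothetical and chosen only for illustration; they do not come from the entry or from any particular library.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Each weight is computed as
    W_ij = f_ij * (log2(N / d_j) + 1), an assumed reading of the definition.
    """
    n_docs = len(docs)                 # N: number of documents in the collection
    doc_freq = Counter()               # d_j: number of documents containing term j
    for tokens in docs:
        doc_freq.update(set(tokens))   # count each term at most once per document

    weights = []
    for tokens in docs:
        term_freq = Counter(tokens)    # f_ij: raw frequency of term j in document i
        weights.append({
            term: f * (math.log2(n_docs / doc_freq[term]) + 1)
            for term, f in term_freq.items()
        })
    return weights


# Toy collection of four pre-tokenized documents (illustrative data only).
docs = [
    ["data", "base", "data"],
    ["data", "query"],
    ["index", "query"],
    ["data", "index"],
]
w = tf_idf(docs)
print(w[0]["base"])   # 3.0  (term in 1 of 4 documents: log2(4) + 1)
print(w[1]["query"])  # 2.0  (term in 2 of 4 documents: log2(2) + 1)
```

In this toy collection, "data" occurs in three of the four documents and so receives a lower IDF than "base", which occurs in only one, even though "data" has the higher term frequency in the first document.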

The use of the logarithm in the...

Recommended Reading

  1. Korfhage RR. Information storage and retrieval. New York: Wiley; 1997.

  2. Manning CD, Raghavan P, Schütze H. Introduction to information retrieval. Cambridge, UK: Cambridge University Press; 2008.

  3. Roelleke T. Information retrieval models: foundations & relationships. Morgan & Claypool Publishers; 2013.

  4. Salton G, Buckley C. Term-weighting approaches in automatic text retrieval. Inf Process Manag. 1988;24(4):513–23.

  5. Singhal A, Salton G, Mitra M, Buckley C. Document length normalization. Inf Process Manag. 1996;32(5):619–33.

  6. Sparck Jones K. A statistical interpretation of term specificity and its application in retrieval. J Doc. 1972;28(1):11–20.

Author information

Corresponding author

Correspondence to Ibrahim Abu El-Khair.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Abu El-Khair, I. (2018). TF*IDF. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_956
