Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Document Length Normalization

Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_934

Synonyms

Length normalization; Term frequency normalization

Definition

Document length normalization adjusts the term frequency or the relevance score to offset the effect of document length on document ranking.

Key Points

The reasons for employing a document length normalization method in an IR system are quite subtle. In general, when a collection contains many long documents, ranking without normalization tends to favor retrieving them over shorter documents.

Singhal, Buckley and Mitra gave the following two reasons for adopting length normalization in the vector space model [4]:
  1. The same term usually occurs repeatedly in long documents.
  2. The vocabulary of a long document is usually large.
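
The two effects above can be illustrated with a small sketch. It is not taken from this entry: the toy documents, the query, and the function names are hypothetical, and the normalized score uses the well-known Okapi BM25 term-frequency component (without the IDF factor) with commonly used default values k1 = 1.2 and b = 0.75, chosen here only as one possible normalization.

```python
from collections import Counter


def raw_tf_score(query_terms, doc_tokens):
    """Sum of raw term frequencies of the query terms (no normalization)."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms)


def normalized_tf_score(query_terms, doc_tokens, avg_len, k1=1.2, b=0.75):
    """BM25-style saturated, length-normalized term frequency (IDF omitted)."""
    counts = Counter(doc_tokens)
    # Documents longer than average get a factor > 1, which dampens their tf.
    length_factor = 1.0 - b + b * len(doc_tokens) / avg_len
    score = 0.0
    for t in query_terms:
        tf = counts[t]
        score += tf * (k1 + 1.0) / (tf + k1 * length_factor)
    return score


# Hypothetical toy data: the long document repeats the short one four times
# (reason 1: repeated terms) and adds extra words (reason 2: larger vocabulary).
short_doc = "length normalization adjusts term frequency".split()
long_doc = short_doc * 4 + "ranking retrieval model vocabulary collection score".split()
query = ["term", "ranking"]
avg_len = (len(short_doc) + len(long_doc)) / 2

raw_ratio = raw_tf_score(query, long_doc) / raw_tf_score(query, short_doc)
norm_ratio = (normalized_tf_score(query, long_doc, avg_len)
              / normalized_tf_score(query, short_doc, avg_len))
print(f"raw long/short ratio:        {raw_ratio:.2f}")
print(f"normalized long/short ratio: {norm_ratio:.2f}")
```

With raw counts the long document scores 5 against 1 for the short one, because "term" is repeated and "ranking" appears only in its larger vocabulary. Under the normalized score the long document's advantage shrinks considerably, although it does not disappear.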

     

In 1994, Robertson and Walker also studied the effect of document length in the context of the probabilistic model. They observed that:

Some documents may simply cover more material than others, […], a long document covers a similar scope to a...


Recommended Reading

  1. Amati G. Probabilistic models for information retrieval based on divergence from randomness [PhD thesis]. University of Glasgow: Department of Computing Science; 2003.
  2. Robertson SE, Walker S. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1994. p. 232–41.
  3. Robertson SE, Walker S, Jones S, Hancock-Beaulieu M. Okapi at TREC-3. In: Proceedings of the 3rd Text Retrieval Conference; 1994.
  4. Singhal A, Buckley C, Mitra M. Pivoted document length normalization. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1996. p. 21–9.
  5. Zhai C, Lafferty J. A study of smoothing methods for language models applied to ad hoc information retrieval. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2001. p. 334–42.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Glasgow, Glasgow, UK

Section editors and affiliations

  • Giambattista Amati (1)
  1. Fondazione Ugo Bordoni, Rome, Italy