Probability Smoothing

Hiemstra, Djoerd

doi:10.1007/978-1-4614-8265-9_936

Reference work entry
First Online: 01 January 2018

Definition

Probability smoothing is a language modeling technique that assigns some nonzero probability to events that were unseen in the training data. As a result, the probability mass is spread over more events, and the probability distribution becomes smoother.

Key Points

Smoothing overcomes the so-called sparse data problem: many events that are plausible in reality are not found in the data used to estimate probabilities. When maximum likelihood estimates are used, unseen events are assigned zero probability. In information retrieval, most events are unseen in the data, even if simple unigram language models are used: documents are relatively short (say, several hundred words on average), whereas the vocabulary is typically large (perhaps millions of words), so the vast majority of words do not occur in any given document. A small document about “information retrieval” might not mention the word “search,” but that does not mean it is not relevant...
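
The zero-probability problem described above can be illustrated with a minimal Python sketch. The preview text does not name a specific smoothing method, so the linear interpolation with a collection-wide model used here is an assumption, chosen only because it is one common approach. The function names, the toy document and collection, and the mixing weight `lam` are all hypothetical.

```python
from collections import Counter


def mle(term, tokens):
    """Maximum likelihood estimate: relative frequency of `term` in `tokens`."""
    counts = Counter(tokens)
    return counts[term] / len(tokens)


def interpolated(term, doc_tokens, collection_tokens, lam=0.8):
    """Mix the document model with a collection model (assumed smoothing method).

    `lam` is a hypothetical mixing weight; 1 - lam of the probability mass
    is taken from the collection model, so terms that occur anywhere in the
    collection receive a nonzero probability.
    """
    p_doc = mle(term, doc_tokens)
    p_coll = mle(term, collection_tokens)
    return lam * p_doc + (1 - lam) * p_coll


# Toy data (hypothetical): a short document that never mentions "search".
doc = "information retrieval ranks documents for an information need".split()
collection = doc + "search engines perform retrieval when users search".split()

print(mle("search", doc))                       # unsmoothed estimate
print(interpolated("search", doc, collection))  # smoothed estimate
```

In this sketch the unsmoothed estimate for "search" is zero because the word never occurs in the document, while the interpolated estimate is small but positive. The smoothed values still sum to one over the collection vocabulary, since both component models are proper distributions over it.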

Author information

Authors and Affiliations

University of Twente, Enschede, The Netherlands
Djoerd Hiemstra

Corresponding author

Correspondence to Djoerd Hiemstra.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Section Editor information

Fondazione Ugo Bordoni, Rome, Italy
Giambattista Amati

Cite this entry

Hiemstra, D. (2018). Probability Smoothing. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_936

DOI: https://doi.org/10.1007/978-1-4614-8265-9_936
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
