Skip to main content

N-Gram Models

  • Reference work entry
  • First Online:
  • 54 Accesses

Definition

In language modeling, n-gram models are probabilistic models of text that use some limited amount of history, or word dependencies, where n refers to the number of words that participate in the dependence relation.

Key Points

In automatic speech recognition, n-grams are important to model some of the structural usage of natural language, i.e., the model uses word dependencies to assign a higher probability to “how are you today” than to “are how today you,” although both phrases contain the exact same words. If used in information retrieval, simple unigram language models (n-gram models with n = 1), i.e., models that do not use term dependencies, result in good quality retrieval in many studies. The use of bigram models (n-gram models with n= 2) would allow the system to model direct term dependencies, and treat the occurrence of “New York” differently from separate occurrences of “New” and “York,” possibly improving retrieval performance. The use of trigram models would...

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   4,499.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD   6,499.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Recommended Reading

  1. Metzler D, Bruce Croft W. A Markov random field model for term dependencies. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2005. p. 472–9.

    Google Scholar 

  2. Miller DRH, Leek T, Schwartz RM. A hidden Markov model information retrieval system. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1999. p. 214–21.

    Google Scholar 

  3. Song F, Bruce Croft W. A general language model for information retrieval. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1999. p. 4–9.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Djoerd Hiemstra .

Editor information

Editors and Affiliations

Section Editor information

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Check for updates. Verify currency and authenticity via CrossMark

Cite this entry

Hiemstra, D. (2018). N-Gram Models. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_935

Download citation

Publish with us

Policies and ethics