Word n-gram language model
An n-gram language model is a language model that models sequences of words as a Markov process. It makes the simplifying assumption that the probability of the next word in a sequence depends only on a fixed-size window of previous words. A bigram model conditions on one previous word, a trigram model on two, and in general, an n-gram model conditions on n−1 words of previous context.[1]
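In symbols (a standard way of writing this chain-rule approximation; the notation here is chosen for illustration), the probability of a sequence of words w_1, …, w_m is approximated as

P(w_1, …, w_m) ≈ ∏_{i=1}^{m} P(w_i | w_{i−(n−1)}, …, w_{i−1})

where context positions before the start of the sentence are filled with start-of-sentence tokens.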
For example, a bigram language model models the probability of the sentence I saw the red house as:

P(I saw the red house) ≈ P(I | &lt;s&gt;) P(saw | I) P(the | saw) P(red | the) P(house | red) P(&lt;/s&gt; | house)

where &lt;s&gt; and &lt;/s&gt; are special tokens denoting the start and end of a sentence.
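As a minimal sketch of how this factorization can be computed (the probability values below are invented placeholders, not estimates from any corpus):

```python
# Sketch of the bigram factorization above.
# The probabilities are illustrative placeholders only.
bigram_prob = {
    ("<s>", "I"): 0.20,
    ("I", "saw"): 0.10,
    ("saw", "the"): 0.30,
    ("the", "red"): 0.05,
    ("red", "house"): 0.40,
    ("house", "</s>"): 0.25,
}

def sentence_probability(words):
    """Multiply bigram probabilities over the padded word sequence."""
    padded = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, curr in zip(padded, padded[1:]):
        prob *= bigram_prob.get((prev, curr), 0.0)  # zero if bigram unseen
    return prob

print(sentence_probability("I saw the red house".split()))
```

In practice the product is usually accumulated in log space to avoid numerical underflow on long sentences.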
These conditional probabilities may be estimated from frequency counts in a text corpus. For example, P(saw | I) can be naively estimated as the proportion of occurrences of the word I that are followed by saw in the corpus, i.e. count(I saw) / count(I). The problem of sparsity (for example, if the bigram "red house" has zero occurrences in our corpus) may necessitate modifying the basic Markov model with smoothing techniques, particularly when using larger context windows.[1]
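As an illustrative sketch (the toy corpus, the function names, and the choice of add-one/Laplace smoothing below are assumptions made for this example; the cited text covers several smoothing methods), relative-frequency estimation and a simple smoothed estimate might look like:

```python
from collections import Counter

# Toy corpus; each sentence is padded with start/end tokens as above.
corpus = [
    "I saw the red house",
    "I saw the dog",
    "the dog saw the house",
]
sentences = [["<s>"] + s.split() + ["</s>"] for s in corpus]

unigram_counts = Counter(w for s in sentences for w in s)
bigram_counts = Counter(b for s in sentences for b in zip(s, s[1:]))
vocab_size = len(unigram_counts)

def mle_prob(prev, curr):
    """Naive relative-frequency estimate: count(prev curr) / count(prev)."""
    return bigram_counts[(prev, curr)] / unigram_counts[prev]

def laplace_prob(prev, curr):
    """Add-one (Laplace) smoothing: unseen bigrams get a small nonzero probability."""
    return (bigram_counts[(prev, curr)] + 1) / (unigram_counts[prev] + vocab_size)

print(mle_prob("I", "saw"))        # 1.0 in this toy corpus
print(mle_prob("red", "dog"))      # 0.0: bigram never observed
print(laplace_prob("red", "dog"))  # small but nonzero after smoothing
```

Add-one smoothing is only the simplest option; more effective methods such as Kneser–Ney smoothing are discussed in the cited reference.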
n-gram models are no longer commonly used in natural language processing research and applications, as they have been supplanted by state-of-the-art deep learning methods, most recently large language models.
References
- Jurafsky, Dan; Martin, James H. (7 January 2023). "N-gram Language Models". Speech and Language Processing (PDF) (3rd ed. draft). Retrieved 24 May 2022.