
Language model

From Wikipedia, the free encyclopedia

Statistical language models are probability distributions over sequences of words, P(w1..n). Language modeling has been used in many natural language processing (NLP) applications such as part-of-speech tagging, parsing, speech recognition and information retrieval. Estimating the probability of full sequences directly is impractical, since phrases and sentences can be arbitrarily long and most of them never occur in a corpus, so these models are most often approximated using smoothed N-gram models such as unigram, bigram and trigram models.
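
Concretely, an N-gram model approximates the probability of each word given its full history, P(wi | w1..i-1), by the probability given only the previous N-1 words. The following Python sketch estimates a bigram model with add-one (Laplace) smoothing from a toy two-sentence corpus; the corpus, the smoothing choice and the function names are illustrative assumptions, not part of the article.

```python
from collections import Counter

# Toy corpus with sentence-boundary markers <s> and </s> (illustrative assumption).
corpus = [
    "<s> the cat sat on the mat </s>".split(),
    "<s> the dog sat on the log </s>".split(),
]

unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))
vocab_size = len(unigram_counts)

def bigram_prob(prev, word):
    """P(word | prev), estimated with add-one (Laplace) smoothing."""
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)

def sequence_prob(words):
    """Approximate P(w1..n) as a product of smoothed bigram probabilities."""
    padded = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, word in zip(padded, padded[1:]):
        p *= bigram_prob(prev, word)
    return p

print(sequence_prob("the cat sat on the log".split()))
```

Higher-order models (trigram and beyond) condition on longer histories in the same way, at the cost of sparser counts, which is why smoothing is needed in the first place.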

In speech recognition, a language model is a probability distribution capturing the statistics of how word sequences are generated in a language, and is used to predict the next word in a spoken sequence.
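
For example, next-word prediction with a bigram model amounts to choosing the word that maximizes P(word | previous word); the counts in the sketch below are made-up toy values for illustration, not taken from any real recognizer.

```python
from collections import Counter

# Made-up bigram counts (illustrative assumption); a real recognizer would
# estimate these from a large text corpus.
bigram_counts = Counter({
    ("recognize", "speech"): 9,
    ("recognize", "beach"): 1,
})

def predict_next(prev):
    """Return the word maximizing P(word | prev) under the toy counts."""
    candidates = {w: c for (p, w), c in bigram_counts.items() if p == prev}
    return max(candidates, key=candidates.get)

print(predict_next("recognize"))  # -> "speech"
```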

When used in information retrieval, a language model Md is associated with each document d in a collection. Given a query Q, retrieved documents are ranked by the probability that the document's language model would generate the terms of the query, P(Q|Md).
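
A minimal sketch of this query-likelihood ranking follows; the two-document collection, the unigram document models and the Jelinek-Mercer smoothing against the collection model are assumptions chosen for illustration, not prescribed by the article.

```python
import math
from collections import Counter

# Toy document collection (illustrative assumption).
docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "dogs chase cats in the park".split(),
}
collection = [w for words in docs.values() for w in words]
collection_counts = Counter(collection)
collection_len = len(collection)

def query_log_likelihood(query, doc_words, lam=0.5):
    """log P(Q | Md) under a unigram document model, smoothed against the
    collection model with Jelinek-Mercer interpolation (an assumption).
    Assumes every query term occurs somewhere in the collection."""
    counts = Counter(doc_words)
    n = len(doc_words)
    score = 0.0
    for q in query:
        p_doc = counts[q] / n
        p_coll = collection_counts[q] / collection_len
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score

query = "cat mat".split()
ranking = sorted(docs, key=lambda d: query_log_likelihood(query, docs[d]), reverse=True)
print(ranking)  # documents ranked by query likelihood, e.g. ['d1', 'd2']
```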
