Factored language model

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Gang Ji (talk | contribs) at 00:13, 14 July 2005. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A factored language model (FLM) is an extension of a conventional language model. In an FLM, each word is viewed as a vector of k factors: w_t = {f_t^1, ..., f_t^k}. An FLM provides a probabilistic model P(f | f_1, ..., f_N) in which the prediction of a factor f is based on N parents {f_1, ..., f_N}. For example, if w represents a word token and t represents a part-of-speech tag for English, the expression P(w_i | w_{i-2}, w_{i-1}, t_{i-1}) gives a model for predicting the current word token based on a traditional n-gram model as well as the part-of-speech tag of the previous word.
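The conditional above can be illustrated with a minimal sketch (not from the article): a maximum-likelihood estimate of P(w_i | w_{i-2}, w_{i-1}, t_{i-1}) from counts over a toy corpus of (word, part-of-speech tag) pairs. The corpus, function names, and lack of smoothing are all illustrative assumptions.

```python
from collections import defaultdict

# Toy corpus of (word, POS-tag) pairs; purely illustrative data.
corpus = [
    ("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
    ("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB"),
    ("the", "DET"), ("dog", "NOUN"), ("sleeps", "VERB"),
]

context_counts = defaultdict(int)  # counts of (w_{i-2}, w_{i-1}, t_{i-1})
joint_counts = defaultdict(int)    # counts of (w_{i-2}, w_{i-1}, t_{i-1}, w_i)

for i in range(2, len(corpus)):
    w2, w1, t1 = corpus[i - 2][0], corpus[i - 1][0], corpus[i - 1][1]
    wi = corpus[i][0]
    context_counts[(w2, w1, t1)] += 1
    joint_counts[(w2, w1, t1, wi)] += 1

def prob(wi, w2, w1, t1):
    """Maximum-likelihood estimate of P(w_i | w_{i-2}, w_{i-1}, t_{i-1})."""
    c = context_counts[(w2, w1, t1)]
    return joint_counts[(w2, w1, t1, wi)] / c if c else 0.0

# The context ("the", "dog", NOUN) occurs twice, followed once by
# "barks" and once by "sleeps":
print(prob("barks", "the", "dog", "NOUN"))  # 0.5
```

A real FLM would apply smoothing to these raw counts; the unsmoothed estimate here returns 0 for any unseen context, which motivates the back-off discussion below.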

A main advantage of factored language models is that they allow users to incorporate linguistic knowledge, such as explicitly modeling the relationship between word tokens and parts of speech in English, or morphological information (stems, roots, etc.) in Arabic.

As with n-gram models, smoothing techniques are necessary for parameter estimation. In particular, generalized back-off is used in training an FLM.
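The back-off idea can be sketched as follows. This is a simplified illustration, not the generalized parallel back-off algorithm itself: when the full parent context has been seen fewer than a threshold number of times, one parent is dropped and the estimate recurses; the threshold, the fixed drop order, and the uniform fallback are all assumptions, and a real implementation would also discount counts to reserve probability mass.

```python
from collections import defaultdict

joint = defaultdict(int)  # (parents tuple, word) -> count
ctx = defaultdict(int)    # parents tuple -> count
vocab = set()

def observe(word, parents):
    """Record one training event under every truncated parent context."""
    parents = tuple(parents)
    vocab.add(word)
    for k in range(len(parents), -1, -1):
        joint[(parents[:k], word)] += 1
        ctx[parents[:k]] += 1

def backoff_prob(word, parents, threshold=1):
    """P(word | parents), dropping the last parent whenever the
    current context was seen fewer than `threshold` times."""
    parents = tuple(parents)
    if ctx[parents] >= threshold or not parents:
        c = ctx[parents]
        # Uniform fallback over the vocabulary if even the empty
        # context is unseen (illustrative choice, not a real smoother).
        return joint[(parents, word)] / c if c else 1.0 / max(len(vocab), 1)
    return backoff_prob(word, parents[:-1], threshold)

observe("barks", ("dog", "NOUN"))
observe("sleeps", ("cat", "NOUN"))
print(backoff_prob("barks", ("dog", "NOUN")))  # full context seen -> 1.0
print(backoff_prob("barks", ("fox", "NOUN")))  # unseen context -> backs off
```

What makes the FLM version "generalized" is that the parents are heterogeneous factors (words, tags, stems), so there are many possible orders in which to drop them, and generalized parallel back-off can combine several back-off paths rather than following a single fixed order as this sketch does.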
