Jump to content

Conditional random field

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 129.132.245.49 (talk) at 14:07, 2 August 2011 (Error in definition corrected). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A conditional random field (CRF) is a type of discriminative undirected probabilistic graphical model. It is most often used for labeling or parsing of sequential data, such as natural language text or biological sequences[1] and computer vision[2] . Specifically, CRFs find applications in shallow parsing[3] , named entity recognition[4] and gene finding, among other tasks, being an alternative to the related hidden Markov models. In computer vision, CRFs are often used for image segmentation, object recognition and as a general approach to combine features from different sources.

Description

Lafferty, McCallum and Pereira (2001) define a CRF as follows:

Let be a graph such that , so that is indexed by the vertices of . Then is a conditional random field in case,

when conditioned on , the random variables obey the Markov property with respect to the graph: , where means that and are neighbors in .

What this means is that a CRF is an undirected graphical model whose nodes can be divided into exactly two disjoint sets and , the observed and output variables, respectively; the conditional distribution is then modeled. The Markov property means that for any pair of nodes there exists a single neighbour of such that is conditionally independent of , given and . This means that the graphical structure of (ignoring ) has to be cycle-free, and is tree-structured by definition.

Given the tree-structure of the output graph, where is the two endpoints of the edge , according to the Hammersley–Clifford theorem.[1] As opposed to a general undirected model, parameter estimation and exact inference are both tractable in this kind of model.

Inference

For general graphs, the problem of exact inference in CRFs is intractable. The inference problem for a CRF is basically the same as for an MRF and the same arguments hold.[5] For graphs with chain or tree structure, exact inference is possible. The algorithms used in these cases are analogous to the forward-backward and Viterbi algorithm for the case of HMMs.

Parameter Learning

Learning the parameters is usually done by maximum likelihood learning for . If all nodes have exponential family distributions and all nodes are observed during training, this optimization is convex.[5] It can be solved for example using gradient descent algorithms Quasi-Newton methods, such as the L-BFGS algorithm. On the other hand, if some variables are unobserved, the inference problem has to be solved for these variables. This is intractable to do exact in general graphs, so approximations have to be used.

Examples

In sequence modeling, the graph of interest is usually a chain graph. An input sequence of observed variables represents a sequence of observations and represents a hidden (or unknown) state variable that needs to be inferred given the observations. The are structured to form a chain, with an edge between each and . As well as having a simple interpretation of the as "labels" for each element in the input sequence, this layout admits efficient algorithms for:

  • model training, learning the conditional distributions between the and feature functions from some corpus of training data.
  • inference, determining the probability of a given label sequence given .
  • decoding, determining the most likely label sequence given .

The conditional dependency of each on is defined through a fixed set of feature functions of the form , which can informally be thought of as measurements on the input sequence that partially determine the likelihood of each possible value for . The model assigns each feature a numerical weight and combines them to determine the probability of a certain value for .

Linear-chain CRFs have many of the same applications as conceptually simpler hidden Markov models (HMMs), but relax certain assumptions about the input and output sequence distributions. An HMM can loosely be understood as a CRF with very specific feature functions that use constant probabilities to model state transitions and emissions. Conversely, a CRF can loosely be understood as a generalization of an HMM that makes the constant transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence.

Notably in contrast to HMMs, CRFs can contain any number of feature functions, the feature functions can inspect the entire input sequence at any point during inference, and the range of the feature functions need not have a probabilistic interpretation.

Higher-order CRFs and semi-Markov CRFs

CRFs can be extended into higher order models by making each dependent on a fixed number of previous variables . Training and inference are only practical for small values of (such as o ≤ 5),[citation needed] since their computational cost increases exponentially with . Large-margin models for structured prediction, such as the structured Support Vector Machine can be seen as an alternative training procedure to CRFs.

There exists another generalization of CRFs, the semi-Markov conditional random field (semi-CRF), which models variable-length segmentations of the label sequence .[6] This provides much of the power of higher-order CRFs to model long-range dependencies of the , at a reasonable computational cost.

Software

This is a partial list of software that implement generic CRF tools.

This is a partial list of software that implement CRF related tools.

See also

References

  1. ^ a b "Conditional random fields: Probabilistic models for segmenting and labeling sequence data" (PDF). Proc. 18th International Conf. on Machine Learning. Morgan Kaufmann. 2001. pp. 282–289. {{cite conference}}: Unknown parameter |authors= ignored (help); Unknown parameter |booktitle= ignored (|book-title= suggested) (help)
  2. ^ Template:Cite article
  3. ^ Sha, F., Pereira, F. (2003). shallow parsing with conditional random fields.{{cite conference}}: CS1 maint: multiple names: authors list (link)
  4. ^ Settles, B. (2004). "Biomedical named entity recognition using conditional random fields and rich feature sets" (PDF). Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications. pp. 104--107. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)
  5. ^ a b Sutton, Charles; McCallum, Andrew (2010). "An Introduction to Conditional Random Fields". v1. arXiv:1011.4088 [stat.ML]. {{cite arXiv}}: Unknown parameter |version= ignored (help)
  6. ^ Sarawagi, Sunita (2005). "Semi-Markov conditional random fields for information extraction". Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press. pp. 1185–1192. {{cite conference}}: External link in |title= (help); Unknown parameter |booktitle= ignored (|book-title= suggested) (help); Unknown parameter |coauthors= ignored (|author= suggested) (help); Unknown parameter |editors= ignored (|editor= suggested) (help)

Further Reading

  • McCallum, A.: Efficiently inducing features of conditional random fields. In: Proc. 19th Conference on Uncertainty in Artificial Intelligence. (2003)
  • Wallach, H.M.: Conditional random fields: An introduction. Technical report MS-CIS-04-21, University of Pennsylvania (2004)
  • Sutton, C., McCallum, A.: An Introduction to Conditional Random Fields for Relational Learning. In "Introduction to Statistical Relational Learning". Edited by Lise Getoor and Ben Taskar. MIT Press. (2006) Online PDF
  • Klinger, R., Tomanek, K.: Classical Probabilistic Models and Conditional Random Fields. Algorithm Engineering Report TR07-2-013, Department of Computer Science, Dortmund University of Technology, December 2007. ISSN 1864-4503. Online PDF