Kernel perceptron

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Qwertyus (talk | contribs) at 13:35, 16 March 2014 (The perceptron algorithm).

In machine learning, the kernel perceptron is a variant of the popular perceptron learning algorithm that can learn kernel machines, i.e. non-linear classifiers that employ a kernel function to compute the similarity of unseen samples to training samples.

Preliminaries

The perceptron algorithm

The perceptron algorithm is an old but popular online learning algorithm that operates by a principle called "error-driven learning": it iteratively improves a model by running it on training samples, then updating the model whenever it finds it has made an incorrect classification with respect to a supervised signal. The model learned by the standard perceptron algorithm is a linear binary classifier: a vector of weights w (and optionally an intercept term b) that is used to classify a sample vector x as class "one" or class "minus one" according to

ŷ = sgn(wᵀx)

where a zero is arbitrarily mapped to one or minus one.

In pseudocode, the perceptron algorithm is given by:

Initialize w to an all-zero vector of length p, the number of predictors (features).
For each training example x with ground truth label y ∈ {-1, 1}:
Let ŷ = sgn(wᵀx).[1]
If ŷ ≠ y, update w := w + y x.
Repeat this procedure until convergence, or for a set amount of iterations ("epochs").
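The procedure above can be sketched in a few lines of plain Python. This is a minimal illustrative implementation, not a reference one; the function and helper names (perceptron_train, sgn, dot) and the toy data are invented for the example.

```python
def sgn(z):
    return 1 if z >= 0 else -1  # zero arbitrarily mapped to +1


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def perceptron_train(X, y, epochs=10):
    """X: list of feature vectors; y: ground-truth labels in {-1, +1}."""
    w = [0.0] * len(X[0])  # all-zero weight vector of length p
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if sgn(dot(w, xi)) != yi:  # misclassified: error-driven update
                w = [wj + yi * xj for wj, xj in zip(w, xi)]  # w := w + y x
                mistakes += 1
        if mistakes == 0:
            break  # no errors on the training set: converged
    return w


# Usage on a linearly separable toy problem
X = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w = perceptron_train(X, y)
```

On separable data like this the loop stops as soon as an epoch produces no mistakes; on non-separable data it runs for the fixed number of epochs.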

Kernel machines

By contrast with the linear models learned by the perceptron, a kernel machine[2] is a classifier that stores a subset of its training examples xᵢ, associates with each a weight αᵢ, and makes decisions for new samples x' by evaluating

ŷ = sgn(Σᵢ αᵢ yᵢ K(xᵢ, x'))

Here, K is some kernel function. Formally, a kernel function must be positive semidefinite (see Mercer's condition); intuitively, it can be thought of as a similarity function between samples. Each function x' ↦ K(xᵢ, x') serves as a basis function in the classification.
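A sketch of such a decision rule in plain Python, assuming the labels yᵢ enter the sum explicitly as in the formula above; the Gaussian (RBF) kernel is used here only as one common positive-semidefinite choice, and all names (rbf_kernel, kernel_machine_predict) are illustrative.

```python
import math


def rbf_kernel(x, xp, gamma=1.0):
    # Gaussian (RBF) kernel: exp(-gamma * ||x - x'||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xp)))


def kernel_machine_predict(support, alphas, labels, x_new, kernel):
    """Sign of a weighted sum of kernel evaluations against stored examples."""
    score = sum(a * yi * kernel(xi, x_new)
                for a, yi, xi in zip(alphas, labels, support))
    return 1 if score >= 0 else -1


# Usage: two stored examples, one per class
support = [[0.0, 0.0], [4.0, 4.0]]
labels = [1, -1]
alphas = [1.0, 1.0]
label = kernel_machine_predict(support, alphas, labels, [0.5, 0.5], rbf_kernel)
```

Only the stored examples and their weights are needed at prediction time; the classifier never forms an explicit weight vector in feature space.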

Algorithm

The kernel perceptron algorithm is a perceptron variant that learns kernel machines. In pseudocode, it can be formulated as follows:[3]

Initialize α to an all-zeros vector of length n, the number of training samples.
For some fixed number of iterations, or until the model makes no mistakes on the training set:
For each training example xᵢ, yᵢ:
Let ŷ = sgn(Σⱼ αⱼ yⱼ K(xⱼ, xᵢ))
If ŷ ≠ yᵢ, perform an update:
αᵢ := αᵢ + 1

When the linear kernel K(x, x') = xᵀx' is used, this algorithm is a dual formulation of the ordinary perceptron algorithm.
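The dual algorithm above can be sketched as follows, again in plain illustrative Python (the names kernel_perceptron_train and linear_kernel are not from the source); each αᵢ simply counts how many times example i was misclassified during training.

```python
def kernel_perceptron_train(X, y, kernel, epochs=10):
    """Dual perceptron: alpha[i] counts mistakes made on training example i."""
    n = len(X)
    alpha = [0] * n  # all-zeros vector of length n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            # decision: sign of sum_j alpha_j y_j K(x_j, x_i)
            score = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n))
            y_hat = 1 if score >= 0 else -1
            if y_hat != y[i]:
                alpha[i] += 1  # upweight the misclassified example
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def linear_kernel(x, xp):
    return sum(a * b for a, b in zip(x, xp))


# Usage: with the linear kernel this recovers an ordinary perceptron,
# whose implicit weight vector is w = sum_i alpha_i y_i x_i
X = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
alpha = kernel_perceptron_train(X, y, linear_kernel)


def predict(x_new):
    s = sum(alpha[j] * y[j] * linear_kernel(X[j], x_new) for j in range(len(X)))
    return 1 if s >= 0 else -1
```

Swapping linear_kernel for a non-linear kernel (such as the RBF kernel) changes nothing else in the training loop, which is the appeal of the dual form.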

Notes and references

  1. ^ Using the hat to denote an estimated value.
  2. ^ Schölkopf, Bernhard; and Smola, Alexander J.; Learning with Kernels, MIT Press, Cambridge, MA, 2002. ISBN 0-262-19475-9
  3. ^ John Shawe-Taylor; Nello Cristianini (2004). Kernel Methods for Pattern Analysis. Cambridge University Press. p. 242.