Kernel perceptron

In machine learning, the kernel perceptron is a variant of the popular perceptron learning algorithm that can learn kernel machines, i.e. non-linear classifiers that employ a kernel function to compute the similarity of unseen samples to training samples.

Preliminaries

The perceptron algorithm

The perceptron algorithm is an old but popular online learning algorithm that operates by a principle called "error-driven learning": it iteratively improves a model by running it on training samples, then updating the model whenever it finds it has made an incorrect classification with respect to a supervised signal. The model learned by the standard perceptron algorithm is a linear binary classifier: a vector of weights w (and optionally an intercept term b, omitted here for simplicity) that is used to classify a sample vector x as class "one" or class "minus one" according to

ŷ = sgn(wᵀx)

where a zero is arbitrarily mapped to one or minus one.[1]
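
For example, with purely illustrative weights w = (2, −1) and no intercept, the sample x = (1, 1) gives wᵀx = 2·1 + (−1)·1 = 1 > 0, so it is classified as ŷ = 1 (class "one").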

In pseudocode, the perceptron algorithm is given by:

Initialize w to an all-zero vector of length p, the number of predictors (features).
For each training example xᵢ with ground truth label yᵢ ∈ {−1, 1}:
    Let ŷ = sgn(wᵀxᵢ).
    If ŷ ≠ yᵢ, update w := w + yᵢxᵢ.
Repeat this procedure until convergence, or for a set amount of iterations ("epochs").
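
A minimal runnable sketch of this loop in Python with NumPy (the function name and defaults are illustrative, not taken from the sources):

    import numpy as np

    def perceptron_train(X, y, epochs=10):
        """Train a linear perceptron without an intercept.
        X: (n, p) array of samples; y: array of labels in {-1, +1}."""
        n, p = X.shape
        w = np.zeros(p)                          # all-zero initial weights
        for _ in range(epochs):
            mistakes = 0
            for x_i, y_i in zip(X, y):
                y_hat = 1 if w @ x_i >= 0 else -1  # sgn, with sgn(0) mapped to +1
                if y_hat != y_i:
                    w += y_i * x_i               # error-driven update
                    mistakes += 1
            if mistakes == 0:                    # converged: a full pass with no mistakes
                break
        return w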

Kernel machines

By contrast with the linear models learned by the perceptron, a kernel machine[2] is a classifier that stores a subset of its training examples xᵢ, associates with each a weight αᵢ, and makes decisions for new samples x' by evaluating

ŷ = sgn(Σᵢ αᵢ yᵢ K(xᵢ, x'))

Here, K is some kernel function. Formally, a kernel function is a positive-semidefinite kernel (see Mercer's condition); intuitively, it can be thought of as a similarity function between samples. Each function x' ↦ K(xᵢ, x') serves as a basis function in the classification.
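
As an illustrative sketch of this decision function in Python, using the Gaussian (RBF) kernel as one common positive-semidefinite choice (the sources do not prescribe a particular kernel):

    import numpy as np

    def rbf_kernel(a, b, gamma=1.0):
        """Gaussian (RBF) kernel: K(a, b) = exp(-gamma * ||a - b||^2)."""
        return np.exp(-gamma * np.sum((a - b) ** 2))

    def kernel_machine_predict(x_new, X_stored, y_stored, alpha, kernel=rbf_kernel):
        """Evaluate ŷ = sgn(Σᵢ αᵢ yᵢ K(xᵢ, x'))."""
        s = sum(a_i * y_i * kernel(x_i, x_new)
                for a_i, y_i, x_i in zip(alpha, y_stored, X_stored))
        return 1 if s >= 0 else -1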

Algorithm

The perceptron algorithm can be expressed in dual form, in which it can be seen to learn a linear combination of its n training samples:

w = Σᵢ αᵢ yᵢ xᵢ

where αᵢ is the number of times the i'th sample was misclassified. The required changes to the algorithm are that the per-feature weight vector w is replaced with a per-example weight vector α, initially all zero; the update rule w := w + yᵢxᵢ is replaced by αᵢ := αᵢ + 1; and predictions ŷ for a sample x are made by evaluating

ŷ = sgn(Σᵢ αᵢ yᵢ (xᵢ · x))
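
A quick numerical check of the primal–dual equivalence, with toy values chosen purely for illustration:

    import numpy as np

    # Toy data: three training samples, labels, and mistake counts α.
    X = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]])
    y = np.array([1.0, -1.0, 1.0])
    alpha = np.array([2.0, 1.0, 0.0])
    x_new = np.array([1.0, 1.0])

    w = (alpha * y) @ X                     # primal weights: w = Σᵢ αᵢ yᵢ xᵢ
    primal = w @ x_new                      # wᵀx
    dual = np.sum(alpha * y * (X @ x_new))  # Σᵢ αᵢ yᵢ (xᵢ · x)
    assert np.isclose(primal, dual)         # equal by linearity of the dot product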

By replacing the dot product in the dual version of the perceptron algorithm with an arbitrary kernel function (the dot product itself being the linear kernel), we can make it learn kernel machines, giving the final algorithm:[3]

Initialize α to an all-zeros vector of length n, the number of training samples.
For some fixed number of iterations, or until the model makes no mistakes on the training set:
    For each training example xᵢ, yᵢ:
        Let ŷ = sgn(Σⱼ αⱼ yⱼ K(xⱼ, xᵢ)).
        If ŷ ≠ yᵢ, perform an update:
            αᵢ := αᵢ + 1
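
A minimal sketch of this algorithm in Python with NumPy (function name and the Gram-matrix precomputation are illustrative choices, not mandated by the sources):

    import numpy as np

    def kernel_perceptron_train(X, y, kernel, epochs=10):
        """Kernel perceptron: learn per-example mistake counts α.
        X: (n, p) array of samples; y: array of labels in {-1, +1};
        kernel: a function K(x, x')."""
        n = len(X)
        alpha = np.zeros(n)
        # Precompute the Gram matrix K[i, j] = kernel(X[i], X[j]).
        K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
        for _ in range(epochs):
            mistakes = 0
            for i in range(n):
                # ŷ = sgn(Σⱼ αⱼ yⱼ K(xⱼ, xᵢ))
                y_hat = 1 if np.sum(alpha * y * K[:, i]) >= 0 else -1
                if y_hat != y[i]:
                    alpha[i] += 1        # count the mistake on example i
                    mistakes += 1
            if mistakes == 0:            # no mistakes in a full pass
                break
        return alpha

Prediction for a new sample x' then evaluates sgn(Σᵢ αᵢ yᵢ K(xᵢ, x')), so only the training examples with nonzero αᵢ need to be stored.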

Notes and references

  1. Using the hat to denote an estimated value.
  2. Schölkopf, Bernhard; Smola, Alexander J. (2002). Learning with Kernels. Cambridge, MA: MIT Press. ISBN 0-262-19475-9.
  3. Shawe-Taylor, John; Cristianini, Nello (2004). Kernel Methods for Pattern Analysis. Cambridge University Press. pp. 241–242.