Kernel perceptron
In machine learning, the kernel perceptron is a variant of the popular perceptron learning algorithm that can learn kernel machines, i.e. non-linear classifiers that employ a kernel function to compute the similarity of unseen samples to training samples.
Preliminaries
The perceptron algorithm
The perceptron algorithm is an old but popular online learning algorithm that operates by a principle called "error-driven learning": it iteratively improves a model by running it on training samples, then updating the model whenever it finds it has made an incorrect classification with respect to a supervised signal. The model learned by the standard perceptron algorithm is a linear binary classifier: a vector of weights w (and optionally an intercept term b) that is used to classify a sample vector x as class "one" or class "minus one" according to

ŷ = sgn(wᵀ x)

where a zero is arbitrarily mapped to one or minus one.
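This decision rule can be sketched in a few lines; the weights, sample, and intercept below are illustrative values, not taken from the article:

```python
import numpy as np

# Hypothetical weights, intercept, and sample (illustrative only)
w = np.array([0.5, -1.0, 0.25])
b = 0.1
x = np.array([1.0, 2.0, 4.0])

# Classify by the sign of the score; a zero score is mapped to +1 here by convention
score = w @ x + b
y_hat = 1 if score >= 0 else -1
```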
In pseudocode, the perceptron algorithm is given by:
- Initialize w to an all-zero vector of length p, the number of predictors (features).
- For each training example x with ground truth label y ∈ {-1, 1}:
- Let ŷ = sgn(wᵀ x).[1]
- If ŷ ≠ y, update w := w + y x.
- Repeat this procedure until convergence, or for a set amount of iterations ("epochs").
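The pseudocode above can be sketched as follows; the function name and toy data are our own, and the mistake-free pass is used as the convergence check:

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    """Train a linear perceptron. X: (n, p) samples, y: labels in {-1, +1}.
    A minimal sketch of the pseudocode above (no intercept term)."""
    n, p = X.shape
    w = np.zeros(p)                            # initialize w to an all-zero vector
    for _ in range(epochs):                    # set number of iterations ("epochs")
        mistakes = 0
        for x_i, y_i in zip(X, y):
            y_hat = 1 if w @ x_i >= 0 else -1  # sgn(w^T x), zero mapped to +1
            if y_hat != y_i:                   # error-driven update
                w += y_i * x_i
                mistakes += 1
        if mistakes == 0:                      # converged: a full pass with no errors
            break
    return w

# Toy linearly separable data (illustrative)
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron_train(X, y)
```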
Kernel machines
By contrast with the linear models learned by the perceptron, a kernel machine[2] is a classifier that stores a subset of its training examples xᵢ, associates with each a weight αᵢ, and makes decisions for new samples x' by evaluating

ŷ = sgn(Σᵢ αᵢ yᵢ K(xᵢ, x'))

Here, K is some kernel function. Formally, a kernel function is a symmetric, positive-semidefinite function (see Mercer's condition); intuitively, it can be thought of as a similarity function between samples. Each function x' ↦ K(xᵢ, x') serves as a basis function in the classification.
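As a sketch, the decision function can be evaluated directly from stored examples; the Gaussian (RBF) kernel below is one common choice of K, chosen here for illustration, and the function names are our own:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: one common positive-semidefinite kernel.
    Larger values mean the samples are more similar."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_machine_predict(x_new, stored_X, stored_y, alpha, kernel=rbf_kernel):
    """Evaluate sgn(sum_i alpha_i y_i K(x_i, x')) over the stored examples."""
    score = sum(a_i * y_i * kernel(x_i, x_new)
                for a_i, y_i, x_i in zip(alpha, stored_y, stored_X))
    return 1 if score >= 0 else -1
```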
Algorithm
The kernel perceptron algorithm is a perceptron variant that learns kernel machines. In pseudocode, it can be formulated as follows:[3]
- Initialize α to an all-zeros vector of length n, the number of training samples.
- For some fixed number of iterations, or until the model makes no mistakes on the training set:
- For each training example xᵢ, yᵢ:
- Let ŷ = sgn(Σⱼ αⱼ yⱼ K(xⱼ, xᵢ)).
- If ŷ ≠ yᵢ, perform an update:
- αᵢ := αᵢ + 1
When the linear kernel K(x, x') = x ⋅ x' is used, this algorithm is a dual formulation of the ordinary perceptron algorithm.
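The dual algorithm above can be sketched as follows; the function names and toy data are our own. With the linear kernel, the implicit weight vector w = Σᵢ αᵢ yᵢ xᵢ recovers what the ordinary perceptron would learn:

```python
import numpy as np

def kernel_perceptron_train(X, y, kernel, epochs=10):
    """Kernel perceptron in dual form. X: (n, p) samples, y: labels in {-1, +1}.
    A minimal sketch of the pseudocode above."""
    n = X.shape[0]
    # Precompute the Gram matrix K[i, j] = K(x_i, x_j)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    alpha = np.zeros(n)                       # one mistake counter per training sample
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = np.sum(alpha * y * K[:, i])   # sum_j alpha_j y_j K(x_j, x_i)
            y_hat = 1 if score >= 0 else -1       # sgn, zero mapped to +1
            if y_hat != y[i]:
                alpha[i] += 1                     # alpha_i := alpha_i + 1
                mistakes += 1
        if mistakes == 0:                         # no mistakes on the training set
            break
    return alpha

# With the linear kernel, this is the dual of the ordinary perceptron
linear = lambda a, b: a @ b
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
alpha = kernel_perceptron_train(X, y, linear)
w_equiv = (alpha * y) @ X    # implicit primal weights: w = sum_i alpha_i y_i x_i
```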