Winnow (algorithm)


The winnow algorithm[1] is a technique from machine learning for learning a linear classifier from labeled examples. It is very similar to the perceptron algorithm; however, where the perceptron uses an additive weight-update scheme, winnow uses a multiplicative scheme, which allows it to perform much better when many dimensions are irrelevant (hence its name). It is not a sophisticated algorithm, but it scales well to high-dimensional spaces. During training, winnow is shown a sequence of positive and negative examples, from which it learns a decision hyperplane. It can also be used in the online learning setting, where the learning phase and the classification phase are not clearly separated.

The algorithm

The basic algorithm, Winnow1, is as follows. The instance space is $X = \{0,1\}^n$; that is, each instance is described by a vector of $n$ Boolean-valued features. The algorithm maintains non-negative weights $w_i$ for $i \in \{1, \ldots, n\}$, one for each feature, which are initially set to 1. When the learner is given an example $(x_1, \ldots, x_n)$, it applies the following prediction rule:

  • If $\sum_{i=1}^{n} w_i x_i > \Theta$, then predict 1
  • Otherwise predict 0

where $\Theta$ is a real number called the 'threshold'. Good bounds are obtained if $\Theta = n/2$.

The update rule is (loosely):

  • If an example is correctly classified, do nothing.
  • If an example is predicted to be 1 but the correct result was 0, all of the weights involved in the mistake (the $w_i$ for which $x_i = 1$) are set to zero (demotion step).
  • If an example is predicted to be 0 but the correct result was 1, all of the weights involved in the mistake are multiplied by $\alpha$ (promotion step).

A good value for $\alpha$ is 2.

Variations are also used. For example, Winnow2 is the same as Winnow1 except that in the demotion step the weights are divided by $\alpha$ instead of being set to 0.
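
To make the procedure concrete, here is a minimal sketch of Winnow1 in Python, using the suggested values $\alpha = 2$ and $\Theta = n/2$. The class name, the use of NumPy, and the overall structure are illustrative assumptions rather than part of the original description:

    import numpy as np

    class Winnow1:
        """A minimal sketch of Winnow1 (names and structure are illustrative)."""

        def __init__(self, n, alpha=2.0):
            self.w = np.ones(n)      # one non-negative weight per feature, initialized to 1
            self.alpha = alpha       # promotion factor; alpha = 2 is a good default
            self.theta = n / 2.0     # threshold; Theta = n/2 gives good bounds

        def predict(self, x):
            # Predict 1 if the weighted sum of the active features exceeds the threshold.
            return 1 if np.dot(self.w, x) > self.theta else 0

        def update(self, x, y):
            # One online step: predict on x, then adjust the weights on a mistake.
            y_hat = self.predict(x)
            if y_hat == 1 and y == 0:
                self.w[x == 1] = 0              # demotion: zero the active weights
            elif y_hat == 0 and y == 1:
                self.w[x == 1] *= self.alpha    # promotion: scale the active weights by alpha
            return y_hat

For Winnow2, the demotion line would instead divide the active weights by $\alpha$, i.e. self.w[x == 1] /= self.alpha.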

Mistake bounds

If Winnow1 is run with $\alpha = 2$ and $\Theta = n/2$ on a target function that is a $k$-literal monotone disjunction given by $f(x_1, \ldots, x_n) = x_{i_1} \lor \cdots \lor x_{i_k}$, then for any sequence of instances the total number of mistakes is bounded by $2 + 3k(1 + \log_2 n)$.
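
As an illustration of the bound, the following sketch (reusing the hypothetical Winnow1 class from the previous section) trains online on random Boolean instances labelled by a monotone disjunction and compares the mistake count with $2 + 3k(1 + \log_2 n)$. The data generation and parameter choices here are assumptions made for the example:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 1000, 5                          # 1000 features; target is a 5-literal disjunction
    relevant = rng.choice(n, size=k, replace=False)

    learner = Winnow1(n)                    # the sketch class from the previous section
    mistakes = 0
    for _ in range(10000):
        x = rng.integers(0, 2, size=n)      # random Boolean instance
        y = int(x[relevant].any())          # label given by the monotone disjunction
        if learner.update(x, y) != y:
            mistakes += 1

    bound = 2 + 3 * k * (1 + np.log2(n))    # Littlestone's mistake bound
    print(mistakes, "<=", bound)

Note that the bound holds for any sequence of instances, not just random ones, which is why it depends only logarithmically on the total number of features $n$.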

References

Citations and notes

  1. ^ Littlestone, N. (1988). "Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm". Machine Learning. 2 (4): 285–318.