Winnow (algorithm)


The winnow algorithm[1] is a technique from machine learning for learning a linear classifier from labeled examples. It is very similar to the perceptron algorithm; however, where the perceptron uses an additive weight-update scheme, winnow uses a multiplicative scheme, which allows it to perform much better when many dimensions are irrelevant (hence its name). It is not a sophisticated algorithm, but it scales well to high-dimensional spaces. During training, winnow is shown a sequence of positive and negative examples, from which it learns a decision hyperplane. It can also be used in the online learning setting, where the learning phase and the classification phase are not clearly separated.

The algorithm

The basic algorithm, Winnow1, is as follows. The instance space is $X = \{0,1\}^n$; that is, each instance is described by a vector of $n$ Boolean-valued features. The algorithm maintains non-negative weights $w_i$ for $i \in \{1, \ldots, n\}$, one for each feature, which are initially set to 1. When the learner is given an example $(x_1, \ldots, x_n)$, it applies the following prediction rule:

  • If $\sum_{i=1}^{n} w_i x_i > \Theta$, then predict 1
  • Otherwise predict 0

where $\Theta$ is a real number called the 'threshold'. Good bounds are obtained if $\Theta = n/2$.

The update rule is (loosely):

  • If an example is correctly classified, do nothing.
  • If an example is predicted to be 1 but the correct result was 0, all of the weights involved in the mistake (the $w_i$ for which $x_i = 1$) are set to zero (demotion step).
  • If an example is predicted to be 0 but the correct result was 1, all of the weights involved in the mistake are multiplied by $\alpha$ (promotion step).

A good value for $\alpha$ is 2.

Variations are also used. For example, Winnow2 is the same as Winnow1 except that in the demotion step the weights are divided by $\alpha$ instead of being set to 0.
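
To make the procedure concrete, here is a minimal sketch of Winnow1 in Python, using the suggested values $\alpha = 2$ and $\Theta = n/2$. The class name, the use of NumPy, and the overall structure are illustrative assumptions rather than part of the original description:

    import numpy as np

    class Winnow1:
        """A minimal sketch of Winnow1 (names and structure are illustrative)."""

        def __init__(self, n, alpha=2.0):
            self.w = np.ones(n)      # one non-negative weight per feature, initialized to 1
            self.alpha = alpha       # promotion factor; alpha = 2 is a good default
            self.theta = n / 2.0     # threshold; Theta = n/2 gives good bounds

        def predict(self, x):
            # Predict 1 if the weighted sum of the active features exceeds the threshold.
            return 1 if np.dot(self.w, x) > self.theta else 0

        def update(self, x, y):
            # One online step: predict on x, then adjust the weights on a mistake.
            y_hat = self.predict(x)
            if y_hat == 1 and y == 0:
                self.w[x == 1] = 0              # demotion: zero the active weights
            elif y_hat == 0 and y == 1:
                self.w[x == 1] *= self.alpha    # promotion: scale the active weights by alpha
            return y_hat

For Winnow2, the demotion line would instead divide the active weights by $\alpha$, i.e. self.w[x == 1] /= self.alpha.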

Mistake bounds

If Winnow1 is run with $\alpha = 2$ and $\Theta = n/2$ on a target function that is a $k$-literal monotone disjunction given by $f(x_1, \ldots, x_n) = x_{i_1} \lor \cdots \lor x_{i_k}$, then for any sequence of instances the total number of mistakes is bounded by $2 + 3k(1 + \log_2 n)$.
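
As an illustration of the bound, the following sketch (reusing the hypothetical Winnow1 class from the previous section) trains online on random Boolean instances labelled by a monotone disjunction and compares the mistake count with $2 + 3k(1 + \log_2 n)$. The data generation and parameter choices here are assumptions made for the example:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 1000, 5                          # 1000 features; target is a 5-literal disjunction
    relevant = rng.choice(n, size=k, replace=False)

    learner = Winnow1(n)                    # the sketch class from the previous section
    mistakes = 0
    for _ in range(10000):
        x = rng.integers(0, 2, size=n)      # random Boolean instance
        y = int(x[relevant].any())          # label given by the monotone disjunction
        if learner.update(x, y) != y:
            mistakes += 1

    bound = 2 + 3 * k * (1 + np.log2(n))    # Littlestone's mistake bound
    print(mistakes, "<=", bound)

Note that the bound holds for any sequence of instances, not just random ones, which is why it depends only logarithmically on the total number of features $n$.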

References

Citations and notes

  1. ^ Littlestone, N. (1988). "Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm". Machine Learning. 2 (4): 285–318.