The basic algorithm, Winnow1, is as follows. The instance space is {0,1}^n, that is, each instance is described as a vector of n Boolean-valued features. The algorithm maintains non-negative weights w_1, …, w_n, one for each feature, all initially set to 1. When the learner is given an example (x_1, …, x_n), it applies the typical prediction rule for linear classifiers:
- If ∑_i w_i x_i > θ, then predict 1
- Otherwise predict 0
Here θ is a real number called the threshold. Together with the weights, the threshold defines a dividing hyperplane in the instance space. Good mistake bounds are obtained if θ = n/2 (see below).
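The prediction rule above can be sketched as follows; this is a minimal illustration, and the function and variable names are not part of the original algorithm description.

```python
def predict(weights, x, theta):
    """Predict 1 if the weighted sum of active features exceeds the threshold theta."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > theta else 0

# Example: n = 4 features, all weights initialized to 1, theta = n/2 = 2.
weights = [1.0, 1.0, 1.0, 1.0]
print(predict(weights, [1, 1, 1, 0], 2.0))  # 3 > 2, so predicts 1
print(predict(weights, [1, 0, 0, 0], 2.0))  # 1 <= 2, so predicts 0
```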
For each example with which it is presented, the learner applies the following update rule:
- If an example is correctly classified, do nothing.
- If an example is predicted incorrectly and the correct result was 0, then for each feature with x_i = 1, the corresponding weight w_i is set to 0 (demotion step).
- If an example is predicted incorrectly and the correct result was 1, then for each feature with x_i = 1, the corresponding weight w_i is multiplied by α (promotion step).
A typical value for α is 2.
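The update rule above can be sketched as follows, with the typical value α = 2; the names are illustrative, not part of the original description.

```python
def update(weights, x, prediction, label, alpha=2.0):
    """Update Winnow1 weights in place after a mistake; do nothing when correct."""
    if prediction == label:
        return  # correctly classified: no change
    for i, xi in enumerate(x):
        if xi == 1:
            if label == 0:
                weights[i] = 0.0       # demotion step: weight zeroed
            else:
                weights[i] *= alpha    # promotion step: weight multiplied by alpha

weights = [1.0, 1.0, 1.0, 1.0]
update(weights, [1, 1, 0, 0], prediction=0, label=1)  # promote active features
print(weights)  # [2.0, 2.0, 1.0, 1.0]
update(weights, [0, 1, 0, 1], prediction=1, label=0)  # demote active features
print(weights)  # [2.0, 0.0, 1.0, 0.0]
```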
There are many variations to this basic approach. Winnow2[1] is similar except that in the demotion step the weights are divided by α instead of being set to 0. Balanced Winnow maintains two sets of weights, and thus two hyperplanes. This can then be generalized for multi-label classification.
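Winnow2's gentler demotion step can be sketched as follows; the promotion step is unchanged from Winnow1, and the names here are illustrative.

```python
def demote_winnow2(weights, x, alpha=2.0):
    """Winnow2 demotion: divide active-feature weights by alpha instead of zeroing them."""
    for i, xi in enumerate(x):
        if xi == 1:
            weights[i] /= alpha

weights = [2.0, 2.0, 1.0, 1.0]
demote_winnow2(weights, [1, 0, 0, 1])
print(weights)  # [1.0, 2.0, 1.0, 0.5]
```

Because weights are divided rather than zeroed, a feature demoted by mistake can recover through later promotions, which Winnow1 does not allow.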