CN2 algorithm
Appearance
The CN2 induction algorithm is a learning algorithm for rule induction. It is designed to work even when the training data is imperfect. It is based on ideas from the AQ algorithm and the ID3 algorithm. As a consequence it creates a rule set like that created by AQ but is able to handle noisy data like ID3.
Description of Algorithm
The algorithm must be given a set of examples, TrainingSet, which have already been classified in order to generate a list of classification rules. A set of conditions, SimpleConditionSet, which can be applied, alone or in combination, to any set of examples is predefined to be used for the classification.
routine CN2(TrainingSet)
let the ClassificationRuleList be empty
repeat
let the BestConditionSet be Find_BestConditionSet(TrainingSet)
if the BestConditionSet is not nil
then
let the TrainingSubset be the examples covered by the BestConditionSet
remove from the TrainingSet the examples in the TrainingSubset
let the MostCommonClass be the most common class of examples in the TrainingSubset
append to the ClassificationRuleList the rule
'if ' the BestConditionSet ' then the class is ' the MostCommonClass
until the TrainingSet is empty or the BestConditionSet is nil
return the ClassificationRuleList
routine Find_BestConditionSet(TrainingSet)
let the ConditionalFormulaSet be the set containing the empty formula
let the BestConditionSet be nil
repeat
let the TrialConditionalFormulaSet be the set of formulae
{x and y|x belongs to the ConditionalFormulaSet, y belongs to the SimpleConditionSet}.
remove all formulae in the TrialConditionalFormulaSet that are either in the ConditionalFormulaSet (i.e.,
the unspecialized ones) or null (e.g., big = y and big = n)
for every formula, F, in the TrialConditionalFormulaSet
if
F is statistically significant
and F is better than the BestConditionSet
by user-defined criteria when tested on the TrainingSet
then
replace the current value of the BestConditionSet by F
while the number of formulae in the TrialConditionalFormulaSet > user-defined maximum
remove the worst formula from the TrialConditionalFormulaSet
let the TrialConditionalFormulaSet be the TrialConditionalFormulaSet
until the TrialConditionalFormulaSet is empty
return the BestConditionSet
External links