User:Sean3000/sandbox
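Information Value (IV) is a feature selection method widely used in credit scoring[1]. For a characteristic split into categories $i$, the formula is:

$$\mathrm{IV} = \sum_i \left(\text{Distr Good}_i - \text{Distr Bad}_i\right) \ln\!\left(\frac{\text{Distr Good}_i}{\text{Distr Bad}_i}\right)$$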
Where ‘Distr Bad’ is the number of defaulting customers in a category divided by the total number of defaulting customers in the portfolio, and ‘Distr Good’ is the number of non-defaulting customers in that category divided by the total number of non-defaulting customers in the portfolio.
For example, if the categories are age bands, the Information Value may be calculated as follows:
Age | Total Count of Customers | Count of Bads | Count of Goods | Distribution of Bads | Distribution of Goods | Information Value |
---|---|---|---|---|---|---|
<18 | 2,000 | 140 | 1,860 | 3.65% | 3.86% | 0.01% |
19<25 | 5,000 | 960 | 4,040 | 25.00% | 8.39% | 18.14% |
26<35 | 10,000 | 1,080 | 8,920 | 28.12% | 18.52% | 4.01% |
36<50 | 12,000 | 900 | 11,100 | 23.44% | 23.05% | 0.01% |
51<65 | 13,000 | 500 | 12,500 | 13.02% | 25.96% | 8.92% |
66<70 | 7,000 | 200 | 6,800 | 5.21% | 14.12% | 8.89% |
71+ | 3,000 | 60 | 2,940 | 1.56% | 6.10% | 6.19% |
Total: | 52,000 | 3,840 | 48,160 | 100.00% | 100.00% | 46.17% |
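Thus for the first row ("<18"), the Count of Goods is the Total Count of Customers minus the Count of Bads (2,000 - 140 = 1,860), and the Distribution of Goods is the Count of Goods in the row divided by the total Count of Goods (1,860 / 48,160 = 3.86%).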
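
The figures in the table can be checked with a short script. The following is a minimal Python sketch, assuming each row contributes (Distr Good - Distr Bad) × ln(Distr Good / Distr Bad), which is consistent with the table's Information Value column. The names `AGE_BANDS` and `information_value` are illustrative and do not come from the source.

```python
import math

# Counts per age band, copied from the table: (band, bads, goods).
AGE_BANDS = [
    ("<18", 140, 1860),
    ("19<25", 960, 4040),
    ("26<35", 1080, 8920),
    ("36<50", 900, 11100),
    ("51<65", 500, 12500),
    ("66<70", 200, 6800),
    ("71+", 60, 2940),
]


def information_value(bands):
    """Return (per-band contributions, total IV) for (label, bads, goods) rows."""
    total_bads = sum(bads for _, bads, _ in bands)
    total_goods = sum(goods for _, _, goods in bands)
    contributions = {}
    for label, bads, goods in bands:
        distr_bad = bads / total_bads
        distr_good = goods / total_goods
        # Assumption: every band has at least one bad and one good customer;
        # a zero count would make the logarithm undefined.
        contributions[label] = (distr_good - distr_bad) * math.log(distr_good / distr_bad)
    return contributions, sum(contributions.values())


contributions, total_iv = information_value(AGE_BANDS)
for label, value in contributions.items():
    print(f"{label:>6}: {value:.2%}")
print(f" Total: {total_iv:.2%}")
```

Because both factors change sign together, swapping the roles of goods and bads gives the same contribution for every band, so the total should agree with the table's 46.17% under either convention.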