
User:Sean3000/sandbox

From Wikipedia, the free encyclopedia

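Information value (IV) is a measure used for feature selection that is widely applied in credit scoring[1]. The formula is:

$$\mathrm{IV} = \sum_{i} \left(\text{Distr Good}_i - \text{Distr Bad}_i\right) \ln\!\left(\frac{\text{Distr Good}_i}{\text{Distr Bad}_i}\right)$$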

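Here, ‘Distr Bad’ for a category is the number of defaulting customers in that category divided by the total number of defaulting customers in the portfolio, and ‘Distr Good’ is the number of non-defaulting customers in that category divided by the total number of non-defaulting customers. The contributions of the individual categories are added up to give the information value of the feature.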

For example, if the category is age band, it may be calculated as follows:

| Age | Total Count of Customers | Count of Bads | Count of Goods | Distribution of Bads | Distribution of Goods | Information Value |
|---|---|---|---|---|---|---|
| <18 | 2,000 | 140 | 1,860 | 3.65% | 3.86% | 0.01% |
| 19<25 | 5,000 | 960 | 4,040 | 25.00% | 8.39% | 18.14% |
| 26<35 | 10,000 | 1,080 | 8,920 | 28.12% | 18.52% | 4.01% |
| 36<50 | 12,000 | 900 | 11,100 | 23.44% | 23.05% | 0.01% |
| 51<65 | 13,000 | 500 | 12,500 | 13.02% | 25.96% | 8.92% |
| 66<70 | 7,000 | 200 | 6,800 | 5.21% | 14.12% | 8.89% |
| 71+ | 3,000 | 60 | 2,940 | 1.56% | 6.10% | 6.19% |
| Total: | 52,000 | 3,840 | 48,160 | 100.00% | 100.00% | 46.17% |
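
The figures in the table can be reproduced from the raw counts. The following is a minimal sketch, not a reference implementation: the function name `information_value` and the `(band, bads, goods)` row layout are illustrative choices, and the natural logarithm is assumed because it is the base that matches the tabulated values.

```python
import math

# Counts copied from the age-band table: (band, count of bads, count of goods)
rows = [
    ("<18",     140,  1860),
    ("19<25",   960,  4040),
    ("26<35",  1080,  8920),
    ("36<50",   900, 11100),
    ("51<65",   500, 12500),
    ("66<70",   200,  6800),
    ("71+",      60,  2940),
]


def information_value(rows):
    """Return (per-band contributions, total IV), assuming the natural log."""
    total_bad = sum(bad for _, bad, _ in rows)
    total_good = sum(good for _, _, good in rows)
    contributions = {}
    for band, bad, good in rows:
        distr_bad = bad / total_bad      # share of all bads in this band
        distr_good = good / total_good   # share of all goods in this band
        contributions[band] = (distr_good - distr_bad) * math.log(distr_good / distr_bad)
    return contributions, sum(contributions.values())


contributions, iv = information_value(rows)
for band, value in contributions.items():
    print(f"{band:>6}: {value:.2%}")
print(f" Total: {iv:.2%}")
```

Note that this sketch would fail for a band with zero bads or zero goods, because the logarithm is undefined there; the example table has no such band.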

Thus for the first row ("<18"), the Count of Goods is the Total Count of Customers minus the Count of Bads (2,000 − 140 = 1,860), the Distribution of Bads is 140 / 3,840 = 3.65%, and the Distribution of Goods is 1,860 / 48,160 = 3.86%.
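
As a check on the same row, its contribution to the information value can be worked out from the counts in the table. This assumes the natural logarithm, which is the base that reproduces the tabulated figures:

$$\left(\frac{1860}{48160} - \frac{140}{3840}\right) \ln\!\left(\frac{1860/48160}{140/3840}\right) \approx (0.0386 - 0.0365) \times \ln(1.059) \approx 0.0001 = 0.01\%$$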

References

  1. Zeng, Guoping. "Metric Divergence Measures and Information Value in Credit Scoring".