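| Information Gain | $\operatorname{Gain}(D,a)=\operatorname{Ent}(D)-\sum_{v=1}^{V}\frac{\lvert D^{v}\rvert}{\lvert D\rvert}\operatorname{Ent}(D^{v})$ | $\operatorname{Ent}(D)=-\sum_{k=1}^{\lvert\mathcal{Y}\rvert}p_{k}\log_{2}p_{k}$ | ID3 | Biased toward attributes with many possible values |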
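| Gain ratio | $\operatorname{Gain-ratio}(D,a)=\frac{\operatorname{Gain}(D,a)}{\operatorname{IV}(a)}$ | $\operatorname{IV}(a)=-\sum_{v=1}^{V}\frac{\lvert D^{v}\rvert}{\lvert D\rvert}\log_{2}\frac{\lvert D^{v}\rvert}{\lvert D\rvert}$ | C4.5 | Heuristic: among the candidate split attributes, first keep those whose information gain is above average, then pick the one with the highest gain ratio |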
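| Gini index | $\operatorname{Gini-index}(D,a)=\sum_{v=1}^{V}\frac{\lvert D^{v}\rvert}{\lvert D\rvert}\operatorname{Gini}(D^{v})$ | $\operatorname{Gini}(D)=1-\sum_{k=1}^{\lvert\mathcal{Y}\rvert}p_{k}^{2}$ | CART | $\operatorname{Gini}(D)$ is the probability that two samples drawn at random from $D$ have different class labels; the smaller it is, the purer $D$ |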
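
The three criteria can be sketched in a few lines of Python for a single discrete attribute. This is a minimal illustration under stated assumptions, not a reference implementation of ID3, C4.5, or CART. The function names, the representation of an attribute as a plain list of values, and the use of `>=` when comparing against the average gain (so the candidate set is never empty) are all choices made for this example.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Ent(D) = -sum_k p_k * log2(p_k), with p_k the proportion of class k."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the labels by attribute value, giving the subsets D^v."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    return list(groups.values())


def information_gain(values, labels):
    """Gain(D, a) = Ent(D) - sum_v |D^v|/|D| * Ent(D^v)."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - weighted


def intrinsic_value(values):
    """IV(a): the entropy of the attribute's own value distribution."""
    return entropy(values)


def gain_ratio(values, labels):
    """Gain-ratio(D, a) = Gain(D, a) / IV(a); 0.0 if the attribute is constant."""
    iv = intrinsic_value(values)
    return information_gain(values, labels) / iv if iv > 0 else 0.0


def gini_index(values, labels):
    """Gini-index(D, a) = sum_v |D^v|/|D| * Gini(D^v); smaller is better."""
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in partition(values, labels))


def c45_choose(columns, labels):
    """C4.5-style heuristic: keep attributes whose gain is at least the
    average gain, then return the name with the highest gain ratio.

    `columns` maps a (hypothetical) attribute name to its list of values.
    """
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    average = sum(gains.values()) / len(gains)
    # Small tolerance so attributes tied with the average are kept.
    candidates = [name for name, g in gains.items() if g >= average - 1e-12]
    return max(candidates, key=lambda name: gain_ratio(columns[name], labels))
```

For example, with attribute values `["x", "x", "y", "y"]` and labels `[0, 1, 1, 1]`, the information gain is about 0.311, the gain ratio is the same (since IV is 1), and the Gini index is 0.25. With labels `[0, 0, 1, 1]`, that attribute and an ID-like attribute taking four distinct values both reach an information gain of 1, but the gain ratio of the ID-like attribute is only 0.5, so `c45_choose` prefers the two-valued attribute. This is the bias toward many-valued attributes that the gain ratio is meant to counteract.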