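| Splitting criterion | Definition | Auxiliary definition | Used by | Notes |
|---|---|---|---|---|
| Information gain | $\operatorname{Gain}(D,a)=\operatorname{Ent}(D)-\sum_{v=1}^{V}\frac{\lvert D^v\rvert}{\lvert D\rvert}\operatorname{Ent}(D^v)$ | $\operatorname{Ent}(D)=-\sum_{k=1}^{\lvert\mathcal{Y}\rvert}p_k\log_2 p_k$ | ID3 | Biased toward attributes with many possible values |
| Gain ratio | $\operatorname{Gain\_ratio}(D,a)=\frac{\operatorname{Gain}(D,a)}{\operatorname{IV}(a)}$ | $\operatorname{IV}(a)=-\sum_{v=1}^{V}\frac{\lvert D^v\rvert}{\lvert D\rvert}\log_2\frac{\lvert D^v\rvert}{\lvert D\rvert}$ | C4.5 | Gain ratio alone favors attributes with few values, so C4.5 first keeps the candidate attributes whose information gain is above average, then picks the one with the highest gain ratio |
| Gini index | $\operatorname{Gini\_index}(D,a)=\sum_{v=1}^{V}\frac{\lvert D^v\rvert}{\lvert D\rvert}\operatorname{Gini}(D^v)$ | $\operatorname{Gini}(D)=1-\sum_{k=1}^{\lvert\mathcal{Y}\rvert}p_k^2$ | CART | $\operatorname{Gini}(D)$ is the probability that two samples drawn independently at random from $D$ have different class labels; the smaller it is, the purer $D$ is, so CART splits on the attribute with the smallest Gini index |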
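The three criteria in the table can be computed directly from their definitions. Below is a minimal Python sketch, assuming a data set represented as a list of class labels plus, for each discrete attribute, a parallel list of that attribute's values. The function names (`entropy`, `info_gain`, `gain_ratio`, `gini_index`, `c45_choose`) and the toy data are hypothetical illustrations, not the reference ID3, C4.5, or CART implementations; continuous attributes and missing values are not handled.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Ent(D) = -sum_k p_k log2 p_k over the class proportions in D."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Split the labels by attribute value, returning the subsets D^v."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def info_gain(values, labels):
    """Gain(D, a) = Ent(D) - sum_v |D^v|/|D| * Ent(D^v)."""
    n = len(labels)
    return entropy(labels) - sum(
        len(dv) / n * entropy(dv) for dv in partition(values, labels)
    )


def intrinsic_value(values, labels):
    """IV(a) = -sum_v |D^v|/|D| * log2(|D^v|/|D|)."""
    n = len(labels)
    return -sum(len(dv) / n * log2(len(dv) / n) for dv in partition(values, labels))


def gain_ratio(values, labels):
    """Gain_ratio(D, a) = Gain(D, a) / IV(a)."""
    iv = intrinsic_value(values, labels)
    # IV(a) is 0 when the attribute takes a single value; such a split
    # carries no information, so this sketch returns 0 instead of dividing by 0.
    return info_gain(values, labels) / iv if iv > 0 else 0.0


def gini_index(values, labels):
    """Gini_index(D, a) = sum_v |D^v|/|D| * Gini(D^v); smaller is better."""
    n = len(labels)
    return sum(len(dv) / n * gini(dv) for dv in partition(values, labels))


def c45_choose(attrs, labels):
    """C4.5 heuristic: keep attributes with at-least-average gain, then max gain ratio."""
    gains = {a: info_gain(v, labels) for a, v in attrs.items()}
    avg = sum(gains.values()) / len(gains)
    # ">=" (rather than ">") guarantees at least one candidate survives.
    candidates = [a for a, g in gains.items() if g >= avg]
    return max(candidates, key=lambda a: gain_ratio(attrs[a], labels))


# Hypothetical toy data: "id" is unique per sample, "color" separates the
# classes perfectly, and "const" takes a single value.
labels = ["yes", "yes", "no", "no"]
attrs = {
    "color": ["red", "red", "blue", "blue"],
    "id": [1, 2, 3, 4],
    "const": ["x", "x", "x", "x"],
}

if __name__ == "__main__":
    # Information gain cannot tell "id" from "color" ...
    print(info_gain(attrs["id"], labels), info_gain(attrs["color"], labels))  # 1.0 1.0
    # ... but gain ratio penalizes the many-valued "id" through IV(a).
    print(gain_ratio(attrs["id"], labels), gain_ratio(attrs["color"], labels))  # 0.5 1.0
    print(c45_choose(attrs, labels))  # color
```

The toy data makes the ID3 bias from the table concrete: the unique `id` attribute gets the same information gain as `color`, while its larger $\operatorname{IV}(a)$ halves its gain ratio.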