ID3 uses information gain as its attribute selection measure. The attribute with the highest information gain is chosen as the splitting attribute for node N.
The expected information needed to classify a tuple in D (the average amount of information needed to identify its class label) is given by

Info(D) = -Σ pi log2(pi), summed over i = 1, ..., m

where pi is the non-zero probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D| / |D|. Info(D) is also known as the entropy of D.
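As a rough illustration, Info(D) can be computed directly from the class counts. This is a minimal sketch, assuming the class labels are given as a plain Python list; the function name info and the example data are purely illustrative.

```python
from collections import Counter
from math import log2

def info(labels):
    """Entropy Info(D) of a collection of class labels, with pi = |Ci,D| / |D|."""
    total = len(labels)
    counts = Counter(labels)
    # Sum of -pi * log2(pi) over the classes with non-zero probability.
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Illustrative data set D with 9 "yes" and 5 "no" tuples: Info(D) ≈ 0.940 bits.
D = ["yes"] * 9 + ["no"] * 5
print(round(info(D), 3))  # 0.94
```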
After partitioning D on attribute A into v subsets D1, ..., Dv, the information still required to arrive at an exact classification is

InfoA(D) = Σ (|Dj| / |D|) × Info(Dj), summed over j = 1, ..., v

where |Dj| / |D| acts as the weight of the jth partition.
Information gain is defined as the difference between the original information requirement (based only on the class proportions in D) and the new requirement (obtained after partitioning on A):
Gain(A) = Info(D) - InfoA(D)
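A sketch of how InfoA(D) and Gain(A) might be computed under the same assumptions, with the data set represented as (attribute value, class label) pairs; this representation and the example data are illustrative, and the info helper is repeated so the snippet runs on its own.

```python
from collections import Counter
from math import log2

def info(labels):
    """Entropy Info(D) of a collection of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_after_split(pairs):
    """InfoA(D): weighted entropy of the partitions D1..Dv induced by attribute A."""
    total = len(pairs)
    partitions = {}
    for value, label in pairs:
        partitions.setdefault(value, []).append(label)
    # Each partition Dj contributes with weight |Dj| / |D|.
    return sum((len(p) / total) * info(p) for p in partitions.values())

def gain(pairs):
    """Gain(A) = Info(D) - InfoA(D)."""
    return info([label for _, label in pairs]) - info_after_split(pairs)

# Illustrative attribute with values "yes"/"no" paired with class labels.
D = [("no", "no"), ("no", "no"), ("no", "yes"), ("yes", "yes"), ("yes", "yes")]
print(round(gain(D), 3))  # ≈ 0.42
```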
The information gain measure is biased toward tests with many outcomes. C4.5, a successor of ID3, uses an extension to information gain known as gain ratio, which attempts to overcome this bias.
Gain ratio applies a normalization to information gain using a "split information" value, defined analogously to Info(D) as

SplitInfoA(D) = -Σ (|Dj| / |D|) × log2(|Dj| / |D|), summed over j = 1, ..., v

This value represents the potential information generated by splitting D into v partitions; it differs from information gain, which measures the information with respect to classification that is acquired based on the same partitioning. The gain ratio is then calculated as

GainRatio(A) = Gain(A) / SplitInfoA(D)
The attribute with the maximum gain ratio is selected as the splitting attribute.
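Continuing the same sketch, the split information and gain ratio could be computed as follows; gain_ratio reuses the gain function from the previous snippet, and the guard against a zero split information is an implementation detail rather than part of the formula above.

```python
from collections import Counter
from math import log2

def split_info(pairs):
    """SplitInfoA(D): potential information generated by splitting D on attribute A."""
    total = len(pairs)
    value_counts = Counter(value for value, _ in pairs)
    return -sum((c / total) * log2(c / total) for c in value_counts.values())

def gain_ratio(pairs):
    """GainRatio(A) = Gain(A) / SplitInfoA(D), using gain() from the sketch above."""
    s = split_info(pairs)
    # SplitInfoA(D) is 0 when A takes a single value on D, so guard the division.
    return gain(pairs) / s if s > 0 else 0.0
```

Because SplitInfoA(D) grows with the number of distinct values of A, dividing by it counteracts the bias of information gain toward tests with many outcomes.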