Example for MDL
Consider the decision trees shown in Figure 4.3. Assume they are generated from a data set that contains 16 binary attributes and 3 classes, C1, C2, and C3. Compute the total description length of each decision tree according to the minimum description length principle.
Because there are 16 attributes, the cost for each internal node in the decision tree is:
log2(m) = log2(16) = 4
Furthermore, because there are 3 classes, the cost for each leaf node is:
The cost for each misclassification error is log2(n). The overall cost for the decision tree (a) is 2×4+3×2+7×log2 n = 14+7 log2 n
and the overall cost for the decision tree (b) is
4×4+5×2+4×5 = 26+4 log2 n.
According to the MDL principle, tree (a) is better than (b) if n < 16 and is worse than (b) if n > 16.