Example for MDL

Consider the decision trees shown in Figure 4.3. Assume they are generated from a data set that contains 16 binary attributes and 3 classes, C1, C2, and C3. Compute the total description length of each decision tree according to the minimum description length principle.

Answer:

 

Because there are 16 attributes, the cost for each internal node in the decision tree is:

log2(m) = log2(16) = 4

Furthermore, because there are 3 classes, the cost for each leaf node is:

 

The cost for each misclassification error is log2(n). The overall cost for the decision tree (a) is 24+32+7log2 n = 14+7 log2 n

 

and the overall cost for the decision tree (b) is

44+52+45 = 26+4 log2 n.

 

According to the MDL principle, tree (a) is better than (b) if n < 16 and is worse than (b) if n > 16.