Example for MDL

Consider the decision trees shown in Figure 4.3.
Assume they are generated
from a data set that contains 16 binary attributes and 3 classes,
*C*1, *C*2, and *C*3. Compute the total description length of each
decision tree according to the minimum description length principle.

Answer:

Because there are 16 attributes, the cost for each
internal node in the decision tree is:

log2(*m*) = log2(16) = 4

Furthermore, because there are 3 classes, the cost for
each leaf node is:

_{}

The cost for each
misclassification error is log2(*n*).
The overall cost for the decision tree (a) is 2*×*4+3*×*2+7*×*log2 *n *= 14+7 log2 *n*

and the overall cost for the decision tree (b) is

4*×*4+5*×*2+4*×*5
= 26+4 log2 *n*.

According to the MDL principle, tree (a) is better
than (b) if *n < *16 and is worse than (b) if *n > *16.