Some details
How do we deal with categorical predictors?
If there are only 2 categories, the split is obvious: we don’t have to choose a splitting point \(s\) as we would for a numerical variable.
If there are more than 2 categories:
Order the categories according to the average of the response, e.g. \(\mathtt{ChestPain:a} > \mathtt{ChestPain:c} > \mathtt{ChestPain:b}\).
Treat the variable as numerical with this ordering, and choose a splitting point \(s\) as usual (see the sketch after this list).
One can show that this is the optimal way of partitioning: for squared-error loss it finds the best of all \(2^{q-1}-1\) ways of splitting \(q\) categories into two groups, without an exhaustive search.
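A minimal sketch of this mean-response encoding in Python (illustrative, not the course’s code; the column names ChestPain and y, and the toy values, are made up):

```python
import pandas as pd

# Toy data: a categorical predictor and a numerical response.
# The names "ChestPain" and "y" are illustrative assumptions.
df = pd.DataFrame({
    "ChestPain": ["a", "b", "a", "c", "b", "c", "a"],
    "y":         [5.0, 1.0, 6.0, 3.0, 0.5, 2.5, 5.5],
})

# Mean response per category; here a > c > b, as in the example above.
means = df.groupby("ChestPain")["y"].mean().sort_values()

# Replace each category by its rank in this ordering -> an ordinal feature
# on which a tree can search for a numerical splitting point s.
rank = {cat: i for i, cat in enumerate(means.index)}
df["ChestPain_ordinal"] = df["ChestPain"].map(rank)

print(means)
print(df)
```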
How do we deal with missing data?
Goal: we still want to be able to assign every sample to a leaf \(R_i\) despite the missing data.
When choosing a new split with variable \(X_j\) (growing the tree):
Only consider the samples for which the variable \(X_j\) is observed.
In addition to the best split, also record a second-best split using a different variable, a third-best, …
To propagate a sample down the tree when the variable needed for a decision is missing, fall back to the second-best split, or the third-best, and so on: this is surrogate splitting (a sketch follows below).
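A minimal sketch of how a surrogate could be chosen (an illustration under simplifying assumptions, not the exact CART procedure): given a primary split \(X_j \le s\), search the other variables for the split that best mimics its decisions on the samples where both variables are observed.

```python
import numpy as np

def best_surrogate(X, j, s):
    """Primary split: X[:, j] <= s. Missing values in X are NaN."""
    best = (None, None, 0.0)                    # (variable, threshold, agreement)
    for k in range(X.shape[1]):
        if k == j:
            continue
        ok = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])  # both variables observed
        prim = X[ok, j] <= s                    # decisions of the primary split
        xk = X[ok, k]
        for t in np.unique(xk):
            agree = np.mean((xk <= t) == prim)  # fraction of matching decisions
            if agree > best[2]:
                best = (k, t, agree)
    return best

# Usage: X_1 is highly correlated with X_0, so it should be a good
# stand-in when X_0 is missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)
X[rng.random(100) < 0.2, 0] = np.nan            # some X_0 values missing
print(best_surrogate(X, j=0, s=0.0))            # expect variable 1, threshold near 0
```

Implementations such as rpart in R rank several surrogates by this agreement and also allow a surrogate that mirrors the primary split in the reversed direction; the sketch keeps only the single best same-direction surrogate.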
Some advantages of trees
Very easy to interpret!
Closer to human decision-making.
Easy to visualize graphically (for shallow trees).
They easily handle qualitative predictors and missing data.
Downside: they don’t necessarily fit that well; a single tree typically has lower predictive accuracy than other regression and classification approaches!