Decision Trees Declassified
From “20 Questions” to Machine Learning mastery.
1. The Concept: “20 Questions”
Imagine you are playing 20 Questions. Your friend is thinking of a specific animal, and you need to guess it.
You wouldn’t start by guessing random animals like “Is it a Platypus?” Instead, you ask splitting questions to narrow down the possibilities:
- “Is it a mammal?” (Splits the world into Mammals vs. Non-Mammals)
- “Does it bark?” (Splits Mammals into Dogs vs. Others)
The Logic Flow
In Machine Learning, this flow is the “Tree”. The questions are “Nodes”. The final answer is the “Leaf”.
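To make the vocabulary concrete, here is a minimal sketch of the 20 Questions flow as a data structure. The nested dictionary, the `guess` function, and the leaf labels are hypothetical and chosen only for illustration: dictionaries play the role of nodes (questions) and plain strings play the role of leaves (answers).

```python
# Hypothetical toy tree: each decision node asks a yes/no question,
# each leaf is a plain string holding the final answer.
tree = {
    "question": "Is it a mammal?",
    "yes": {
        "question": "Does it bark?",
        "yes": "Dog",
        "no": "Some other mammal",
    },
    "no": "Not a mammal",
}


def guess(node, answers):
    """Walk from the root to a leaf using a dict of question -> True/False."""
    while isinstance(node, dict):  # still at a decision node
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node  # a string, i.e. a leaf


print(guess(tree, {"Is it a mammal?": True, "Does it bark?": True}))
```

A learned decision tree has the same shape; the difference is that the algorithm picks the questions from the data instead of a person writing them by hand.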
2. The Engine: Interactive Gini Lab
A Decision Tree wants to create “Pure” leaves. Drag the slider below to find the best spot to split the Red Dots from the Blue Dots. Watch how the Gini Impurity (the messiness score) changes.
Goal: Get the Weighted Gini as low as possible. The lower the Gini, the “purer” the split.
[Interactive widget: a slider with readouts for the Left Leaf, the Weighted Gini Impurity, and the Right Leaf]
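If the interactive lab is not available, the same experiment can be approximated in a few lines of code. This is only a sketch: the dot positions and colors in `points` are made up, and the helper names `gini` and `weighted_gini` are assumptions rather than part of the original lab. Each candidate threshold plays the role of one slider position.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of one leaf; an empty leaf counts as pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(points, threshold):
    """Impurity of both leaves, each weighted by its share of the dots."""
    left = [color for x, color in points if x <= threshold]
    right = [color for x, color in points if x > threshold]
    n = len(points)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Made-up dots along one axis: (position, color)
points = [(1, "red"), (2, "red"), (3, "red"), (4, "blue"), (5, "red"),
          (6, "blue"), (7, "blue"), (8, "blue"), (9, "blue")]

# "Drag the slider" across the midpoints between neighboring dots.
thresholds = [x + 0.5 for x in range(1, 9)]
for t in thresholds:
    print(f"split at {t}: weighted Gini = {weighted_gini(points, t):.3f}")

best = min(thresholds, key=lambda t: weighted_gini(points, t))
print("best split:", best)
```

The weighting matters: a split that produces one tiny pure leaf and one large messy leaf scores worse than a split that leaves both sides mostly clean.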
3. Under the Hood: The Math
The Gini Impurity Formula
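Gini Impurity measures how likely we are to misclassify a randomly chosen sample if we label it at random according to the class distribution in the node:
Gini = 1 - Σ pᵢ²
where pᵢ is the fraction of samples in the node that belong to class i.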
Detailed Example (a node holding 5 dots: 4 Blue and 1 Red):
1. P(Blue) = 4/5 = 0.8
2. P(Red) = 1/5 = 0.2
3. Squares: 0.8² = 0.64, 0.2² = 0.04
4. Sum: 0.64 + 0.04 = 0.68
5. Gini = 1 – 0.68 = 0.32
- A Gini of 0.0 means the node is “Pure” (all one color).
- A Gini of 0.5 means maximum impurity for two classes (a 50/50 split).
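The arithmetic above can be checked with a short function. This is a minimal sketch; the name `gini_impurity` and the choice to pass raw class counts are assumptions made for the example.

```python
def gini_impurity(counts):
    """Gini impurity from raw class counts, e.g. [4, 1] for 4 Blue and 1 Red."""
    total = sum(counts)
    return 1.0 - sum((count / total) ** 2 for count in counts)


print(round(gini_impurity([4, 1]), 2))     # the worked example: 4 Blue, 1 Red
print(gini_impurity([5, 0]))               # pure node
print(gini_impurity([5, 5]))               # 50/50 split of two classes
print(round(gini_impurity([1, 1, 1]), 3))  # three evenly mixed classes
```

The last line shows why 0.5 is the ceiling only for two classes: with k evenly mixed classes the impurity is 1 - 1/k, so three classes reach about 0.667.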
How the Algorithm Learns
The decision tree uses a Greedy Approach, as in algorithms such as CART (which scores splits with Gini Impurity) and ID3 (which uses information gain instead). It doesn’t plan ahead; it just tries to find the best immediate split.
- Check Every Feature: It looks at every column in your data (Age, Income, etc.).
- Check Every Threshold: It tries splitting at every unique value (Age > 20? Age > 21?).
- Calculate Score: For every possible split, it calculates the Weighted Gini Impurity (what you see in the simulator).
- Pick the Best: It chooses the split with the lowest score and creates two child nodes.
- Repeat: It does this recursively for every child node until it stops (leaves are pure or max depth reached).
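The steps above can be sketched in plain Python. This is a simplified, CART-style illustration rather than any library's actual implementation: the function names, the dictionary layout of a node, the `max_depth` default, and the tiny (age, income) dataset are all assumptions made for the example. It sends rows with `value <= threshold` to the left child, which is the same test as the “Age > 20?” question seen from the other side.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Check every feature and every observed value; keep the lowest score."""
    best = None  # (weighted_gini, feature_index, threshold)
    n = len(rows)
    for f in range(len(rows[0])):                   # check every feature
        for t in sorted({row[f] for row in rows}):  # check every threshold
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue                            # not a real split
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:     # pick the best
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:  # pure leaf or depth limit
        return majority
    split = best_split(rows, labels)
    if split is None:                                # nothing left to split on
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),    # repeat on each child
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }


def predict(node, row):
    while isinstance(node, dict):
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


# Made-up data: (age, income in thousands) -> did the person buy?
rows = [(22, 30), (25, 60), (30, 35), (35, 80), (40, 40), (50, 90)]
labels = ["no", "yes", "no", "yes", "no", "yes"]

tree = build_tree(rows, labels)
print(tree)
print(predict(tree, (28, 70)))
```

Because the search is greedy, each call only looks for the best split of the rows it was handed. It never revisits an earlier choice, even if a different first split would have produced a smaller tree overall.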
4. Anatomy of a Tree
Root Node
The very top of the tree. It represents the entire population before any splitting happens.
Decision Node
A sub-node that splits into further sub-nodes. This is where the questions (e.g., “X > 50?”) happen.
Leaf Node
A node that does not split. It holds the final prediction or decision.
The “Overfitting” Trap
If a tree grows too deep, it starts memorizing the noise rather than the signal.
Example: A tree might create a specific rule for “People named Bob who wear green hats” just because one person in the data fit that description. This rule won’t work in the real world.
Solution: Pruning (cutting weak branches) or setting a Max Depth.
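The effect of a depth limit can be seen on a toy problem. The sketch below is an illustration under stated assumptions, not a benchmark: the one-feature tree builder, the training data with one deliberately mislabeled point (x = 3), and the hand-picked test points are all made up for the example. The true rule is “label 1 if x > 5”.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build(xs, ys, max_depth, depth=0):
    """Tiny 1-D tree: returns a label (leaf) or (threshold, left, right)."""
    majority = Counter(ys).most_common(1)[0][0]
    if len(set(ys)) == 1 or depth == max_depth:
        return majority
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value cannot split anything
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:
        return majority
    t = best[1]
    lo = [(x, y) for x, y in zip(xs, ys) if x <= t]
    hi = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            build([x for x, _ in lo], [y for _, y in lo], max_depth, depth + 1),
            build([x for x, _ in hi], [y for _, y in hi], max_depth, depth + 1))


def predict(node, x):
    while isinstance(node, tuple):
        t, left, right = node
        node = left if x <= t else right
    return node


def count_leaves(node):
    if not isinstance(node, tuple):
        return 1
    return count_leaves(node[1]) + count_leaves(node[2])


def accuracy(node, xs, ys):
    return sum(predict(node, x) == y for x, y in zip(xs, ys)) / len(ys)


# Training data follows "x > 5", except x = 3 is mislabeled (the noise).
train_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
train_y = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
# Hand-picked test points labeled by the true rule.
test_x = [1.5, 2.5, 3.5, 4.5, 6.5, 8.5]
test_y = [0, 0, 0, 0, 1, 1]

deep = build(train_x, train_y, max_depth=10)
shallow = build(train_x, train_y, max_depth=1)

for name, model in [("deep", deep), ("shallow", shallow)]:
    print(name, "leaves:", count_leaves(model),
          "train:", accuracy(model, train_x, train_y),
          "test:", round(accuracy(model, test_x, test_y), 2))
```

The deep tree carves out a dedicated leaf for the mislabeled point, which makes it perfect on the training data but wrong on a nearby test point. The depth-limited tree ignores that one point and recovers the underlying rule.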
Advantages vs. Disadvantages
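Advantages:
- Interpretability: You can visualize the tree and explain its logic to a human as a chain of simple questions.
- Little Data Prep: Needs no feature scaling and can split on both numerical and categorical features (though some implementations require categorical features to be encoded as numbers first).
- Non-Linear: Can capture complex, non-linear patterns and interactions between features.
Disadvantages:
- Instability: A small change in the data can result in a completely different tree.
- Overfitting: Tends to build complex trees that don’t generalize well unless depth is limited or the tree is pruned.
- Bias: On imbalanced data, the tree tends to favor the dominant (majority) class.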