Decision Tree

1. The Concept: “20 Questions”

Imagine you are playing 20 Questions. Your friend is thinking of a specific animal, and you need to guess it.

You wouldn’t start by guessing random animals like “Is it a Platypus?” Instead, you ask splitting questions to narrow down the possibilities:

“Is it a mammal?” (Splits the world into Mammals vs. Non-Mammals)
“Does it bark?” (Splits Mammals into Dogs vs. Others)

The Logic Flow

Is it a mammal?
  No → Is it a bird?
  Yes → Does it bark?

In Machine Learning, this flow is the “Tree”. The questions are “Nodes”. The final answer is the “Leaf”.
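As a rough sketch, the same flow can be written as nested questions in Python. The leaf answers ("dog", "other mammal", "bird", "something else") and the `guess_animal` function are assumptions made up for illustration; the flow above only names the questions.

```python
def guess_animal(is_mammal: bool, barks: bool, is_bird: bool) -> str:
    """Walk the 20 Questions tree: each `if` is a node, each `return` is a leaf."""
    if is_mammal:                  # root node: "Is it a mammal?"
        if barks:                  # node: "Does it bark?"
            return "dog"           # leaf (assumed label)
        return "other mammal"      # leaf (assumed label)
    if is_bird:                    # node: "Is it a bird?"
        return "bird"              # leaf (assumed label)
    return "something else"        # leaf (assumed label)


print(guess_animal(is_mammal=True, barks=True, is_bird=False))
print(guess_animal(is_mammal=False, barks=False, is_bird=True))
```

A learned Decision Tree has the same shape; the difference is that the algorithm chooses the questions from data instead of a person writing them by hand.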

2. The Engine: Interactive Gini Lab

A Decision Tree wants to create “Pure” leaves. Drag the slider below to find the best spot to split the Red Dots from the Blue Dots. Watch how the Gini Impurity (the messiness score) changes.

Goal: Get the Weighted Gini as low as possible. The lower the Gini, the “purer” the split.

[Interactive demo: a draggable slider splits the dots into a Left Leaf and a Right Leaf. Each leaf shows its Blue and Red counts and its Gini, and the Weighted Gini Impurity of the split is displayed between them.]
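The score the lab reports can also be reproduced by hand. Here is a minimal sketch, assuming the usual definition of Weighted Gini (each leaf's Gini weighted by the share of dots it holds). The dot positions and the `weighted_gini_at` helper are hypothetical, not taken from the lab itself.

```python
def gini(blue: int, red: int) -> float:
    """Gini impurity of one leaf from its color counts (0.0 assumed for an empty leaf)."""
    total = blue + red
    if total == 0:
        return 0.0
    return 1.0 - (blue / total) ** 2 - (red / total) ** 2


def weighted_gini_at(slider: float, dots: list[tuple[float, str]]) -> float:
    """Split the dots at the slider position and average the two leaf Ginis by size."""
    left = [color for x, color in dots if x <= slider]
    right = [color for x, color in dots if x > slider]
    score = 0.0
    for leaf in (left, right):
        leaf_gini = gini(leaf.count("blue"), leaf.count("red"))
        score += len(leaf) / len(dots) * leaf_gini
    return score


# Hypothetical dots: (position, color)
dots = [(1, "blue"), (2, "blue"), (3, "blue"), (4, "red"),
        (6, "red"), (7, "blue"), (8, "red")]

for slider in (2.5, 3.5, 6.5):
    print(slider, round(weighted_gini_at(slider, dots), 3))
```

With these dots, the slider at 3.5 gives the lowest of the three scores (about 0.214), because the left leaf ends up entirely blue.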

3. Under the Hood: The Math

The Gini Impurity Formula

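Gini Impurity is the probability of misclassifying a randomly chosen item from a node if we labeled it at random according to the node’s class distribution. In the formula below, pᵢ is the fraction of items in the node that belong to class i.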

Gini = 1 – ∑(pᵢ)²

Detailed Example:

Leaf has: 4 Blue, 1 Red (Total 5)
1. P(Blue) = 4/5 = 0.8
2. P(Red) = 1/5 = 0.2
3. Squares: 0.8² = 0.64, 0.2² = 0.04
4. Sum: 0.64 + 0.04 = 0.68
5. Gini = 1 – 0.68 = 0.32

A Gini of 0.0 means the node is “Pure” (all one color).
A Gini of 0.5 is the maximum impurity for two classes (a 50/50 split).
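The worked example translates directly into a few lines of Python. This is a small sketch of the formula, not any library's implementation; the `gini_impurity` name and the choice to return 0.0 for an empty leaf are assumptions.

```python
def gini_impurity(counts: list[int]) -> float:
    """Gini = 1 - sum(p_i ** 2), where p_i is the share of class i in the leaf."""
    total = sum(counts)
    if total == 0:
        return 0.0  # convention assumed here for an empty leaf
    return 1.0 - sum((count / total) ** 2 for count in counts)


print(round(gini_impurity([4, 1]), 2))  # 4 Blue, 1 Red -> 0.32
print(round(gini_impurity([5, 0]), 2))  # pure leaf     -> 0.0
print(round(gini_impurity([3, 3]), 2))  # 50/50 split   -> 0.5
```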

How the Algorithm Learns

The decision tree uses a Greedy Approach, as in classic algorithms such as CART (which scores splits with Gini) and ID3 (which uses Information Gain). It doesn’t plan ahead; at each step it just picks the best immediate split.

Check Every Feature: It looks at every column in your data (Age, Income, etc.).
Check Every Threshold: It tries splitting at every unique value (Age > 20? Age > 21?).
Calculate Score: For every possible split, it calculates the Weighted Gini Impurity (what you see in the simulator).
Pick the Best: It chooses the split with the lowest score and creates two child nodes.
Repeat: It does this recursively for every child node until it stops (leaves are pure or max depth reached).
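One round of the steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not how any particular library implements it: the `best_split` helper, the Age/Income rows, and the "yes"/"no" labels are all hypothetical, and rows go left when their value is at or below the threshold.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Return (score, feature_index, threshold) with the lowest weighted Gini,
    or None if no split separates the rows into two non-empty groups."""
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):                            # check every feature
        for threshold in sorted({row[feature] for row in rows}):   # every unique value
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue                                           # not a real split
            # calculate score: weighted Gini of the two children
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:                    # pick the best
                best = (score, feature, threshold)
    return best


# Hypothetical data: columns are (age, income in thousands)
rows = [(22, 30), (25, 80), (30, 35), (35, 90), (40, 40), (45, 95)]
labels = ["no", "yes", "no", "yes", "no", "yes"]

score, feature, threshold = best_split(rows, labels)
print(feature, threshold, score)
```

On this toy data the search settles on feature 1 (income) at threshold 40 with a score of 0.0, since that split leaves both children pure. The "Repeat" step would call `best_split` again on the rows of each child.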

4. Anatomy of a Tree

🌱

Root Node

The very top of the tree. It represents the entire population before any splitting happens.

🌿

Decision Node

A sub-node that splits into further sub-nodes. This is where the questions (e.g., “X > 50?”) happen.

🍂

Leaf Node

A node that does not split. It holds the final prediction or decision.
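In code, the three kinds of node can share one small structure. The sketch below is only one possible layout; the `Node` class, its field names, and the example tree are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Decision nodes (including the root) carry a question: "x[feature] > threshold?"
    feature: Optional[int] = None
    threshold: Optional[float] = None
    yes: Optional["Node"] = None      # child followed when the answer is yes
    no: Optional["Node"] = None       # child followed when the answer is no
    # Leaf nodes carry only a prediction
    prediction: Optional[str] = None

    def is_leaf(self) -> bool:
        return self.prediction is not None


def predict(node: Node, x: list[float]) -> str:
    """Start at the root and follow the answers until a leaf is reached."""
    while not node.is_leaf():
        node = node.yes if x[node.feature] > node.threshold else node.no
    return node.prediction


# Hypothetical tree: the root asks "x[0] > 50?", one decision node asks "x[1] > 10?"
root = Node(
    feature=0, threshold=50,
    yes=Node(prediction="A"),
    no=Node(feature=1, threshold=10, yes=Node(prediction="B"), no=Node(prediction="C")),
)

print(predict(root, [60, 0]))   # A
print(predict(root, [40, 20]))  # B
print(predict(root, [40, 5]))   # C
```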

The “Overfitting” Trap

If a tree grows too deep, it starts memorizing the noise rather than the signal.

Example: A tree might create a specific rule for “People named Bob who wear green hats” just because one person in the data fit that description. This rule won’t work in the real world.

Solution: Pruning (cutting weak branches) or setting a Max Depth.
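To see the effect of a Max Depth, here is a small sketch that grows a tree on one-dimensional data with and without a depth cap. The `build` and `count_leaves` helpers and the data (including the single noisy "red" dot at x=3) are hypothetical, and the splitting rule is the simplified greedy Gini search described earlier.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build(points, depth=0, max_depth=None):
    """Grow a tree on (x, label) pairs. Returns a label (leaf) or a
    (threshold, left_subtree, right_subtree) tuple (decision node)."""
    labels = [label for _, label in points]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or (max_depth is not None and depth >= max_depth):
        return majority                        # stop: pure leaf or depth limit hit
    best = None
    for threshold in sorted({x for x, _ in points})[:-1]:
        left = [p for p in points if p[0] <= threshold]
        right = [p for p in points if p[0] > threshold]
        score = (len(left) * gini([lab for _, lab in left])
                 + len(right) * gini([lab for _, lab in right])) / len(points)
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    if best is None:
        return majority                        # all x values identical, cannot split
    _, threshold, left, right = best
    return (threshold,
            build(left, depth + 1, max_depth),
            build(right, depth + 1, max_depth))


def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])


# Hypothetical data: low x is "blue", high x is "red", plus one noisy "red" at x=3
points = [(1, "blue"), (2, "blue"), (3, "red"), (4, "blue"), (5, "blue"),
          (6, "red"), (7, "red"), (8, "red")]

print(count_leaves(build(points)))               # unlimited depth
print(count_leaves(build(points, max_depth=1)))  # depth capped at 1
```

On this data the unlimited tree ends up with 4 leaves because it carves out a separate rule just for the noisy dot, while the capped tree keeps a single split at x=5 and 2 leaves.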

Advantages vs. Disadvantages

👍 Pros

Interpretability: You can visualize and explain the logic easily to a human.
Little Data Prep: Handles both numerical and categorical data, and needs no feature scaling because splits depend only on the ordering of values.
Non-Linear: Can capture complex patterns.

👎 Cons

Instability: A small change in data can result in a completely different tree.
Overfitting: Tends to build complex trees that don’t generalize well without tuning.
Bias: With imbalanced data, the tree tends to favor the majority class.

📚 Rapid Glossary

Entropy: Alternative to Gini. Measures disorder. High entropy = messy data. Low entropy = pure data.

Information Gain: The reduction in Entropy (or Gini) achieved by a split. The algorithm maximizes this.

Pruning: The process of removing sections of the tree that provide little power to classify instances, reducing overfitting.
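The first two glossary terms can be made concrete with a short sketch. It assumes Shannon entropy measured in bits (log base 2) and a two-way split; the `entropy` and `information_gain` helpers and the color lists are illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: sum(p_i * log2(1 / p_i)), equal to -sum(p_i * log2(p_i))."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - children


parent = ["blue"] * 4 + ["red"] * 4
print(entropy(parent))                                       # 50/50 mix
print(information_gain(parent, ["blue"] * 4, ["red"] * 4))   # perfect split
print(information_gain(parent, parent[::2], parent[1::2]))   # split that separates nothing
```

A 50/50 parent has an entropy of 1 bit. The perfect split recovers all of it (gain of 1), while the split whose children are each still 50/50 gains nothing.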
