Support Vector Machines (SVM)
The “Widest Street” Algorithm of Machine Learning
💡 The Analogy: The Country Road
Imagine you have two groups of sheep, Red Sheep and Blue Sheep, grazing in a field.
You need to build a fence to separate them. You could build the fence anywhere between the groups, but SVM is picky.
SVM doesn’t just want a fence. It wants the widest possible road between the two groups, with the fence running down the middle.
- The Hyperplane is the yellow line in the middle of the road.
- The Margin is the width of the road.
- The Support Vectors are the specific sheep standing closest to the road. They are the only ones that matter: you can move any of the other sheep (as long as they stay off the road) and the road stays the same.
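Training a Linear SVM
A linear SVM can be trained with a simple iterative optimizer such as Stochastic Gradient Descent (SGD). Each update tries to widen the margin around the line while pushing any point that is misclassified, or sitting inside the margin, back toward its correct side.
For example, place a cluster of Red points in the top-left and Blue points in the bottom-right, then add one point between them: the line usually shifts, because the new point lands inside the old margin and becomes a support vector.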
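SGD training for a linear SVM can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the optimizer behind any particular demo: the function `train_linear_svm_sgd`, its parameters (`lam`, `lr`, `epochs`, `seed`) and the toy points are hypothetical. It takes gradient steps on the regularized hinge loss, the usual objective for a linear SVM, using a Red-top-left, Blue-bottom-right layout.

```python
import random

def train_linear_svm_sgd(points, labels, lam=0.01, lr=0.01, epochs=200, seed=0):
    """Fit a linear SVM by SGD on the regularized hinge loss (hypothetical helper).

    Minimizes  lam/2 * ||w||^2 + mean(max(0, 1 - label * (w . p + b))).
    points: list of (x, y) tuples; labels: +1 for Red, -1 for Blue.
    Returns (w, b) describing the line w[0]*x + w[1]*y + b = 0.
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]
    b = 0.0  # the bias is not regularized
    data = list(zip(points, labels))
    for _ in range(epochs):
        rng.shuffle(data)  # visit the points in a new random order each epoch
        for (x, y), label in data:
            score = w[0] * x + w[1] * y + b
            # The regularizer shrinks w a little every step;
            # a smaller ||w|| means a wider road.
            w = [wi * (1 - lr * lam) for wi in w]
            if label * score < 1:
                # Point is inside the margin or on the wrong side:
                # push the line away from it (hinge-loss gradient step).
                w[0] += lr * label * x
                w[1] += lr * label * y
                b += lr * label
    return w, b

# Hypothetical toy data: Red cluster top-left, Blue cluster bottom-right.
points = [(-2, 2), (-3, 1), (-2, 3), (-3, 3),   # Red
          (2, -2), (3, -1), (2, -3), (3, -3)]   # Blue
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm_sgd(points, labels)
for (x, y), label in zip(points, labels):
    predicted = 1 if w[0] * x + w[1] * y + b > 0 else -1
    print((x, y), "Red" if label == 1 else "Blue",
          "ok" if predicted == label else "WRONG")
print("w =", w, "b =", b)
```

With these settings the learned `w` should point roughly along (-1, 1), so the line ends up close to y = x. A fixed learning rate keeps the solution jittering slightly around the exact maximum-margin line; a decaying rate would settle it further.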
How SVM Works: Step-by-Step
Mapping the Data
The algorithm plots every data item as a point in n-dimensional space (where n is the number of features you have).
The Hyperplane Search
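It looks for a Hyperplane (a line in 2D, a flat sheet in 3D) that separates the classes. There are infinitely many possible separating hyperplanes, but most of them pass uncomfortably close to one class and are likely to misclassify new points.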
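To make this concrete: in n dimensions a hyperplane is the set of points \( x \) with \( w \cdot x + b = 0 \), and a point is classified by the sign of \( w \cdot x + b \). The sketch below uses hypothetical helper names (`decision_value`, `predict`) and invented toy data to show two different lines that both separate the same points perfectly, which is why SVM needs an extra rule to choose between them.

```python
def decision_value(w, x, b):
    """Signed score w . x + b; the hyperplane is the set where this equals 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, x, b):
    """Which side of the hyperplane x falls on (+1 = Red, -1 = Blue)."""
    return 1 if decision_value(w, x, b) > 0 else -1

# Hypothetical toy data (2 features, so the hyperplane is a line).
red = [(-2, 2), (-3, 1), (-2, 3), (-3, 3)]
blue = [(2, -2), (3, -1), (2, -3), (3, -3)]

# Two different lines that both separate the classes:
candidates = {
    "y = x": ((-1, 1), 0),     # -x + y = 0
    "y = x/2": ((-1, 2), 0),   # -x + 2y = 0
}
for name, (w, b) in candidates.items():
    separates = (all(predict(w, p, b) == 1 for p in red)
                 and all(predict(w, p, b) == -1 for p in blue))
    print(name, "separates:", separates)  # both print True

# The same code works unchanged in 3D, where the hyperplane is a flat sheet,
# e.g. the plane z = 2 (w = (0, 0, 1), b = -2):
print(predict((0, 0, 1), (1, 1, 5), -2))  # 1 (above the sheet)
```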
Maximize the Margin
This is the core “magic” of SVM: it selects the hyperplane with the maximum margin, i.e. the largest distance between the plane and the nearest data point of either class.
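With labels \( y_i = \pm 1 \), the margin of a candidate hyperplane is the smallest value of \( y_i (w \cdot x_i + b) / \lVert w \rVert \) over all points. The standard hard-margin SVM rescales \( w \) so the nearest points satisfy \( y_i (w \cdot x_i + b) = 1 \), which turns "maximize the margin" into "minimize \( \frac{1}{2}\lVert w \rVert^2 \) subject to \( y_i (w \cdot x_i + b) \ge 1 \) for all \( i \)". The sketch below (a hypothetical `geometric_margin` helper on invented toy data) simply scores two candidate separating lines, y = x and y = x/2; for this data, y = x happens to be the maximum-margin line.

```python
import math

def geometric_margin(w, b, points, labels):
    """Distance from the hyperplane w . x + b = 0 to the nearest point.

    Negative if some point is on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        label * (sum(wi * xi for wi, xi in zip(w, p)) + b) / norm
        for p, label in zip(points, labels)
    )

# Hypothetical toy data: first four are Red (+1), last four are Blue (-1).
points = [(-2, 2), (-3, 1), (-2, 3), (-3, 3), (2, -2), (3, -1), (2, -3), (3, -3)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

m1 = geometric_margin((-1, 1), 0, points, labels)  # line y = x
m2 = geometric_margin((-1, 2), 0, points, labels)  # line y = x/2
print(round(m1, 3), round(m2, 3))  # 2.828 2.236
```

Both lines separate the data, but SVM prefers y = x because its nearest points are farther away. The full road width is twice the margin, about 5.657 here.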
Identifying Support Vectors
The data points that lie closest to the decision surface are called Support Vectors. They are the most difficult data points to classify and effectively “support” or define the boundary.
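Once the maximum-margin hyperplane is known, the support vectors are simply the points whose distance to it equals the margin. The sketch below assumes the hyperplane has already been found (for this hypothetical toy data, y = x is the maximum-margin line); `support_vectors` and its `tol` parameter are illustrative names, not a library API.

```python
import math

def support_vectors(w, b, points, labels, tol=1e-9):
    """Points whose distance to the hyperplane equals the margin (the closest ones)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    dist = [label * (sum(wi * xi for wi, xi in zip(w, p)) + b) / norm
            for p, label in zip(points, labels)]
    margin = min(dist)
    return [p for p, d in zip(points, dist) if d - margin <= tol]

# Hypothetical toy data: first four are Red (+1), last four are Blue (-1).
points = [(-2, 2), (-3, 1), (-2, 3), (-3, 3), (2, -2), (3, -1), (2, -3), (3, -3)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = (-1, 1), 0  # maximum-margin line y = x for this data

print(support_vectors(w, b, points, labels))
# [(-2, 2), (-3, 1), (2, -2), (3, -1)]

# Move a non-support point further away from the road:
moved = [(-4, 4) if p == (-3, 3) else p for p in points]
print(support_vectors(w, b, moved, labels))
# [(-2, 2), (-3, 1), (2, -2), (3, -1)]
```

Moving the non-support point leaves the support vectors, and therefore the maximum-margin line, unchanged, just as in the sheep analogy.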
Where does it fit in the ML Space?
| Aspect | Category | Details |
|---|---|---|
| Family | Supervised Learning | Requires labeled data (input X, output Y) to train. |
| Tasks | Classification & Regression | Mostly used for classification (SVC), but can also do regression (SVR). |
| Relationship to Ensembles | Not Bagging/Boosting | SVM is usually a “strong learner” on its own. Random Forests (Bagging) and XGBoost (Boosting) are typically built from Decision Trees, not SVMs. |
Types of SVM & The Kernel Trick
“Imagine red and blue balls lying flat on a table (2D), mixed up so a ruler can’t separate them. If you slap the table and the red balls fly up into the air (3D), you can suddenly slide a flat sheet between the floating red balls and the table-bound blue balls. When they land, the boundary looks curved.”
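The table slap can be imitated with an explicit lift into 3D. In the sketch below, the data (Red points on an outer ring, Blue points near the centre), the `lift` map \( z = x^2 + y^2 \) and the sheet height are all hypothetical choices. The second half shows the actual "trick": a kernel such as \( K(a, b) = (a \cdot b)^2 \) returns the same value as an inner product in a higher-dimensional feature space, without ever building that space.

```python
import math

# Hypothetical toy data: Red points on an outer ring, Blue points near the centre.
# The Blue points sit inside the Red ring, so no straight line in 2D separates them.
red = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0), (1.5, 1.5)]
blue = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.3), (0.2, 0.4)]

def lift(p):
    """'Slap the table': give each 2D point a height z = x^2 + y^2."""
    x, y = p
    return (x, y, x * x + y * y)

# In 3D, the flat sheet z = 2 separates the classes:
# Red points rise to z >= 4, Blue points stay at z <= 0.25.
sheet_height = 2.0
red_above = all(lift(p)[2] > sheet_height for p in red)
blue_below = all(lift(p)[2] < sheet_height for p in blue)
print(red_above, blue_below)  # True True

def phi(p):
    """Explicit degree-2 feature map from 2D into 3D."""
    x, y = p
    return (x * x, math.sqrt(2) * x * y, y * y)

def poly_kernel(u, v):
    """Degree-2 polynomial kernel (u . v)^2, computed entirely in 2D."""
    return (u[0] * v[0] + u[1] * v[1]) ** 2

a, b = (1.0, 2.0), (3.0, -1.0)
explicit = sum(s * t for s, t in zip(phi(a), phi(b)))  # inner product in 3D
print(poly_kernel(a, b), explicit)  # equal, up to floating-point rounding
```

Projected back onto the table, the plane z = 2 becomes the circle \( x^2 + y^2 = 2 \), which is the curved boundary the analogy describes.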
Measuring Success: Metrics
| Metric | Formula / Definition | When to use? |
|---|---|---|
| Accuracy | \( \frac{TP + TN}{TP + TN + FP + FN} \) | When classes are balanced (e.g., 50% Red, 50% Blue). |
| Precision | \( \frac{TP}{TP + FP} \) | When False Positives are costly (e.g., Spam filter – don’t mark real mail as spam). |
| Recall (Sensitivity) | \( \frac{TP}{TP + FN} \) | When False Negatives are dangerous (e.g., Cancer detection – don’t miss a case). |
| F1-Score | \( 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \) | Harmonic mean of Precision and Recall. Use when classes are imbalanced and you need one score that balances both. |
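The formulas in the table translate directly into code. The sketch below uses a hypothetical `classification_metrics` helper and invented counts for a spam filter on imbalanced data (50 spam out of 1000 emails, spam being the positive class) to show why Accuracy alone can mislead.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall and F1 from confusion-matrix counts.

    Undefined ratios (e.g. no positive predictions at all) are reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical spam filter on 1000 emails, 50 of which are spam.
model = classification_metrics(tp=30, tn=940, fp=10, fn=20)
# A lazy baseline that never predicts spam.
baseline = classification_metrics(tp=0, tn=950, fp=0, fn=50)

for name, m in [("model", model), ("never-spam baseline", baseline)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

The never-spam baseline reaches 95% accuracy yet has zero recall and zero F1, while the model scores 97% accuracy with an F1 of about 0.67, so F1 exposes the difference that accuracy hides.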