K-Means Clustering
Unsupervised Learning / Interactive Visualizer
How it Works: The Pizza Shop Analogy
Imagine you own a pizza chain and want to open K new delivery centers in a city to serve customers most efficiently. You have a map of where all your customers live (the data points), but you don’t know where to put the shops.
- Initialize: Place your K shops randomly on the map.
- Assign: Every customer orders from the shop closest to them. This creates “delivery zones” (clusters).
- Update: You realize some shops are off-center. You move each shop to the average location (the mean position) of its current customers.
- Repeat: Because the shops moved, some customers are now closer to a different shop. Re-assign customers and move the shops again, repeating until the shops stop moving.
Algorithm Visualizer
Algorithm Steps
- Step 1: We randomly place K centroids (the diamond shapes) on the canvas. These act as the temporary centers of our clusters.
- Step 2: We calculate the distance from every data point to every centroid. Each point is painted the color of its closest centroid.
- Step 3: Each centroid moves to the average position (mean X, mean Y) of all points currently assigned to it.
- Step 4: We repeat steps 2 and 3 until the centroids stop moving. At that point the algorithm has converged.
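The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the visualizer's actual code: the function name `kmeans`, the `max_iters` and `seed` parameters, and the choice to start from K randomly chosen data points (rather than arbitrary canvas positions) are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster a list of (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1 (Initialize): pick k distinct data points as starting centroids.
    # Assumption: sampling from the data instead of random canvas positions.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Step 2 (Assign): each point takes the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3 (Update): move each centroid to the mean of its points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                # Empty cluster: keep the old centroid (one simple policy).
                new_centroids.append(centroids[j])
        # Step 4 (Repeat): stop once no centroid moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Usage: two small groups of points, clustered with K = 2.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(data, k=2)
```

Because the starting centroids are random, different seeds can lead to different final clusters on harder data. Running the algorithm several times and keeping the result with the lowest inertia is a common remedy.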
The Elbow Method
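How do you know what K should be? We plot Inertia (the sum of squared distances from each point to its assigned centroid) against K.
Inertia generally keeps decreasing as K increases, reaching zero when every point is its own cluster, so simply minimizing it is not the goal. The “Elbow” is the value of K beyond which adding more clusters gives diminishing returns.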
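To make the idea concrete, here is a small sketch that computes inertia for several values of K and picks out the bend. Both helper names (`inertia`, `find_elbow`) are hypothetical, and the labelings are written by hand to stand in for k-means output on a toy dataset. Using the largest second difference to locate the elbow is one simple heuristic, assumed here for illustration; in practice the elbow is often judged by eye from the plot.

```python
def inertia(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    total = 0.0
    for lab in set(labels):
        members = [p for p, assigned in zip(points, labels) if assigned == lab]
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total


def find_elbow(ks, inertias):
    """Return the K where the inertia curve bends most sharply.

    Heuristic: largest second difference. Assumes ks are consecutive integers.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# Hand-written labelings standing in for k-means results at each K.
labelings = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
    4: [0, 0, 1, 2, 2, 3],
}
ks = sorted(labelings)
inertias = [inertia(points, labelings[k]) for k in ks]
elbow = find_elbow(ks, inertias)
```

On this toy data the inertia drops sharply from K = 1 to K = 2 and only slightly afterwards, so the heuristic selects K = 2, matching the two visible groups.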
ML Context & Details
K-Means is the “Hello World” of Unsupervised Learning. Unlike Supervised Learning (where you have labels like “Cat” or “Dog”), here the algorithm must find structure in raw data on its own.
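- Inertia (WCSS): Within-Cluster Sum of Squares, the sum of squared distances from each point to its cluster’s centroid. Lower values mean tighter clusters.
- Silhouette Score: Measures how close a point is to its own cluster compared to the nearest other cluster. It ranges from -1 to 1. Values near 1 indicate well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest a point may be in the wrong cluster.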
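The silhouette score can be computed directly from its definition. For each point, let a be the mean distance to the other points in its own cluster and b the smallest mean distance to the points of any other cluster; the point's score is (b - a) / max(a, b), and the overall score is the average over all points. The sketch below is a plain-Python illustration under those definitions. The function name `silhouette_score` is chosen for this example, it assumes at least two clusters, and it scores single-point clusters as 0, which is a common convention.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        def mean_dist_to(cluster):
            # Average distance from p to the other points in `cluster`.
            dists = [math.dist(p, q) for j, q in enumerate(points)
                     if labels[j] == cluster and j != i]
            return sum(dists) / len(dists) if dists else None

        a = mean_dist_to(labels[i])  # cohesion: distance within own cluster
        if a is None:
            scores.append(0.0)  # single-point cluster: score 0 by convention
            continue
        # Separation: nearest other cluster by mean distance.
        b = min(mean_dist_to(c) for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])  # matches the two groups
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])   # mixes the two groups
```

The labeling that matches the two visible groups scores close to 1, while the mixed labeling scores much lower.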