K-Means Clustering: Interactive Guide

How it Works: The Pizza Shop Analogy

Imagine you own a pizza chain and want to open K new delivery centers in a city to serve customers most efficiently. You have a map of where all your customers live (the data points), but you don’t know where to put the shops.

The Strategy:
  1. Initialize: Place your K shops randomly on the map.
  2. Assign: Every customer orders from the shop closest to them. This creates “delivery zones” (clusters).
  3. Update: You realize some shops are off-center, so you move each shop to the geographic center of its current customers (their average position, which minimizes the sum of squared distances to them).
  4. Repeat: Because the shops moved, some customers may now be closer to a different shop. Reassign customers and move the shops again, repeating until the shops stop moving.
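To make the loop concrete, here is a small worked example with made-up numbers, shrunk to one dimension: six customers live along a single street at positions 1, 2, 3, 10, 11 and 12, and two shops start at positions 0 and 5. Customer 3 first orders from shop B, then switches to shop A once B moves toward the far group.

```latex
\begin{aligned}
&\text{Customers: } 1, 2, 3, 10, 11, 12 \qquad \text{Shops: } A = 0,\; B = 5\\
&\text{Round 1: } A \text{ serves } \{1, 2\},\; B \text{ serves } \{3, 10, 11, 12\}
  \;\Rightarrow\; A = \tfrac{1+2}{2} = 1.5,\; B = \tfrac{3+10+11+12}{4} = 9\\
&\text{Round 2: } A \text{ serves } \{1, 2, 3\},\; B \text{ serves } \{10, 11, 12\}
  \;\Rightarrow\; A = \tfrac{1+2+3}{3} = 2,\; B = \tfrac{10+11+12}{3} = 11\\
&\text{Round 3: no customer changes shop} \;\Rightarrow\; \text{the shops stop moving}
\end{aligned}
```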

Algorithm Steps

1. Initialization

We randomly place K centroids among the data. These act as the temporary centers of our clusters.
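A minimal sketch of this step in Python, assuming points are (x, y) tuples. The function names `init_centroids` and `init_from_points` are hypothetical. The first places centroids uniformly at random inside the bounding box of the data, which mirrors dropping shops anywhere on the map; the second picks K distinct data points, a common alternative that guarantees every centroid starts on top of a real point.

```python
import random


def init_centroids(points, k, seed=None):
    """Place k centroids uniformly at random inside the data's bounding box."""
    rng = random.Random(seed)  # seeded generator for reproducible placement
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return [
        (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        for _ in range(k)
    ]


def init_from_points(points, k, seed=None):
    """Alternative: use k distinct data points as the starting centroids."""
    rng = random.Random(seed)
    return rng.sample(points, k)


points = [(1, 2), (2, 1), (8, 9), (9, 8), (5, 5)]
print(init_centroids(points, 2, seed=42))
print(init_from_points(points, 2, seed=42))
```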

2. Assignment

We calculate the distance (usually Euclidean) from every data point to every centroid, and assign each point to the cluster of its closest centroid.
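A sketch of the assignment step, again assuming (x, y) tuples and a hypothetical `assign` helper. It uses plain Euclidean distance via `math.dist` (Python 3.8+), and ties go to the lower-numbered centroid.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        # Euclidean distance from this point to every centroid.
        distances = [math.dist(p, c) for c in centroids]
        # index() returns the first minimum, so ties go to the lower index.
        labels.append(distances.index(min(distances)))
    return labels


print(assign([(0, 0), (9, 9), (1, 2)], [(0, 0), (10, 10)]))  # [0, 1, 0]
```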

3. Update Centroids

The centroids move to the average position (mean X, mean Y) of all points currently assigned to them.
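The update step is just a per-cluster mean. In this sketch (hypothetical `update` helper, (x, y) tuples), a centroid that ended up with no points keeps its previous position; that is one of several conventions and an assumption here, not something the algorithm dictates.

```python
def update(points, labels, centroids):
    """Move each centroid to the mean (x, y) of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            mean_x = sum(x for x, _ in members) / len(members)
            mean_y = sum(y for _, y in members) / len(members)
            new_centroids.append((mean_x, mean_y))
        else:
            # Assumption: an empty cluster keeps its previous position.
            new_centroids.append(old)
    return new_centroids


points = [(0, 0), (2, 0), (10, 10), (12, 14)]
labels = [0, 0, 1, 1]
print(update(points, labels, [(5, 5), (6, 6), (99, 99)]))
# [(1.0, 0.0), (11.0, 12.0), (99, 99)]
```

Here the third centroid has no assigned points, so it stays where it was.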

4. Convergence

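We repeat steps 2 and 3 until the centroids stop moving (equivalently, until no point changes cluster). At that point the algorithm has converged. Convergence is guaranteed, but only to a local optimum, so different starting positions can produce different clusterings.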
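Putting the steps together gives the classic loop, often called Lloyd's algorithm. This is a minimal self-contained sketch under a few assumptions: `kmeans` is a hypothetical name, it initializes from K random data points, empty clusters keep their old centroid, and a largest centroid shift no bigger than `tol` counts as "stopped moving".

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-9, seed=None):
    """Lloyd-style K-Means on (x, y) tuples; returns (centroids, labels, iterations)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # 1. initialize with k distinct data points
    for iteration in range(1, max_iter + 1):
        # 2. Assignment: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # 3. Update: move each centroid to the mean of its points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    (sum(x for x, _ in members) / len(members),
                     sum(y for _, y in members) / len(members))
                )
            else:
                new_centroids.append(centroids[j])  # empty cluster stays put
        # 4. Convergence: stop once no centroid moved more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels, iteration


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, n_iter = kmeans(points, 2, seed=0)
print(centroids, labels, n_iter)
```

Because the result depends on the random start, a different `seed` can land in a different local optimum on harder data; running several seeds and keeping the lowest-inertia result is the usual remedy.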

The Elbow Method

How do you choose K? Run K-Means for a range of K values and plot the inertia (the sum of squared distances from each point to its assigned centroid) against K.

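As K increases, inertia generally decreases, reaching zero when every point is its own cluster, so the lowest inertia alone cannot pick K. The “Elbow” is the point on the curve where adding more clusters gives diminishing returns.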
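A rough sketch of the elbow method in pure Python, assuming (x, y) tuples and a hypothetical `kmeans_inertia` helper. For each K it runs K-Means from several random starts (`n_init`) and keeps the lowest inertia, since a single unlucky start can inflate the curve. The sample points are made-up data with three obvious groups.

```python
import math
import random


def kmeans_inertia(points, k, n_init=10, max_iter=100, seed=0):
    """Lowest inertia (WCSS) found over n_init random restarts of K-Means."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(n_init):
        centroids = rng.sample(points, k)  # start from k distinct data points
        for _ in range(max_iter):
            # Assignment: group points by their nearest centroid.
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
                clusters[nearest].append(p)
            # Update: move each centroid to its cluster mean (empty ones stay put).
            new_centroids = [
                (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                if c else centroids[j]
                for j, c in enumerate(clusters)
            ]
            if new_centroids == centroids:  # converged: nothing moved
                break
            centroids = new_centroids
        # Inertia: squared distance from every point to its nearest centroid.
        inertia = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)
        best = min(best, inertia)
    return best


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
for k in range(1, 7):
    print(k, round(kmeans_inertia(points, k), 2))
```

On data like this the printed inertia should fall steeply up to K = 3 and flatten afterwards; that bend is the elbow.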

ML Context & Details

Where does it fit?

K-Means is the “Hello World” of Unsupervised Learning. Unlike Supervised Learning (where you have labels like “Cat” or “Dog”), here the algorithm must find structure in raw data on its own.


Common Metrics
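  • Inertia (WCSS): Within-Cluster Sum of Squares. Lower means tighter clusters, but it tends to fall as K grows, so compare it only at a fixed K or through the elbow method.
  • Silhouette Score: Measures how close each point is to its own cluster compared with the nearest other cluster. It ranges from -1 to 1; values closer to 1 indicate compact, well-separated clusters.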
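The silhouette score can be computed straight from its definition: for each point, a is its mean distance to the other points in its own cluster, b is its mean distance to the points of the nearest other cluster, and s = (b - a) / max(a, b). Below is a pure-Python sketch with the hypothetical name `silhouette_score` (not scikit-learn's function of the same name, which provides this metric for real workloads). A point alone in its cluster gets s = 0, the usual convention. It is O(n²), so it suits teaching-size data only.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for a point alone in its cluster
            continue
        # a: mean distance to the other members of its own cluster
        # (the distance from p to itself is 0, so divide by len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette_score(points, [0, 0, 1, 1]), 3))  # good split
print(round(silhouette_score(points, [0, 1, 0, 1]), 3))  # poor split
```

Grouping each nearby pair together scores about 0.9, while mixing the pairs across the gap gives a negative score.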

When to use it?
  • Customer Segmentation
  • Image Compression
  • Anomaly Detection
  • Document Clustering
