
Commit dfcea7c

spec: add elbow-curve specification (#2335)
## New Specification: `elbow-curve`

Related to #2333

---

**Next:** Add `approved` label to the issue to merge this PR.

---

:robot: *[spec-create workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/20528131890)*

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent: fe34d79

2 files changed: 57 additions, 0 deletions


plots/elbow-curve/specification.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
# elbow-curve: Elbow Curve for K-Means Clustering

## Description

An elbow curve plots the within-cluster sum of squares (inertia/distortion) of a K-means clustering against the number of clusters (k). It helps identify an appropriate number of clusters by locating the "elbow point", beyond which adding more clusters yields only diminishing reductions in inertia. It is a fundamental diagnostic for parameter selection in unsupervised learning.
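
A standard definition of the y-axis quantity, written in notation introduced here (the spec itself does not fix one): for data points $x_i$ partitioned into clusters $C_1, \dots, C_k$ with centroids $\mu_j$,

$$
W(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i .
$$

The best achievable $W(k)$ never increases as $k$ grows and reaches $0$ when $k$ equals the number of points, so simply minimizing inertia would always favor the largest $k$; the elbow instead marks where further gains become small. K-means only returns a local optimum, so measured values can occasionally rise slightly from one $k$ to the next. "Distortion" is not always the same quantity: SciPy's `scipy.cluster.vq.kmeans`, for example, reports the mean (non-squared) Euclidean distance between observations and centroids, so the axis label should state which measure is plotted.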

## Applications

- Selecting the optimal number of clusters (k) in K-means clustering analysis
- Customer segmentation, to determine natural groupings in behavioral data
- Choosing the palette size (number of colors) for color quantization in image compression
- Document clustering, to identify topic groupings in text corpora

## Data

- `k_values` (numeric) - Number of clusters tested (typically 1 to 10, or 1 to 15)
- `inertia` (numeric) - Within-cluster sum of squares for each k value
- Size: 8-15 distinct k values, so the elbow is clearly visible
- Example: scikit-learn's `KMeans.inertia_` attribute, recorded after fitting one model per k value
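
The sketch below shows one way to produce the `k_values` and `inertia` columns without external dependencies. It assumes synthetic 2-D data (three Gaussian blobs from a hypothetical `make_toy_blobs` helper) and uses a minimal pure-Python version of Lloyd's K-means with random restarts; `kmeans_inertia` is likewise a hypothetical helper for illustration, not a library function.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def make_toy_blobs(centers, n_per_center=50, spread=0.5, seed=42):
    """Hypothetical helper: Gaussian blobs around the given 2-D centers."""
    rng = random.Random(seed)
    return [
        (rng.gauss(cx, spread), rng.gauss(cy, spread))
        for cx, cy in centers
        for _ in range(n_per_center)
    ]


def kmeans_inertia(points, k, n_init=5, max_iter=100, seed=0):
    """Hypothetical helper: lowest inertia found by Lloyd's K-means over n_init random starts."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        # Start from k distinct data points chosen at random.
        centroids = rng.sample(points, k)
        for _ in range(max_iter):
            # Assignment step: each point joins the cluster of its nearest centroid.
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                clusters[nearest].append(p)
            # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
            new_centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
            if new_centroids == centroids:
                break  # converged: reassignment no longer moves any centroid
            centroids = new_centroids
        # Inertia: sum of squared distances from each point to its nearest centroid.
        inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
        best = min(best, inertia)
    return best


points = make_toy_blobs([(0, 0), (5, 5), (0, 5)])
k_values = list(range(1, 11))
inertia = [kmeans_inertia(points, k) for k in k_values]
```

With scikit-learn installed, the equivalent list is typically built as `[KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in k_values]`, where `X` is a 2-D array of samples; the library version uses k-means++ initialization by default and is much faster.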

## Notes

- X-axis shows the number of clusters (k); y-axis shows inertia/distortion
- The elbow point is where the rate of decrease changes sharply, i.e. where the curve bends from steep to nearly flat
- Consider annotating or highlighting the chosen (elbow) k value
- Use markers at each data point to show the discrete k values tested
- A smooth connecting line helps visualize the curve shape
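
Annotating the elbow requires locating it. The sketch below uses one common heuristic, which is an assumption rather than something this spec prescribes: rescale both axes to [0, 1] and pick the k whose point lies farthest from the straight line joining the first and last points of the curve. It assumes `k_values` is sorted ascending. The `find_elbow` and `plot_elbow` names are hypothetical, the inertia values in the usage lines are made up for illustration, and `plot_elbow` assumes matplotlib is installed (it is imported inside the function so `find_elbow` works without it).

```python
import math


def find_elbow(k_values, inertia):
    """Hypothetical helper: k whose point is farthest from the chord joining the curve's endpoints.

    Assumes k_values is sorted ascending, has at least three entries, and inertia is not constant.
    """
    if len(k_values) < 3 or max(inertia) == min(inertia):
        raise ValueError("need at least three k values with varying inertia")
    # Rescale both axes to [0, 1] so the inertia scale does not dominate the distances.
    k_lo, k_hi = min(k_values), max(k_values)
    y_lo, y_hi = min(inertia), max(inertia)
    xs = [(k - k_lo) / (k_hi - k_lo) for k in k_values]
    ys = [(y - y_lo) / (y_hi - y_lo) for y in inertia]
    # Perpendicular distance from each point to the line through the first and last points.
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    distances = [
        abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / length
        for x, y in zip(xs, ys)
    ]
    return k_values[distances.index(max(distances))]


def plot_elbow(k_values, inertia):
    """Hypothetical helper: draw the elbow curve with markers and an annotated elbow."""
    import matplotlib.pyplot as plt  # lazy import: only needed when plotting

    elbow_k = find_elbow(k_values, inertia)
    elbow_y = inertia[k_values.index(elbow_k)]
    fig, ax = plt.subplots(figsize=(8, 5))
    # Markers show the discrete k values tested; the line conveys the curve shape.
    ax.plot(k_values, inertia, marker="o", linewidth=2)
    ax.annotate(
        f"elbow: k = {elbow_k}",
        xy=(elbow_k, elbow_y),
        xytext=(30, 30),
        textcoords="offset points",
        arrowprops={"arrowstyle": "->"},
    )
    ax.set_xticks(k_values)
    ax.set_xlabel("Number of clusters (k)")
    ax.set_ylabel("Inertia (within-cluster sum of squares)")
    return fig


# Made-up values for illustration only.
k_values = list(range(1, 9))
inertia = [1000, 400, 150, 120, 100, 90, 85, 80]
print(find_elbow(k_values, inertia))  # → 3
```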
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
```yaml
# Specification-level metadata for elbow-curve
# Auto-synced to PostgreSQL on push to main

spec_id: elbow-curve
title: Elbow Curve for K-Means Clustering

# Specification tracking
created: 2025-12-26T19:28:09Z
updated: null
issue: 2333
suggested: MarkusNeusinger

# Classification tags (applies to all library implementations)
# See docs/concepts/tagging-system.md for detailed guidelines
tags:
  plot_type:
    - line
    - curve
  data_type:
    - numeric
    - discrete
  domain:
    - general
    - statistics
    - machine-learning
  features:
    - basic
    - clustering
    - optimization
    - diagnostic
```
