
Commit 9d12dc9

spec: add learning-curve-basic specification (#2280)
## New Specification: `learning-curve-basic`

Related to #2275

---

### specification.md

# learning-curve-basic: Model Learning Curve

## Description

A learning curve visualizes model performance (training and validation scores) as a function of training set size. It is essential for diagnosing bias vs variance tradeoffs, determining whether collecting more data would improve model performance, and guiding model selection decisions. The plot typically shows two lines with shaded confidence bands representing variability across cross-validation folds.

## Applications

- Diagnosing underfitting (high bias) when both training and validation scores are low
- Diagnosing overfitting (high variance) when training score is high but validation score is low with a large gap
- Determining if collecting more training data would improve model performance
- Comparing learning characteristics across different model architectures

## Data

- `train_sizes` (numeric) - Array of training set sizes used for evaluation
- `train_scores` (numeric) - Training scores at each sample size (2D: folds × sizes)
- `validation_scores` (numeric) - Validation scores at each sample size (2D: folds × sizes)
- Size: 5-20 different training set sizes, typically with 5-10 cross-validation folds
- Example: Scikit-learn's `learning_curve` function output

## Notes

- Use shaded regions to show confidence bands (e.g., ±1 standard deviation across folds)
- Clearly label the y-axis with the metric being evaluated (accuracy, F1, MSE, etc.)
- Include a legend distinguishing training from validation curves
- X-axis should show actual sample sizes or percentages of total training data
- Consider using distinct colors (e.g., blue for training, orange for validation) for clarity

---

**Next:** Add `approved` label to the issue to merge this PR.

---

:robot: *[spec-create workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/20526506984)*

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent d2ded2a commit 9d12dc9

2 files changed

Lines changed: 58 additions & 0 deletions


Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# learning-curve-basic: Model Learning Curve

## Description

A learning curve visualizes model performance (training and validation scores) as a function of training set size. It is essential for diagnosing bias vs variance tradeoffs, determining whether collecting more data would improve model performance, and guiding model selection decisions. The plot typically shows two lines with shaded confidence bands representing variability across cross-validation folds.

## Applications

- Diagnosing underfitting (high bias) when both training and validation scores are low
- Diagnosing overfitting (high variance) when training score is high but validation score is low with a large gap
- Determining if collecting more training data would improve model performance
- Comparing learning characteristics across different model architectures
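
The first two diagnostic rules in the list above could be sketched roughly as follows. This is only an illustration and not part of the specification: the function name `diagnose` and the thresholds `low_score` and `large_gap` are hypothetical, and the sketch assumes a metric where higher is better (such as accuracy).

```python
def diagnose(train_score: float, val_score: float,
             low_score: float = 0.7, large_gap: float = 0.1) -> str:
    """Label the right-most point of a learning curve.

    Assumes a higher-is-better metric; both thresholds are
    illustrative defaults, not values taken from the spec.
    """
    # Underfitting: training and validation scores are both low.
    if train_score < low_score and val_score < low_score:
        return "underfitting"
    # Overfitting: training score is high and validation lags far behind.
    if train_score - val_score > large_gap:
        return "overfitting"
    return "no clear problem"


print(diagnose(0.55, 0.52))
print(diagnose(0.99, 0.75))
print(diagnose(0.90, 0.88))
```

In practice the thresholds depend on the metric and the task, so a rule like this is at most a starting point for reading the plot.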

## Data

- `train_sizes` (numeric) - Array of training set sizes used for evaluation
- `train_scores` (numeric) - Training scores at each sample size (2D: folds × sizes)
- `validation_scores` (numeric) - Validation scores at each sample size (2D: folds × sizes)
- Size: 5-20 different training set sizes, typically with 5-10 cross-validation folds
- Example: Scikit-learn's `learning_curve` function output
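
As a rough sketch of how the 2D score arrays listed above reduce to one line and one band per curve, the snippet below collapses a folds × sizes table to a mean and a standard deviation per training size. The helper name `summarize` and the numbers are made up for illustration, and the population standard deviation is an assumption. Note that scikit-learn's `learning_curve` returns score arrays shaped `(n_ticks, n_cv_folds)`, so its output would need transposing to match the folds × sizes layout used here.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Return (means, stds) per training size.

    `scores` is a list of folds, each a list with one score per
    training size (the folds x sizes layout).
    """
    per_size = list(zip(*scores))  # regroup: one tuple of fold scores per size
    means = [mean(col) for col in per_size]
    stds = [pstdev(col) for col in per_size]  # population std (assumption)
    return means, stds


# Illustrative values only: 3 folds x 3 training sizes.
train_sizes = [100, 200, 400]
validation_scores = [
    [0.60, 0.70, 0.80],
    [0.62, 0.74, 0.80],
    [0.64, 0.72, 0.80],
]

means, stds = summarize(validation_scores)
lower = [m - s for m, s in zip(means, stds)]  # bottom edge of the band
upper = [m + s for m, s in zip(means, stds)]  # top edge of the band
```

The `lower` and `upper` lists are what a shaded region would be drawn between, with `means` as the line itself.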

## Notes

- Use shaded regions to show confidence bands (e.g., ±1 standard deviation across folds)
- Clearly label the y-axis with the metric being evaluated (accuracy, F1, MSE, etc.)
- Include a legend distinguishing training from validation curves
- X-axis should show actual sample sizes or percentages of total training data
- Consider using distinct colors (e.g., blue for training, orange for validation) for clarity
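
One way the notes above might translate into a plot is sketched below with matplotlib and NumPy. This is not the project's implementation: the data is synthetic and seeded purely for illustration, the "Accuracy" label and the title are assumptions, and the scores use the folds × sizes layout, so the mean and standard deviation are taken along axis 0.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, illustrative scores in folds x sizes layout; not real results.
rng = np.random.default_rng(0)
train_sizes = np.array([50, 100, 200, 400, 800])
n_folds = 5
shape = (n_folds, train_sizes.size)
train_scores = 0.98 - 0.02 * np.log10(train_sizes) + rng.normal(0, 0.01, shape)
validation_scores = 0.60 + 0.10 * np.log10(train_sizes) + rng.normal(0, 0.02, shape)

fig, ax = plt.subplots(figsize=(8, 5))
for scores, label, color in [
    (train_scores, "Training score", "tab:blue"),
    (validation_scores, "Validation score", "tab:orange"),
]:
    mean = scores.mean(axis=0)  # average over folds
    std = scores.std(axis=0)    # spread over folds
    ax.plot(train_sizes, mean, marker="o", color=color, label=label)
    # Shaded band: mean +/- 1 standard deviation across folds.
    ax.fill_between(train_sizes, mean - std, mean + std, color=color, alpha=0.2)

ax.set_xlabel("Training set size (samples)")
ax.set_ylabel("Accuracy")  # name the metric actually being evaluated
ax.set_title("Model Learning Curve")
ax.legend(loc="lower right")
```

Calling `fig.savefig(...)` with a path of your choice would write the figure to disk. The band drawn here is ±1 standard deviation, which shows fold-to-fold variability rather than a formal confidence interval.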
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
# Specification-level metadata for learning-curve-basic
# Auto-synced to PostgreSQL on push to main

spec_id: learning-curve-basic
title: Model Learning Curve

# Specification tracking
created: 2025-12-26T17:28:31Z
updated: 2025-12-26T17:28:31Z
issue: 2275
suggested: MarkusNeusinger

# Classification tags (applies to all library implementations)
# See docs/concepts/tagging-system.md for detailed guidelines
tags:
  plot_type:
    - line
    - learning-curve
  data_type:
    - numeric
    - continuous
  domain:
    - machine-learning
    - statistics
    - data-science
  features:
    - basic
    - confidence-band
    - comparison
    - diagnostic
