Commit 8cfe5cf

spec: add diagnostic-regression-panel specification (#5257)
## New Specification: `diagnostic-regression-panel`

Related to #5242

---

### specification.md

# diagnostic-regression-panel: Regression Diagnostic Panel (Four-Plot Display)

## Description

A 2x2 panel of diagnostic plots for evaluating linear regression model assumptions, replicating the classic output of R's `plot(lm)`. The four subplots are: (1) Residuals vs Fitted values to detect non-linearity and heteroscedasticity, (2) Normal Q-Q plot of standardized residuals to assess normality, (3) Scale-Location plot (square root of absolute standardized residuals vs fitted values) to check homoscedasticity, and (4) Residuals vs Leverage with Cook's distance contours to identify influential observations. This composite display is a standard first step in regression model validation across statistics, academia, and regulated industries.

## Applications

- Validating linear regression assumptions before reporting results in statistical analysis
- Checking for heteroscedasticity, non-linearity, and influential outliers in fitted models
- Teaching regression diagnostics in academic statistics courses and textbooks
- Regulatory model validation in finance (credit risk models) and pharma (dose-response modeling)

## Data

- `fitted` (float) - Fitted/predicted values from the regression model
- `residuals` (float) - Raw residuals (observed minus predicted)
- `std_residuals` (float) - Standardized (or studentized) residuals
- `leverage` (float) - Hat values / leverage for each observation
- `cooks_d` (float) - Cook's distance measuring each observation's influence
- Size: 50-500 observations

## Notes

- Four subplots arranged in a 2x2 grid layout with a shared figure title
- **Subplot 1 (Residuals vs Fitted):** Scatter of residuals against fitted values with a horizontal zero-reference line and a LOWESS smoother to reveal non-linear patterns
- **Subplot 2 (Normal Q-Q):** Standardized residuals plotted against theoretical normal quantiles with a 45-degree reference line; deviations indicate non-normality
- **Subplot 3 (Scale-Location):** Square root of absolute standardized residuals vs fitted values with a LOWESS smoother; a flat line indicates constant variance
- **Subplot 4 (Residuals vs Leverage):** Standardized residuals vs leverage with Cook's distance contour lines (e.g., at 0.5 and 1.0) to highlight influential points
- Label the 2-3 most influential points (highest Cook's distance) with observation indices in each subplot
- Use consistent point styling across all four subplots

---

**Next:** Add `approved` label to the issue to merge this PR.

---

:robot: *[spec-create workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/24290826351)*

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent a57bafa commit 8cfe5cf

2 files changed

Lines changed: 60 additions & 0 deletions

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# diagnostic-regression-panel: Regression Diagnostic Panel (Four-Plot Display)

## Description

A 2x2 panel of diagnostic plots for evaluating linear regression model assumptions, replicating the classic output of R's `plot(lm)`. The four subplots are: (1) Residuals vs Fitted values to detect non-linearity and heteroscedasticity, (2) Normal Q-Q plot of standardized residuals to assess normality, (3) Scale-Location plot (square root of absolute standardized residuals vs fitted values) to check homoscedasticity, and (4) Residuals vs Leverage with Cook's distance contours to identify influential observations. This composite display is a standard first step in regression model validation across statistics, academia, and regulated industries.

## Applications

- Validating linear regression assumptions before reporting results in statistical analysis
- Checking for heteroscedasticity, non-linearity, and influential outliers in fitted models
- Teaching regression diagnostics in academic statistics courses and textbooks
- Regulatory model validation in finance (credit risk models) and pharma (dose-response modeling)

## Data

- `fitted` (float) - Fitted/predicted values from the regression model
- `residuals` (float) - Raw residuals (observed minus predicted)
- `std_residuals` (float) - Standardized (or studentized) residuals
- `leverage` (float) - Hat values / leverage for each observation
- `cooks_d` (float) - Cook's distance measuring each observation's influence
- Size: 50-500 observations
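
The five columns listed above can all be derived from a fitted model. The following is a minimal sketch, not part of the committed specification, that assumes a simple one-predictor least-squares fit and uses only the Python standard library. The function name `regression_diagnostics` and the synthetic data are hypothetical, and "standardized" is taken here to mean internally studentized residuals.

```python
import math
import random


def regression_diagnostics(x, y):
    """Return diagnostic columns for a simple least-squares fit y ~ a + b*x."""
    n = len(x)
    p = 2  # estimated coefficients: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    fitted = [intercept + slope * xi for xi in x]
    # Raw residuals: observed minus predicted
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # Hat values for a one-predictor model: h_i = 1/n + (x_i - mean)^2 / Sxx
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # Residual standard error with n - p degrees of freedom
    sigma = math.sqrt(sum(e * e for e in residuals) / (n - p))
    # Internally studentized residuals: e_i / (sigma * sqrt(1 - h_i))
    std_residuals = [
        e / (sigma * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)
    ]
    # Cook's distance: D_i = r_i^2 * h_i / (p * (1 - h_i))
    cooks_d = [r * r * h / (p * (1 - h)) for r, h in zip(std_residuals, leverage)]
    return {
        "fitted": fitted,
        "residuals": residuals,
        "std_residuals": std_residuals,
        "leverage": leverage,
        "cooks_d": cooks_d,
    }


# Synthetic example data (assumed for illustration): 100 observations
random.seed(0)
x = [random.uniform(0, 10) for _ in range(100)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]
diag = regression_diagnostics(x, y)
```

For models with several predictors, the leverage values would instead come from the diagonal of the hat matrix, which is usually obtained from a statistics library rather than computed by hand.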

## Notes

- Four subplots arranged in a 2x2 grid layout with a shared figure title
- **Subplot 1 (Residuals vs Fitted):** Scatter of residuals against fitted values with a horizontal zero-reference line and a LOWESS smoother to reveal non-linear patterns
- **Subplot 2 (Normal Q-Q):** Standardized residuals plotted against theoretical normal quantiles with a 45-degree reference line; deviations indicate non-normality
- **Subplot 3 (Scale-Location):** Square root of absolute standardized residuals vs fitted values with a LOWESS smoother; a flat line indicates constant variance
- **Subplot 4 (Residuals vs Leverage):** Standardized residuals vs leverage with Cook's distance contour lines (e.g., at 0.5 and 1.0) to highlight influential points
- Label the 2-3 most influential points (highest Cook's distance) with observation indices in each subplot
- Use consistent point styling across all four subplots
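
Several of the notes above reduce to small transformations of the input columns. The sketch below, which is not part of the committed specification, shows one way to prepare them with the Python standard library: Q-Q coordinates, the Scale-Location y-values, Cook's distance contour curves, and the indices to label. All function names are hypothetical, the plotting positions `(i - 0.5) / n` are one common convention assumed here, and the LOWESS smoother and the drawing itself are left to the plotting library.

```python
import math
from statistics import NormalDist


def qq_points(std_residuals):
    """Pair theoretical normal quantiles with the sorted standardized residuals."""
    n = len(std_residuals)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(std_residuals)))


def scale_location(std_residuals):
    """Square root of the absolute standardized residuals (Subplot 3 y-values)."""
    return [math.sqrt(abs(r)) for r in std_residuals]


def cooks_contour(level, leverage_grid, p):
    """Upper branch of the curve on which Cook's distance equals `level`.

    Solving D = r^2 * h / (p * (1 - h)) for r gives
    r = sqrt(D * p * (1 - h) / h); the lower branch is its negative.
    `p` is the number of estimated model coefficients.
    """
    return [math.sqrt(level * p * (1 - h) / h) for h in leverage_grid]


def most_influential(cooks_d, k=3):
    """Indices of the k observations with the largest Cook's distance."""
    order = sorted(range(len(cooks_d)), key=lambda i: cooks_d[i], reverse=True)
    return order[:k]


# Toy values, assumed for illustration only
toy_std_residuals = [-1.2, 0.3, 2.1, -0.4, 0.8]
toy_cooks_d = [0.1, 0.9, 0.3, 0.5]

qq = qq_points(toy_std_residuals)
sl = scale_location(toy_std_residuals)
upper_half = cooks_contour(0.5, [0.1, 0.2, 0.5], p=2)
labels = most_influential(toy_cooks_d, k=2)
```

Drawing both `upper_half` and its negation for each level (for example 0.5 and 1.0) gives the paired contour lines in the Residuals vs Leverage subplot.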
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# Specification-level metadata for diagnostic-regression-panel
# Auto-synced to PostgreSQL on push to main

spec_id: diagnostic-regression-panel
title: Regression Diagnostic Panel (Four-Plot Display)

# Specification tracking
created: "2026-04-11T20:23:05Z"
updated: null
issue: 5242
suggested: MarkusNeusinger

# Classification tags (applies to all library implementations)
# See docs/reference/tagging-system.md for detailed guidelines
tags:
  plot_type:
    - scatter
    - qq
  data_type:
    - numeric
    - continuous
  domain:
    - statistics
    - research
  features:
    - multi
    - annotated
    - diagnostic
    - correlation