
Commit aa66c90

feat(altair): implement logistic-regression (#3564)
## Implementation: `logistic-regression` - altair

Implements the **altair** version of `logistic-regression`.

**File:** `plots/logistic-regression/implementations/altair.py`

**Parent Issue:** #3550

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/20866597347)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent cfe9eb9 commit aa66c90

2 files changed

Lines changed: 339 additions & 0 deletions


Lines changed: 123 additions & 0 deletions
```python
""" pyplots.ai
logistic-regression: Logistic Regression Curve Plot
Library: altair 6.0.0 | Python 3.13.11
Quality: 91/100 | Created: 2026-01-09
"""

import altair as alt
import numpy as np
import pandas as pd


# Data - Study hours vs exam pass/fail
np.random.seed(42)
n_samples = 150

# Generate study hours with different distributions for pass/fail
hours_fail = np.random.normal(3, 1.5, 60)
hours_pass = np.random.normal(7, 1.5, 90)
hours = np.concatenate([hours_fail, hours_pass])
hours = np.clip(hours, 0.5, 10)

outcome = np.concatenate([np.zeros(60), np.ones(90)])

# Fit logistic regression by batch gradient descent (learning rate 0.1, fixed 1000 iterations)
X_b = np.column_stack([np.ones(n_samples), hours])
w = np.zeros(2)

for _ in range(1000):
    z = X_b @ w
    predictions = 1 / (1 + np.exp(-z))
    gradient = X_b.T @ (predictions - outcome) / n_samples
    w -= 0.1 * gradient

b0, b1 = w[0], w[1]

# Generate smooth curve points
x_curve = np.linspace(0, 10.5, 200)
y_proba = 1 / (1 + np.exp(-(b0 + b1 * x_curve)))

# Approximate uncertainty band: binomial SE of p, inflated by 2.5 for visibility
# (a visual heuristic, not a formal confidence interval for the fitted curve)
se = np.sqrt(y_proba * (1 - y_proba) / n_samples) * 2.5
ci_lower = np.clip(y_proba - 1.96 * se, 0, 1)
ci_upper = np.clip(y_proba + 1.96 * se, 0, 1)

# Create curve DataFrame
curve_df = pd.DataFrame({"Study Hours": x_curve, "Probability": y_proba, "CI Lower": ci_lower, "CI Upper": ci_upper})

# Add jitter to data points for visibility
jitter = np.random.uniform(-0.03, 0.03, len(outcome))
y_jittered = outcome + jitter

# Create data points DataFrame
points_df = pd.DataFrame(
    {
        "Study Hours": hours,
        "Outcome": outcome,
        "Outcome Jittered": y_jittered,
        "Class": ["Fail" if o == 0 else "Pass" for o in outcome],
    }
)

# Decision threshold line data (p = 0.5 across the x range)
threshold_df = pd.DataFrame({"Study Hours": [0, 10.5], "Probability": [0.5, 0.5]})

# Create the confidence interval band
ci_band = (
    alt.Chart(curve_df)
    .mark_area(opacity=0.25, color="#306998")
    .encode(x=alt.X("Study Hours:Q"), y=alt.Y("CI Lower:Q"), y2=alt.Y2("CI Upper:Q"))
)

# Create the logistic curve
curve = (
    alt.Chart(curve_df)
    .mark_line(strokeWidth=4, color="#306998")
    .encode(x=alt.X("Study Hours:Q"), y=alt.Y("Probability:Q"))
)

# Create the data points
points = (
    alt.Chart(points_df)
    .mark_circle(size=200, opacity=0.6, strokeWidth=1, stroke="white")
    .encode(
        x=alt.X("Study Hours:Q", title="Study Hours", scale=alt.Scale(domain=[0, 10.5])),
        y=alt.Y("Outcome Jittered:Q", title="Probability / Outcome", scale=alt.Scale(domain=[-0.05, 1.05])),
        color=alt.Color(
            "Class:N",
            scale=alt.Scale(domain=["Fail", "Pass"], range=["#306998", "#FFD43B"]),
            legend=alt.Legend(title="Exam Result", titleFontSize=20, labelFontSize=18, symbolSize=300),
        ),
        tooltip=["Study Hours", "Class"],
    )
)

# Decision threshold line
threshold = (
    alt.Chart(threshold_df)
    .mark_line(strokeDash=[12, 8], strokeWidth=3, color="#888888")
    .encode(x=alt.X("Study Hours:Q"), y=alt.Y("Probability:Q"))
)

# Threshold label
threshold_label = (
    alt.Chart(pd.DataFrame({"x": [9.5], "y": [0.54], "text": ["Decision Threshold (p=0.5)"]}))
    .mark_text(fontSize=16, color="#666666", align="right")
    .encode(x="x:Q", y="y:Q", text="text:N")
)

# Combine all layers
chart = (
    alt.layer(ci_band, curve, threshold, threshold_label, points)
    .properties(
        width=1600,
        height=900,
        title=alt.Title("logistic-regression · altair · pyplots.ai", fontSize=28, anchor="middle"),
    )
    .configure_axis(labelFontSize=18, titleFontSize=22, gridOpacity=0.3)
    .configure_view(strokeWidth=0)
)

# Save outputs (PNG export requires the vl-convert-python package)
chart.save("plot.png", scale_factor=3.0)
chart.save("plot.html")
```
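The fitting loop in `altair.py` is plain batch gradient descent on the mean binary cross-entropy. Written out with the code's names (`X_b` as $X$ with rows $\mathbf{x}_i = (1, h_i)^\top$, `w` as $\mathbf{w} = (b_0, b_1)^\top$, `outcome` as $\mathbf{y}$, `n_samples` as $n$), each iteration computes:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad p_i = \sigma(\mathbf{x}_i^\top \mathbf{w})
$$

$$
L(\mathbf{w}) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
$$

$$
\nabla L(\mathbf{w}) = \frac{1}{n} X^\top (\mathbf{p} - \mathbf{y}), \qquad \mathbf{w} \leftarrow \mathbf{w} - 0.1 \, \nabla L(\mathbf{w})
$$

Because $\sigma(0) = 0.5$, the dashed p = 0.5 threshold line meets the fitted curve at $h^\ast = -b_0 / b_1$ study hours. The loop runs a fixed 1000 steps without a convergence check, so $b_0, b_1$ are only as close to the maximum-likelihood estimates as those steps allow.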
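The band in `altair.py` scales the binomial standard error of $p$ by 2.5, which makes a visible ribbon but is not derived from the uncertainty of the fitted coefficients. A minimal sketch of one standard alternative, assuming only NumPy: fit by Newton-Raphson (IRLS) with a convergence test instead of fixed-step gradient descent, use the inverse Fisher information as the coefficient covariance, and build a pointwise Wald band on the logit scale (delta method) before mapping it back through the sigmoid. The helpers `fit_logistic_irls` and `wald_band` are hypothetical names, not part of pyplots; the data generation repeats the script's.

```python
import numpy as np


def sigmoid(z):
    # Logistic function 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_irls(x, y, max_iter=50, tol=1e-10):
    """Fit p = sigmoid(b0 + b1 * x) by Newton-Raphson (IRLS).

    Returns the coefficients and their estimated covariance
    (inverse Fisher information at the optimum).
    """
    X = np.column_stack([np.ones_like(x), x])
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ w)
        weights = p * (1.0 - p)                   # Bernoulli variances
        score = X.T @ (y - p)                     # gradient of the log-likelihood
        info = X.T @ (X * weights[:, None])       # Fisher information matrix
        step = np.linalg.solve(info, score)
        w = w + step
        if np.max(np.abs(step)) < tol:
            break
    p = sigmoid(X @ w)
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return w, np.linalg.inv(info)


def wald_band(x_grid, w, cov, z=1.96):
    """Pointwise Wald band computed on the logit scale, then mapped through the sigmoid."""
    Xg = np.column_stack([np.ones_like(x_grid), x_grid])
    eta = Xg @ w
    # Delta method: Var(eta_j) = x_j^T Cov(w) x_j for each grid point
    se_eta = np.sqrt(np.einsum("ij,jk,ik->i", Xg, cov, Xg))
    return sigmoid(eta), sigmoid(eta - z * se_eta), sigmoid(eta + z * se_eta)


# Same synthetic data as altair.py
np.random.seed(42)
hours = np.clip(np.concatenate([np.random.normal(3, 1.5, 60), np.random.normal(7, 1.5, 90)]), 0.5, 10)
outcome = np.concatenate([np.zeros(60), np.ones(90)])

w, cov = fit_logistic_irls(hours, outcome)
x_curve = np.linspace(0, 10.5, 200)
y_proba, ci_lower, ci_upper = wald_band(x_curve, w, cov)
```

The three arrays have the same shapes as `y_proba`, `ci_lower` and `ci_upper` in the script, so they could replace them in `curve_df` unchanged. Working on the logit scale keeps the band inside (0, 1) without the `np.clip` calls.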
Lines changed: 216 additions & 0 deletions
```yaml
library: altair
specification_id: logistic-regression
created: '2026-01-09T21:55:53Z'
updated: '2026-01-09T22:01:12Z'
generated_by: claude-opus-4-5-20251101
workflow_run: 20866597347
issue: 3550
python_version: 3.13.11
library_version: 6.0.0
preview_url: https://storage.googleapis.com/pyplots-images/plots/logistic-regression/altair/plot.png
preview_thumb: https://storage.googleapis.com/pyplots-images/plots/logistic-regression/altair/plot_thumb.png
preview_html: https://storage.googleapis.com/pyplots-images/plots/logistic-regression/altair/plot.html
quality_score: 91
review:
  strengths:
  - Excellent implementation of the sigmoid curve with manual gradient descent fitting
  - Well-designed confidence interval band using mark_area
  - Appropriate jittering of data points for visibility
  - Clean Altair layer composition combining multiple visual elements
  - 'Good color choice (Python blue #306998 and yellow #FFD43B)'
  - Proper title format following spec requirements
  - Tooltips included for interactivity
  weaknesses:
  - Axis label could include units (e.g., Study Hours (hrs))
  - Some data points at boundary regions show unexpected class membership (pass at
    low hours, fail at high hours) which may confuse viewers
  - Grid opacity could be slightly more subtle (0.2 instead of 0.3)
  image_description: 'The plot displays a logistic regression visualization with the
    characteristic S-shaped sigmoid curve. The x-axis shows "Study Hours" ranging
    from 0 to 10.8, and the y-axis shows "Probability / Outcome" ranging from -0.10
    to 1.10. Data points are colored by class: blue circles for "Fail" (clustered
    near the bottom around y=0) and yellow circles for "Pass" (clustered near the
    top around y=1). The blue sigmoid curve shows the fitted logistic regression probability,
    surrounded by a light blue semi-transparent confidence interval band. A gray dashed
    horizontal line at p=0.5 indicates the decision threshold, with a "Decision Threshold
    (p=0.5)" label. The legend in the upper right shows "Exam Result" with Fail (blue)
    and Pass (yellow). The title reads "logistic-regression · altair · pyplots.ai".'
  criteria_checklist:
    visual_quality:
      score: 36
      max: 40
      items:
      - id: VQ-01
        name: Text Legibility
        score: 10
        max: 10
        passed: true
        comment: Title, axis labels, and tick labels are all clearly readable at full
          size with appropriate font sizes
      - id: VQ-02
        name: No Overlap
        score: 8
        max: 8
        passed: true
        comment: No overlapping text elements, data points have appropriate jitter
      - id: VQ-03
        name: Element Visibility
        score: 7
        max: 8
        passed: true
        comment: Markers well-sized (size=200) with good opacity (0.6), though some
          overlap in dense regions
      - id: VQ-04
        name: Color Accessibility
        score: 5
        max: 5
        passed: true
        comment: Blue and yellow are colorblind-safe and provide excellent contrast
      - id: VQ-05
        name: Layout Balance
        score: 4
        max: 5
        passed: true
        comment: Good use of canvas space, legend well positioned but slight extra
          space on right
      - id: VQ-06
        name: Axis Labels
        score: 1
        max: 2
        passed: true
        comment: Descriptive labels but no units
      - id: VQ-07
        name: Grid & Legend
        score: 1
        max: 2
        passed: true
        comment: Grid subtle (gridOpacity=0.3), legend well placed but slightly large
    spec_compliance:
      score: 24
      max: 25
      items:
      - id: SC-01
        name: Plot Type
        score: 8
        max: 8
        passed: true
        comment: Correct logistic regression visualization with sigmoid curve
      - id: SC-02
        name: Data Mapping
        score: 5
        max: 5
        passed: true
        comment: X correctly shows predictor, Y shows probability/outcome
      - id: SC-03
        name: Required Features
        score: 5
        max: 5
        passed: true
        comment: 'All spec features present: sigmoid curve, confidence interval, jittered
          points, decision threshold line, class coloring'
      - id: SC-04
        name: Data Range
        score: 3
        max: 3
        passed: true
        comment: All data visible with appropriate scale
      - id: SC-05
        name: Legend Accuracy
        score: 2
        max: 2
        passed: true
        comment: Legend correctly shows Exam Result with Fail/Pass
      - id: SC-06
        name: Title Format
        score: 1
        max: 2
        passed: true
        comment: Uses correct format but middot character may render differently across
          systems
    data_quality:
      score: 18
      max: 20
      items:
      - id: DQ-01
        name: Feature Coverage
        score: 7
        max: 8
        passed: true
        comment: 'Shows sigmoid curve, both classes, confidence interval, threshold;
          minor: could show edge cases more'
      - id: DQ-02
        name: Realistic Context
        score: 7
        max: 7
        passed: true
        comment: Exam pass/fail vs study hours is an excellent, neutral, realistic
          scenario
      - id: DQ-03
        name: Appropriate Scale
        score: 4
        max: 5
        passed: true
        comment: 'Study hours 0-10 is reasonable; 150 samples good; minor: some pass
          outcomes at low hours'
    code_quality:
      score: 10
      max: 10
      items:
      - id: CQ-01
        name: KISS Structure
        score: 3
        max: 3
        passed: true
        comment: 'Clean linear flow: imports, data, model, plot, save'
      - id: CQ-02
        name: Reproducibility
        score: 3
        max: 3
        passed: true
        comment: Uses np.random.seed(42)
      - id: CQ-03
        name: Clean Imports
        score: 2
        max: 2
        passed: true
        comment: Only necessary imports (altair, numpy, pandas)
      - id: CQ-04
        name: No Deprecated API
        score: 1
        max: 1
        passed: true
        comment: Uses current Altair API
      - id: CQ-05
        name: Output Correct
        score: 1
        max: 1
        passed: true
        comment: Saves as plot.png and plot.html
    library_features:
      score: 3
      max: 5
      items:
      - id: LF-01
        name: Distinctive Features
        score: 3
        max: 5
        passed: true
        comment: Uses Altair layer composition, tooltips, and declarative encoding
          well, but could use more interactive features
  verdict: APPROVED
impl_tags:
  dependencies: []
  techniques:
  - layer-composition
  - annotations
  - hover-tooltips
  - html-export
  patterns:
  - data-generation
  - iteration-over-groups
  dataprep:
  - regression
  styling:
  - alpha-blending
  - grid-styling
  - edge-highlighting
```
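In this metadata each category `score` equals the sum of its item scores, and the five category scores (36 + 24 + 18 + 10 + 3) add up to `quality_score: 91` out of a possible 100 (40 + 25 + 20 + 10 + 5). Whether the pyplots pipeline actually derives `quality_score` this way is an assumption; the sketch below only checks that the numbers in this file are internally consistent. The dict is hand-copied from `criteria_checklist`; in practice one might load the file with PyYAML's `yaml.safe_load` instead. `check_rollup` is a hypothetical helper.

```python
# (score, max) pairs per item, hand-copied from criteria_checklist above
criteria = {
    "visual_quality": {"score": 36, "max": 40, "items": [(10, 10), (8, 8), (7, 8), (5, 5), (4, 5), (1, 2), (1, 2)]},
    "spec_compliance": {"score": 24, "max": 25, "items": [(8, 8), (5, 5), (5, 5), (3, 3), (2, 2), (1, 2)]},
    "data_quality": {"score": 18, "max": 20, "items": [(7, 8), (7, 7), (4, 5)]},
    "code_quality": {"score": 10, "max": 10, "items": [(3, 3), (3, 3), (2, 2), (1, 1), (1, 1)]},
    "library_features": {"score": 3, "max": 5, "items": [(3, 5)]},
}
QUALITY_SCORE = 91


def check_rollup(criteria, quality_score):
    """Return (total, max_total); raise ValueError if any level does not add up."""
    for name, cat in criteria.items():
        item_score = sum(s for s, _ in cat["items"])
        item_max = sum(m for _, m in cat["items"])
        if (item_score, item_max) != (cat["score"], cat["max"]):
            raise ValueError(f"{name}: items give {item_score}/{item_max}, category says {cat['score']}/{cat['max']}")
    total = sum(cat["score"] for cat in criteria.values())
    max_total = sum(cat["max"] for cat in criteria.values())
    if total != quality_score:
        raise ValueError(f"categories sum to {total}, quality_score is {quality_score}")
    return total, max_total


total, max_total = check_rollup(criteria, QUALITY_SCORE)
print(total, max_total)  # 91 100
```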
