Commit 5fa0310

feat(plotnine): implement line-retention-cohort (#4932)
## Implementation: `line-retention-cohort` - plotnine

Implements the **plotnine** version of `line-retention-cohort`.

**File:** `plots/line-retention-cohort/implementations/plotnine.py`

**Parent Issue:** #4572

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/23164943107)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 65c5364 commit 5fa0310

2 files changed

Lines changed: 381 additions & 0 deletions

Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
+""" pyplots.ai
+line-retention-cohort: User Retention Curve by Cohort
+Library: plotnine 0.15.3 | Python 3.14.3
+Quality: 91/100 | Created: 2026-03-16
+"""
+
+import numpy as np
+import pandas as pd
+from plotnine import (
+    aes,
+    annotate,
+    element_blank,
+    element_line,
+    element_rect,
+    element_text,
+    geom_hline,
+    geom_line,
+    geom_point,
+    geom_ribbon,
+    geom_text,
+    ggplot,
+    guide_legend,
+    guides,
+    labs,
+    scale_alpha_identity,
+    scale_color_manual,
+    scale_size_identity,
+    scale_x_continuous,
+    scale_y_continuous,
+    theme,
+    theme_minimal,
+)
+
+
+# Data
+np.random.seed(42)
+
+cohorts = {
+    "Jan 2025": {"size": 1245, "decay": 0.22, "plateau": 8},
+    "Feb 2025": {"size": 1102, "decay": 0.17, "plateau": 12},
+    "Mar 2025": {"size": 1380, "decay": 0.13, "plateau": 18},
+    "Apr 2025": {"size": 1290, "decay": 0.11, "plateau": 22},
+    "May 2025": {"size": 1455, "decay": 0.08, "plateau": 30},
+}
+
+weeks = np.arange(0, 13)
+rows = []
+
+for cohort_name, info in cohorts.items():
+    base = (100 - info["plateau"]) * np.exp(-info["decay"] * weeks) + info["plateau"]
+    noise = np.concatenate(([0], np.cumsum(np.random.normal(0, 0.6, len(weeks) - 1))))
+    retention = np.clip(base + noise, 0, 100)
+    retention[0] = 100.0
+    label = f"{cohort_name} (n={info['size']:,})"
+    for w, r in zip(weeks, retention, strict=True):
+        rows.append({"week": w, "retention": r, "cohort": label})
+
+df = pd.DataFrame(rows)
+
+cohort_labels = list(df["cohort"].unique())
+df["cohort"] = pd.Categorical(df["cohort"], categories=cohort_labels, ordered=True)
+
+# Alpha: ensure oldest is still readable
+alpha_values = [0.6, 0.7, 0.8, 0.9, 1.0]
+alpha_map = dict(zip(cohort_labels, alpha_values, strict=True))
+df["line_alpha"] = df["cohort"].map(alpha_map).astype(float)
+
+# Line width: thinner for older, bolder for newer
+size_values = [1.0, 1.2, 1.4, 1.6, 2.0]
+size_map = dict(zip(cohort_labels, size_values, strict=True))
+df["line_size"] = df["cohort"].map(size_map).astype(float)
+
+# Ribbon data: show spread between oldest and newest cohort
+oldest_label = cohort_labels[0]
+newest_label = cohort_labels[-1]
+df_oldest = df[df["cohort"] == oldest_label][["week", "retention"]].rename(columns={"retention": "ymin"})
+df_newest = df[df["cohort"] == newest_label][["week", "retention"]].rename(columns={"retention": "ymax"})
+df_ribbon = df_oldest.merge(df_newest, on="week")
+
+# Colors: refined palette with clear progression
+colors = ["#94B8D1", "#6A9EC1", "#306998", "#E07941", "#C94420"]
+
+# Endpoint labels for storytelling
+df_endpoints = df[df["week"] == 12].copy()
+df_endpoints["ret_label"] = df_endpoints["retention"].apply(lambda x: f"{x:.0f}%")
+
+# Plot
+plot = (
+    ggplot(df, aes(x="week", y="retention", color="cohort", group="cohort"))
+    + geom_ribbon(
+        aes(x="week", ymin="ymin", ymax="ymax"), data=df_ribbon, inherit_aes=False, fill="#306998", alpha=0.07
+    )
+    + geom_hline(yintercept=20, linetype="dashed", color="#AAAAAA", size=0.7)
+    + geom_line(aes(alpha="line_alpha", size="line_size"))
+    + scale_alpha_identity()
+    + scale_size_identity()
+    + geom_point(aes(alpha="line_alpha"), size=2.5, show_legend=False)
+    + geom_text(aes(label="ret_label"), data=df_endpoints, nudge_x=0.5, size=10, ha="left", show_legend=False)
+    + scale_color_manual(values=colors)
+    + scale_x_continuous(breaks=range(0, 13), labels=[str(w) for w in range(0, 13)], expand=(0.02, 0.8))
+    + scale_y_continuous(
+        limits=(0, 108), breaks=[0, 20, 40, 60, 80, 100], labels=["0%", "20%", "40%", "60%", "80%", "100%"]
+    )
+    + annotate(
+        "text",
+        x=11.8,
+        y=22.5,
+        label="20% retention threshold",
+        size=11,
+        color="#888888",
+        ha="right",
+        fontstyle="italic",
+    )
+    + annotate(
+        "label",
+        x=6,
+        y=55,
+        label="Improvement\ngap",
+        size=10,
+        color="#306998",
+        fill="#F0F4F8",
+        alpha=0.85,
+        ha="center",
+        label_size=0,
+    )
+    + labs(
+        x="Weeks Since Signup",
+        y="Retained Users",
+        color="Cohort",
+        title="line-retention-cohort · plotnine · pyplots.ai",
+    )
+    + guides(color=guide_legend(override_aes={"size": 3, "alpha": 1}))
+    + theme_minimal()
+    + theme(
+        figure_size=(16, 9),
+        text=element_text(family="sans-serif", size=14, color="#333333"),
+        plot_title=element_text(size=24, weight="bold", color="#1a1a1a"),
+        axis_title=element_text(size=20, color="#444444"),
+        axis_text=element_text(size=16, color="#555555"),
+        legend_title=element_text(size=18, weight="bold"),
+        legend_text=element_text(size=14),
+        legend_position="right",
+        legend_background=element_rect(fill="#FAFAFA", color="#E0E0E0", size=0.5),
+        legend_key=element_rect(fill="none", color="none"),
+        panel_grid_major_x=element_blank(),
+        panel_grid_minor=element_blank(),
+        panel_grid_major_y=element_line(color="#EBEBEB", size=0.4, alpha=0.6),
+        axis_line_x=element_line(color="#333333", size=0.5),
+        axis_line_y=element_line(color="#333333", size=0.5),
+        plot_margin=0.04,
+    )
+)
+
+# Save
+plot.save("plot.png", dpi=300, verbose=False)
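
The synthetic data in the script above is built from an exponential decay toward a per-cohort plateau, with cumulative noise added on top. The sketch below isolates only the noise-free part of that formula using the standard library, so the shape of each curve can be checked without NumPy or plotnine. The helper name `expected_retention` and the two-cohort subset are illustrative assumptions, not part of the commit.

```python
import math

# Decay and plateau values copied from the script's `cohorts` dict
# (two of the five cohorts, chosen for illustration).
COHORTS = {
    "Jan 2025": {"decay": 0.22, "plateau": 8},
    "May 2025": {"decay": 0.08, "plateau": 30},
}


def expected_retention(week, decay, plateau):
    """Noise-free retention (%) after `week` weeks.

    Starts at 100 for week 0 and decays exponentially toward `plateau`.
    Hypothetical helper: the script computes this inline with NumPy.
    """
    return (100 - plateau) * math.exp(-decay * week) + plateau


for name, params in COHORTS.items():
    curve = [expected_retention(w, **params) for w in range(13)]
    print(f"{name}: week 0 = {curve[0]:.1f}%, week 12 = {curve[-1]:.1f}%")
```

Because the script adds a random walk to this baseline, the plotted endpoint values can differ from the noise-free ones by a few percentage points.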
Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
+library: plotnine
+specification_id: line-retention-cohort
+created: '2026-03-16T20:44:50Z'
+updated: '2026-03-16T20:57:58Z'
+generated_by: claude-opus-4-5-20251101
+workflow_run: 23164943107
+issue: 4572
+python_version: 3.14.3
+library_version: 0.15.3
+preview_url: https://storage.googleapis.com/pyplots-images/plots/line-retention-cohort/plotnine/plot.png
+preview_thumb: https://storage.googleapis.com/pyplots-images/plots/line-retention-cohort/plotnine/plot_thumb.png
+preview_html: null
+quality_score: 91
+review:
+  strengths:
+  - Excellent data storytelling with improvement gap ribbon, endpoint labels, and
+    20% threshold annotation
+  - Graduated alpha and line width effectively emphasize recent cohorts
+  - Custom color palette with intentional blue-to-orange progression
+  - All spec requirements fully implemented including cohort sizes in legend
+  - Expert use of plotnine grammar of graphics with identity scales
+  weaknesses:
+  - Minor overlap of endpoint labels (17% and 19%) for the two oldest cohorts at week
+    12
+  - Three blue shades for Jan/Feb/Mar cohorts could be more distinct
+  image_description: The plot displays 5 user retention curves for monthly signup
+    cohorts (Jan–May 2025), each starting at 100% and decaying over 12 weeks. Colors
+    progress from light steel blue (oldest cohort, Jan) through medium blue (Mar)
+    to dark burnt orange/red (newest, May). Lines use graduated alpha (0.6–1.0) and
+    thickness (1.0–2.0pt) to visually emphasize recent cohorts. A subtle blue-shaded
+    ribbon spans the area between the oldest and newest cohort curves, labeled "Improvement
+    gap" in a boxed annotation. A horizontal dashed gray line marks the 20% retention
+    threshold with an italic annotation. Endpoint percentage labels appear at week
+    12 for each cohort (17%, 19%, 34%, 40%, 57%). The legend on the right shows cohort
+    labels with sample sizes (e.g., "Jan 2025 (n=1,245)") inside a light gray box
+    with border. Title reads "line-retention-cohort · plotnine · pyplots.ai" in bold.
+    Y-axis labeled "Retained Users" with percentage gridlines; X-axis labeled "Weeks
+    Since Signup" with integer ticks 0–12.
+  criteria_checklist:
+    visual_quality:
+      score: 26
+      max: 30
+      items:
+      - id: VQ-01
+        name: Text Legibility
+        score: 8
+        max: 8
+        passed: true
+        comment: 'All font sizes explicitly set: title=24, axis_title=20, axis_text=16,
+          legend_title=18, legend_text=14'
+      - id: VQ-02
+        name: No Overlap
+        score: 4
+        max: 6
+        passed: true
+        comment: Minor overlap of 19% and 17% endpoint labels at week 12
+      - id: VQ-03
+        name: Element Visibility
+        score: 6
+        max: 6
+        passed: true
+        comment: Lines and points well-sized with graduated thickness and alpha
+      - id: VQ-04
+        name: Color Accessibility
+        score: 3
+        max: 4
+        passed: true
+        comment: Blue-to-orange progression generally colorblind-safe, three blue
+          shades could be more distinct
+      - id: VQ-05
+        name: Layout & Canvas
+        score: 3
+        max: 4
+        passed: true
+        comment: Good layout, slight crowding on right side near legend
+      - id: VQ-06
+        name: Axis Labels & Title
+        score: 2
+        max: 2
+        passed: true
+        comment: 'Descriptive labels: Weeks Since Signup, Retained Users'
+    design_excellence:
+      score: 16
+      max: 20
+      items:
+      - id: DE-01
+        name: Aesthetic Sophistication
+        score: 6
+        max: 8
+        passed: true
+        comment: Custom palette with color progression, refined typography, graduated
+          alpha/size for emphasis
+      - id: DE-02
+        name: Visual Refinement
+        score: 5
+        max: 6
+        passed: true
+        comment: X-grid removed, Y-grid subtle, legend styled, generous whitespace
+      - id: DE-03
+        name: Data Storytelling
+        score: 5
+        max: 6
+        passed: true
+        comment: Improvement gap ribbon, endpoint labels, threshold line, alpha/size
+          graduation create clear narrative
+    spec_compliance:
+      score: 15
+      max: 15
+      items:
+      - id: SC-01
+        name: Plot Type
+        score: 5
+        max: 5
+        passed: true
+        comment: Correct line chart with multiple cohort retention curves
+      - id: SC-02
+        name: Required Features
+        score: 4
+        max: 4
+        passed: true
+        comment: 'All spec features implemented: 100% start, distinct colors, legend
+          with sizes, gridlines, threshold line, emphasis'
+      - id: SC-03
+        name: Data Mapping
+        score: 3
+        max: 3
+        passed: true
+        comment: X=weeks since signup, Y=retention percentage, correctly mapped
+      - id: SC-04
+        name: Title & Legend
+        score: 3
+        max: 3
+        passed: true
+        comment: Title follows exact format, legend labels show cohort name with sample
+          size
+    data_quality:
+      score: 15
+      max: 15
+      items:
+      - id: DQ-01
+        name: Feature Coverage
+        score: 6
+        max: 6
+        passed: true
+        comment: 5 cohorts with different decay rates, plateau levels, and sizes showing
+          clear progression
+      - id: DQ-02
+        name: Realistic Context
+        score: 5
+        max: 5
+        passed: true
+        comment: Real-world product analytics with monthly cohorts and plausible user
+          counts
+      - id: DQ-03
+        name: Appropriate Scale
+        score: 4
+        max: 4
+        passed: true
+        comment: Realistic exponential decay with sensible plateau levels and cohort
+          sizes
+    code_quality:
+      score: 10
+      max: 10
+      items:
+      - id: CQ-01
+        name: KISS Structure
+        score: 3
+        max: 3
+        passed: true
+        comment: Clean imports-data-plot-save flow, no functions or classes
+      - id: CQ-02
+        name: Reproducibility
+        score: 2
+        max: 2
+        passed: true
+        comment: np.random.seed(42) set
+      - id: CQ-03
+        name: Clean Imports
+        score: 2
+        max: 2
+        passed: true
+        comment: All imports are used
+      - id: CQ-04
+        name: Code Elegance
+        score: 2
+        max: 2
+        passed: true
+        comment: Well-organized with appropriate complexity
+      - id: CQ-05
+        name: Output & API
+        score: 1
+        max: 1
+        passed: true
+        comment: Saves as plot.png with dpi=300, current API
+    library_mastery:
+      score: 9
+      max: 10
+      items:
+      - id: LM-01
+        name: Idiomatic Usage
+        score: 5
+        max: 5
+        passed: true
+        comment: Expert grammar of graphics usage with layered geoms, identity scales,
+          guide overrides
+      - id: LM-02
+        name: Distinctive Features
+        score: 4
+        max: 5
+        passed: true
+        comment: scale_*_identity with data-mapped columns, annotate label, geom_ribbon,
+          guide_legend override_aes
+  verdict: APPROVED
+impl_tags:
+  dependencies: []
+  techniques:
+  - annotations
+  - layer-composition
+  - custom-legend
+  patterns:
+  - data-generation
+  - iteration-over-groups
+  dataprep: []
+  styling:
+  - alpha-blending
+  - grid-styling
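
The review above flags a minor overlap between the 17% and 19% endpoint labels at week 12. One possible way to address it, sketched here under assumptions and not taken from the commit, is to compute separate label positions that are pushed apart before plotting. The function name `spread_labels` and the minimum gap of 4 percentage points are hypothetical choices for illustration.

```python
def spread_labels(values, min_gap):
    """Return label positions, in input order, at least `min_gap` apart.

    Walks the values from lowest to highest and pushes a label upward
    whenever it sits closer than `min_gap` to the one below it.
    Hypothetical helper; it only ever moves labels up, never down.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    adjusted = list(values)
    previous = None
    for i in order:
        if previous is not None and adjusted[i] - previous < min_gap:
            adjusted[i] = previous + min_gap
        previous = adjusted[i]
    return adjusted


# Endpoint values as reported in the review's image description.
endpoints = [17, 19, 34, 40, 57]
print(spread_labels(endpoints, min_gap=4))
```

In the script, the adjusted positions could be stored in an extra column of `df_endpoints` and mapped to `y` in the `geom_text` layer, leaving the plotted lines and points at their true values. Whether a gap of 4 is enough depends on the label font size and figure height, so it would need tuning against the rendered image.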