
Commit 346bd4f

feat(plotnine): implement boxen-basic (#3434)
## Implementation: `boxen-basic` - plotnine

Implements the **plotnine** version of `boxen-basic`.

**File:** `plots/boxen-basic/implementations/plotnine.py`
**Parent Issue:** #3414

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/20845375761)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 44b06ea commit 346bd4f

2 files changed

Lines changed: 390 additions & 0 deletions


Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
""" pyplots.ai
boxen-basic: Basic Boxen Plot (Letter-Value Plot)
Library: plotnine 0.15.2 | Python 3.13.11
Quality: 91/100 | Created: 2026-01-09
"""

import numpy as np
import pandas as pd
from plotnine import (
    aes,
    element_line,
    element_text,
    geom_point,
    geom_rect,
    geom_segment,
    ggplot,
    labs,
    scale_fill_manual,
    scale_x_continuous,
    theme,
    theme_minimal,
)


# Set seed for reproducibility
np.random.seed(42)

# Generate data - server response times by endpoint (1000+ per category)
n_per_group = 2000
endpoints = ["API", "Database", "Cache", "Auth"]

data = []
for endpoint in endpoints:
    if endpoint == "API":
        # Right-skewed with some outliers
        values = np.concatenate(
            [
                np.random.exponential(50, n_per_group - 20),
                np.random.uniform(300, 500, 20),  # Outliers
            ]
        )
    elif endpoint == "Database":
        # Bimodal - some fast, some slow queries
        values = np.concatenate(
            [np.random.normal(30, 10, n_per_group // 2), np.random.normal(100, 20, n_per_group // 2)]
        )
    elif endpoint == "Cache":
        # Fast and tight distribution
        values = np.random.normal(15, 5, n_per_group)
        values = np.maximum(values, 1)  # No negative response times
    else:  # Auth
        # Medium with heavy tail
        values = np.random.gamma(3, 20, n_per_group)

    for v in values:
        data.append({"endpoint": endpoint, "response_time": v})

df = pd.DataFrame(data)


# Compute letter values for each group
categories = df["endpoint"].unique()
box_data = []
outlier_data = []
median_data = []

# Width parameters
base_width = 0.8
width_decay = 0.85  # Each nested level is 85% of previous width

for i, cat in enumerate(categories):
    values = df[df["endpoint"] == cat]["response_time"].values

    # Calculate letter values (quantiles) inline
    n = len(values)
    k = min(max(3, int(np.floor(np.log2(n)) - 2)), 8)  # Adaptive levels, cap at 8

    # Compute quantile depths: tail probabilities 1/4, 1/8, 1/16, ... and their mirrors
    depths = [0.5]  # Start with median
    for j in range(1, k):
        depth = 0.5 ** (j + 1)
        depths.append(depth)
        depths.append(1 - depth)

    depths = sorted(set(depths))
    quantiles = np.quantile(values, depths)

    # Find median
    median_idx = depths.index(0.5)
    median_val = quantiles[median_idx]
    median_data.append({"x": i, "y": median_val, "endpoint": cat})

    # Create nested boxes from inner (level 0, widest, 25%-75%) to outer (narrowest)
    n_pairs = (len(depths) - 1) // 2
    for level in range(n_pairs):
        lower_idx = median_idx - 1 - level
        upper_idx = median_idx + 1 + level
        ymin = quantiles[lower_idx]
        ymax = quantiles[upper_idx]
        width = base_width * (width_decay**level)

        box_data.append(
            {
                "endpoint": cat,
                "x": i,
                "xmin": i - width / 2,
                "xmax": i + width / 2,
                "ymin": ymin,
                "ymax": ymax,
                "level": level,
            }
        )

    # Outliers beyond deepest letter value
    lower_bound = quantiles[0]
    upper_bound = quantiles[-1]
    outliers = values[(values < lower_bound) | (values > upper_bound)]
    for o in outliers:
        outlier_data.append({"x": i, "y": o, "endpoint": cat})

box_df = pd.DataFrame(box_data)
outlier_df = pd.DataFrame(outlier_data) if outlier_data else pd.DataFrame(columns=["x", "y", "endpoint"])
median_df = pd.DataFrame(median_data)

# Color palette - Python Blue gradient from dark to light
n_levels = box_df["level"].max() + 1 if len(box_df) > 0 else 1
# Create gradient from dark blue (#1a4971) to light blue (#a8d4f0)
colors = []
for i in range(n_levels):
    t = i / max(n_levels - 1, 1)  # Normalize to 0-1
    r = int(26 + t * (168 - 26))
    g = int(73 + t * (212 - 73))
    b = int(113 + t * (240 - 113))
    colors.append(f"#{r:02x}{g:02x}{b:02x}")

# Create the plot
plot = (
    ggplot()
    + geom_rect(
        data=box_df.sort_values("level", ascending=False),  # Draw outer boxes first
        mapping=aes(xmin="xmin", xmax="xmax", ymin="ymin", ymax="ymax", fill="factor(level)"),
        color="#1a1a1a",
        size=0.3,
    )
    + geom_segment(data=median_df, mapping=aes(x="x - 0.35", xend="x + 0.35", y="y", yend="y"), color="white", size=1.5)
    + scale_fill_manual(
        values=colors,
        name="Quantile Level",
        labels=[f"{50 * (0.5 ** (i + 1)):.1f}%-{100 - 50 * (0.5 ** (i + 1)):.1f}%" for i in range(n_levels)],
    )
    + labs(title="boxen-basic · plotnine · pyplots.ai", x="Endpoint", y="Response Time (ms)")
    + theme_minimal()
    + theme(
        figure_size=(16, 9),
        plot_title=element_text(size=24, weight="bold"),
        axis_title=element_text(size=20),
        axis_text=element_text(size=16),
        axis_text_x=element_text(size=18),
        legend_title=element_text(size=16),
        legend_text=element_text(size=14),
        legend_position="right",
        panel_grid_major=element_line(alpha=0.3),
        panel_grid_minor=element_line(alpha=0.15),
    )
)

# Add outliers if present
if len(outlier_df) > 0:
    plot = plot + geom_point(data=outlier_df, mapping=aes(x="x", y="y"), color="#306998", size=2, alpha=0.5)

# Custom x-axis scale for category names
plot = plot + scale_x_continuous(breaks=list(range(len(categories))), labels=list(categories))

# Save the plot
plot.save("plot.png", dpi=300, width=16, height=9)
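
For readers who want to check the quantile levels the legend advertises (25%-75%, 12.5%-87.5%, and so on down to roughly 0.4%-99.6%), here is a minimal, dependency-free sketch of a letter-value computation. It is an illustration under stated assumptions, not the file's code: the helper names `quantile`, `letter_value_probs`, and `letter_value_boxes` are hypothetical, the level rule (`floor(log2(n)) - 2`, clamped to 3..8) and the width parameters are copied from the script, and the interpolation rule is the linear one NumPy uses by default.

```python
import math


def quantile(sorted_values, p):
    """Linearly interpolated quantile of pre-sorted data (position p * (n - 1))."""
    pos = p * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def letter_value_probs(n, max_levels=8):
    """Sorted quantile probabilities: the median plus mirrored tail pairs."""
    k = min(max(3, int(math.floor(math.log2(n))) - 2), max_levels)
    tails = [0.5 ** (j + 1) for j in range(1, k)]  # 1/4, 1/8, 1/16, ...
    return sorted(tails + [0.5] + [1 - t for t in tails])


def letter_value_boxes(values, base_width=0.8, width_decay=0.85):
    """Return (median, boxes); level 0 is the widest box spanning 25%-75%."""
    data = sorted(values)
    probs = letter_value_probs(len(data))
    mid = probs.index(0.5)
    boxes = []
    for level in range(mid):
        boxes.append(
            {
                "level": level,
                "ymin": quantile(data, probs[mid - 1 - level]),
                "ymax": quantile(data, probs[mid + 1 + level]),
                "width": base_width * width_decay**level,
            }
        )
    return quantile(data, 0.5), boxes


if __name__ == "__main__":
    median, boxes = letter_value_boxes(range(1, 2001))
    for box in boxes:
        print(box["level"], box["ymin"], box["ymax"], round(box["width"], 3))
```

With 2000 values per group this yields seven nested boxes, which matches the seven percentage ranges implied by the legend labels; each deeper level is taller and narrower than the one before it.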
Lines changed: 215 additions & 0 deletions
@@ -0,0 +1,215 @@
library: plotnine
specification_id: boxen-basic
created: '2026-01-09T08:10:09Z'
updated: '2026-01-09T08:15:18Z'
generated_by: claude-opus-4-5-20251101
workflow_run: 20845375761
issue: 3414
python_version: 3.13.11
library_version: 0.15.2
preview_url: https://storage.googleapis.com/pyplots-images/plots/boxen-basic/plotnine/plot.png
preview_thumb: https://storage.googleapis.com/pyplots-images/plots/boxen-basic/plotnine/plot_thumb.png
preview_html: null
quality_score: 91
review:
  strengths:
  - Excellent implementation of a boxen plot using plotnine grammar of graphics despite
    lacking native geom_boxen support
  - 'Data demonstrates all key aspects of boxen plots: different distribution shapes,
    visible quantile nesting, and outlier detection'
  - Clean blue gradient color scheme that is colorblind-safe
  - Realistic server response time scenario with 2000 points per category
  - Proper use of nested boxes with decreasing widths for deeper quantiles
  - Clear legend explaining quantile level percentages
  weaknesses:
  - Color gradient between quantile levels could have more contrast to better distinguish
    nested boxes
  - Manual implementation adds code complexity (though unavoidable for plotnine)
  image_description: 'The plot displays a boxen plot (letter-value plot) showing server
    response times across four endpoints: API, Database, Cache, and Auth. Each category
    has nested rectangular boxes representing different quantile levels, colored in
    a gradient from dark blue (innermost, representing 25%-75%) to light blue (outermost,
    representing 0.4%-99.6%). The boxes decrease in width for deeper quantiles as
    expected. A white horizontal line marks the median in each box. Outliers are shown
    as semi-transparent blue points above the main boxes, particularly visible for
    the API endpoint (ranging up to ~500ms) and to a lesser extent for Database and
    Auth. The Cache endpoint shows a tight, low distribution with few outliers. The
    x-axis is labeled "Endpoint" with category names (API, Database, Cache, Auth),
    and the y-axis is labeled "Response Time (ms)" ranging from 0 to about 500. A
    legend on the right explains the quantile levels. The title follows the required
    format: "boxen-basic · plotnine · pyplots.ai". Grid lines are subtle and the overall
    layout is well-balanced.'
  criteria_checklist:
    visual_quality:
      score: 38
      max: 40
      items:
      - id: VQ-01
        name: Text Legibility
        score: 10
        max: 10
        passed: true
        comment: Title at 24pt bold, axis labels at 20pt, tick labels at 16-18pt,
          all perfectly readable
      - id: VQ-02
        name: No Overlap
        score: 8
        max: 8
        passed: true
        comment: No overlapping text elements, category labels well-spaced
      - id: VQ-03
        name: Element Visibility
        score: 7
        max: 8
        passed: true
        comment: Nested boxes clearly visible with distinct widths; outlier points
          appropriately sized with alpha=0.5
      - id: VQ-04
        name: Color Accessibility
        score: 5
        max: 5
        passed: true
        comment: Blue gradient is colorblind-safe, good contrast between levels
      - id: VQ-05
        name: Layout Balance
        score: 5
        max: 5
        passed: true
        comment: Plot fills good portion of canvas, balanced margins
      - id: VQ-06
        name: Axis Labels
        score: 2
        max: 2
        passed: true
        comment: 'Descriptive labels with units: Response Time (ms) and Endpoint'
      - id: VQ-07
        name: Grid & Legend
        score: 1
        max: 2
        passed: true
        comment: Grid subtle (alpha 0.3), legend placed well but has many entries
    spec_compliance:
      score: 23
      max: 25
      items:
      - id: SC-01
        name: Plot Type
        score: 8
        max: 8
        passed: true
        comment: Correct boxen/letter-value plot with nested boxes showing multiple
          quantile levels
      - id: SC-02
        name: Data Mapping
        score: 5
        max: 5
        passed: true
        comment: Categories on X-axis, values on Y-axis correctly assigned
      - id: SC-03
        name: Required Features
        score: 4
        max: 5
        passed: true
        comment: Shows nested boxes with decreasing widths, outliers displayed; color
          distinction between levels is subtle
      - id: SC-04
        name: Data Range
        score: 3
        max: 3
        passed: true
        comment: All data visible including outliers up to ~500ms
      - id: SC-05
        name: Legend Accuracy
        score: 2
        max: 2
        passed: true
        comment: Legend correctly explains quantile levels with percentage ranges
      - id: SC-06
        name: Title Format
        score: 1
        max: 2
        passed: true
        comment: Correct format but uses standard separator
    data_quality:
      score: 19
      max: 20
      items:
      - id: DQ-01
        name: Feature Coverage
        score: 8
        max: 8
        passed: true
        comment: 'Excellent: shows different distribution shapes (API skewed, Database
          bimodal, Cache tight, Auth gamma)'
      - id: DQ-02
        name: Realistic Context
        score: 7
        max: 7
        passed: true
        comment: Server response times by endpoint is a perfect, realistic scenario
      - id: DQ-03
        name: Appropriate Scale
        score: 4
        max: 5
        passed: true
        comment: Response times 0-500ms are realistic for server endpoints
    code_quality:
      score: 8
      max: 10
      items:
      - id: CQ-01
        name: KISS Structure
        score: 2
        max: 3
        passed: true
        comment: Code follows pattern but has complexity for manual letter-value calculation
          (unavoidable)
      - id: CQ-02
        name: Reproducibility
        score: 3
        max: 3
        passed: true
        comment: np.random.seed(42) set at beginning
      - id: CQ-03
        name: Clean Imports
        score: 2
        max: 2
        passed: true
        comment: All imports are used
      - id: CQ-04
        name: No Deprecated API
        score: 1
        max: 1
        passed: true
        comment: Uses current plotnine API
      - id: CQ-05
        name: Output Correct
        score: 0
        max: 1
        passed: false
        comment: Saves to plot.png correctly
    library_features:
      score: 3
      max: 5
      items:
      - id: LF-01
        name: Distinctive Features
        score: 3
        max: 5
        passed: true
        comment: Uses ggplot grammar with geom_rect workaround since plotnine lacks
          native geom_boxen
  verdict: APPROVED
impl_tags:
  dependencies: []
  techniques:
  - layer-composition
  - custom-legend
  patterns:
  - data-generation
  - iteration-over-groups
  dataprep:
  - binning
  styling:
  - alpha-blending
  - grid-styling
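
As a quick sanity check on the metadata, the per-category scores in `criteria_checklist` can be summed and compared with `quality_score`. The sketch below hard-codes the `(score, max)` pairs from the file rather than parsing the YAML, and the helper name `total_score` is hypothetical; it is not part of any pyplots tooling.

```python
# (score, max) pairs copied by hand from criteria_checklist in the metadata file.
category_scores = {
    "visual_quality": (38, 40),
    "spec_compliance": (23, 25),
    "data_quality": (19, 20),
    "code_quality": (8, 10),
    "library_features": (3, 5),
}


def total_score(categories):
    """Sum the earned and maximum points across all categories."""
    earned = sum(score for score, _ in categories.values())
    possible = sum(maximum for _, maximum in categories.values())
    return earned, possible


earned, possible = total_score(category_scores)
print(f"{earned}/{possible}")  # prints "91/100"
```

The total agrees with `quality_score: 91` and with the `Quality: 91/100` line in the script's docstring.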
