Commit 44b06ea

feat(letsplot): implement boxen-basic (#3439)
## Implementation: `boxen-basic` - letsplot

Implements the **letsplot** version of `boxen-basic`.

**File:** `plots/boxen-basic/implementations/letsplot.py`
**Parent Issue:** #3414

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/20845378914)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 8138af0 commit 44b06ea

2 files changed

Lines changed: 376 additions & 0 deletions

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
""" pyplots.ai
boxen-basic: Basic Boxen Plot (Letter-Value Plot)
Library: letsplot 4.8.2 | Python 3.13.11
Quality: 91/100 | Created: 2026-01-09
"""

import numpy as np
import pandas as pd
from lets_plot import (
    LetsPlot,
    aes,
    element_text,
    geom_point,
    geom_rect,
    geom_segment,
    ggplot,
    ggsave,
    ggsize,
    labs,
    scale_fill_manual,
    scale_x_continuous,
    theme,
    theme_minimal,
)


LetsPlot.setup_html()

# Data - Generate realistic response times for different server endpoints
np.random.seed(42)
endpoints = ["API Gateway", "Auth Service", "Database", "Cache Layer"]
n_per_group = 2000

data = []
# Realistic response time distributions (ms) with different characteristics
distributions = {
    "API Gateway": {"base": 45, "scale": 20, "skew": 0.5},
    "Auth Service": {"base": 80, "scale": 35, "skew": 0.8},
    "Database": {"base": 120, "scale": 50, "skew": 1.2},
    "Cache Layer": {"base": 8, "scale": 5, "skew": 0.3},
}

for endpoint in endpoints:
    d = distributions[endpoint]
    # Generate log-normal like distribution for realistic response times
    values = np.random.exponential(d["scale"], n_per_group) + d["base"]
    # Add occasional slow requests (tail)
    slow_idx = np.random.choice(n_per_group, size=int(n_per_group * 0.05), replace=False)
    values[slow_idx] = values[slow_idx] * np.random.uniform(2, 5, len(slow_idx))
    data.extend([(endpoint, v) for v in values])

df = pd.DataFrame(data, columns=["endpoint", "response_time"])


# Letter value names for legend
level_names = ["50%", "75%", "87.5%", "93.75%", "96.875%", "98.4%", "99.2%", "99.6%"]
level_colors = ["#306998", "#4A7FA8", "#6490B8", "#7EA1C8", "#98B2D8", "#B2C3E8", "#CCD4F8", "#E6E5FF"]


# Calculate letter values for boxen plot
def compute_letter_values(values, k=None):
    """Compute letter values (quantiles) for boxen plot."""
    n = len(values)
    if k is None:
        # Number of letter values based on data size
        k = int(np.log2(n)) - 1
        k = max(2, min(k, 8))

    sorted_vals = np.sort(values)
    letter_values = []

    for i in range(k):
        # Calculate the depth for each letter value
        depth = 0.5 ** (i + 1)
        lower_q = depth
        upper_q = 1 - depth

        lower_val = np.percentile(sorted_vals, lower_q * 100)
        upper_val = np.percentile(sorted_vals, upper_q * 100)
        letter_values.append((lower_val, upper_val, level_names[i]))

    # Calculate outlier bounds (beyond deepest letter value)
    deepest_lower = letter_values[-1][0]
    deepest_upper = letter_values[-1][1]
    outliers = sorted_vals[(sorted_vals < deepest_lower) | (sorted_vals > deepest_upper)]

    return letter_values, np.median(sorted_vals), outliers, k


# Compute letter values for each endpoint
box_data = []
median_data = []
outlier_data = []
max_k = 0

x_positions = {endpoint: i for i, endpoint in enumerate(endpoints)}

for endpoint in endpoints:
    group_data = df[df["endpoint"] == endpoint]["response_time"].values
    letter_vals, median, outliers, k = compute_letter_values(group_data)
    max_k = max(max_k, k)

    x_pos = x_positions[endpoint]

    for idx, (lower, upper, level_name) in enumerate(letter_vals):
        # Width decreases with depth
        half_width = 0.4 * (0.85**idx)
        box_data.append(
            {
                "x_min": x_pos - half_width,
                "x_max": x_pos + half_width,
                "y_min": lower,
                "y_max": upper,
                "level": level_name,
                "endpoint": endpoint,
            }
        )

    median_data.append({"x": x_pos - 0.38, "xend": x_pos + 0.38, "y": median, "endpoint": endpoint})

    for o in outliers:
        outlier_data.append({"x": x_pos, "y": o, "endpoint": endpoint})

box_df = pd.DataFrame(box_data)
median_df = pd.DataFrame(median_data)
outlier_df = pd.DataFrame(outlier_data) if outlier_data else pd.DataFrame(columns=["x", "y", "endpoint"])

# Plot using lets-plot
plot = (
    ggplot()
    + geom_rect(
        aes(xmin="x_min", xmax="x_max", ymin="y_min", ymax="y_max", fill="level"),
        data=box_df,
        alpha=0.9,
        color="#1a1a1a",
        size=0.5,
    )
    + geom_segment(aes(x="x", xend="xend", y="y", yend="y"), data=median_df, color="#FFD43B", size=3)
    + scale_fill_manual(
        values=dict(zip(level_names[:max_k], level_colors[:max_k], strict=False)), name="Quantile Range"
    )
    + scale_x_continuous(breaks=[0, 1, 2, 3], labels=endpoints)
    + labs(x="Server Endpoint", y="Response Time (ms)", title="boxen-basic \u00b7 letsplot \u00b7 pyplots.ai")
    + theme_minimal()
    + theme(
        axis_title=element_text(size=20),
        axis_text=element_text(size=16),
        plot_title=element_text(size=24),
        legend_title=element_text(size=18),
        legend_text=element_text(size=14),
    )
    + ggsize(1600, 900)
)

# Add outliers if present
if not outlier_df.empty:
    plot = plot + geom_point(aes(x="x", y="y"), data=outlier_df, color="#DC2626", size=2, alpha=0.6)

# Save
ggsave(plot, "plot.png", path=".", scale=3)
ggsave(plot, "plot.html", path=".")
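The depth arithmetic inside `compute_letter_values` can be checked in isolation. The following is a minimal standard-library sketch, not part of the commit, that reproduces only the level count and the quantile pairs for `n = 2000`. The variable name `levels` is introduced here purely for illustration.

```python
import math

n = 2000  # observations per endpoint, matching n_per_group in the script

# Number of letter-value levels: int(log2(n)) - 1, clamped to the range [2, 8].
k = max(2, min(int(math.log2(n)) - 1, 8))

# Level i uses the tail fraction 0.5 ** (i + 1); its box spans the
# lower_q and upper_q quantiles of the sorted sample.
levels = []
for i in range(k):
    depth = 0.5 ** (i + 1)
    levels.append((depth, 1 - depth))

for lower_q, upper_q in levels:
    print(f"{lower_q:.8f} .. {upper_q:.8f}")
```

Two things follow from this arithmetic. The first level has `lower_q == upper_q == 0.5`, so the rectangle labelled "50%" collapses onto the median. The deepest level leaves a tail fraction of 0.5 ** 8 on each side, so about 0.78% of each group (roughly 16 of 2000 points) falls outside it and is drawn as an outlier.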
Lines changed: 215 additions & 0 deletions
@@ -0,0 +1,215 @@
library: letsplot
specification_id: boxen-basic
created: '2026-01-09T08:12:01Z'
updated: '2026-01-09T08:14:47Z'
generated_by: claude-opus-4-5-20251101
workflow_run: 20845378914
issue: 3414
python_version: 3.13.11
library_version: 4.8.2
preview_url: https://storage.googleapis.com/pyplots-images/plots/boxen-basic/letsplot/plot.png
preview_thumb: https://storage.googleapis.com/pyplots-images/plots/boxen-basic/letsplot/plot_thumb.png
preview_html: https://storage.googleapis.com/pyplots-images/plots/boxen-basic/letsplot/plot.html
quality_score: 91
review:
  strengths:
  - Excellent visual representation of letter-value plot with clear nested box structure
  - Realistic server response time scenario with appropriate data characteristics
    (skewed distributions, tail behavior)
  - Good color gradient from dark blue to light lavender that clearly shows quantile
    depth
  - Yellow median lines provide excellent contrast and visibility
  - Legend clearly explains the quantile ranges
  - Proper handling of outliers as distinct red points
  weaknesses:
  - Uses a helper function compute_letter_values() which violates the KISS principle
    (imports → data → plot → save, no functions)
  - Legend order shows 50% at top and 99.6% at bottom, which is counterintuitive to
    visual interpretation
  image_description: 'The plot displays a letter-value (boxen) plot comparing response
    times across four server endpoints: API Gateway, Auth Service, Database, and Cache
    Layer. Each endpoint shows nested rectangular boxes representing quantile ranges
    from 50% (innermost, dark blue #306998) to 99.6% (outermost, light lavender).
    The boxes decrease in width for deeper quantiles, creating the characteristic
    boxen plot shape. Yellow/gold median lines are prominently displayed across each
    distribution. Red dots mark outliers beyond the 99.6% quantile. The Database endpoint
    shows the widest distribution and most outliers, while Cache Layer shows the tightest
    distribution with lowest response times. The plot uses a minimal theme with subtle
    grid lines and a clean legend on the right explaining the quantile ranges.'
  criteria_checklist:
    visual_quality:
      score: 37
      max: 40
      items:
      - id: VQ-01
        name: Text Legibility
        score: 10
        max: 10
        passed: true
        comment: Title at 24pt, axis labels at 20pt, tick labels at 16pt - all perfectly
          readable
      - id: VQ-02
        name: No Overlap
        score: 8
        max: 8
        passed: true
        comment: No overlapping text elements, endpoint labels are well-spaced
      - id: VQ-03
        name: Element Visibility
        score: 7
        max: 8
        passed: true
        comment: Boxes are clearly visible with good sizing; outlier points could
          be slightly larger
      - id: VQ-04
        name: Color Accessibility
        score: 5
        max: 5
        passed: true
        comment: Blue gradient palette is colorblind-safe; yellow median line provides
          good contrast
      - id: VQ-05
        name: Layout Balance
        score: 5
        max: 5
        passed: true
        comment: Plot fills canvas appropriately with balanced margins; legend well-positioned
      - id: VQ-06
        name: Axis Labels
        score: 2
        max: 2
        passed: true
        comment: 'Descriptive labels with units: Response Time (ms) and Server Endpoint'
      - id: VQ-07
        name: Grid & Legend
        score: 0
        max: 2
        passed: false
        comment: Legend quantile ordering is counterintuitive (50% at top, 99.6% at
          bottom)
    spec_compliance:
      score: 24
      max: 25
      items:
      - id: SC-01
        name: Plot Type
        score: 8
        max: 8
        passed: true
        comment: Correct boxen/letter-value plot with nested boxes
      - id: SC-02
        name: Data Mapping
        score: 5
        max: 5
        passed: true
        comment: Categories on X-axis, values on Y-axis
      - id: SC-03
        name: Required Features
        score: 5
        max: 5
        passed: true
        comment: Nested boxes, decreasing widths, outliers as points, legend explaining
          quantile levels
      - id: SC-04
        name: Data Range
        score: 3
        max: 3
        passed: true
        comment: All data visible including outliers up to ~1400ms
      - id: SC-05
        name: Legend Accuracy
        score: 2
        max: 2
        passed: true
        comment: Legend correctly shows quantile range names
      - id: SC-06
        name: Title Format
        score: 1
        max: 2
        passed: true
        comment: Uses correct format but with Unicode middot character
    data_quality:
      score: 20
      max: 20
      items:
      - id: DQ-01
        name: Feature Coverage
        score: 8
        max: 8
        passed: true
        comment: 'Shows all aspects: different distribution shapes, varying spreads,
          outliers, tail behavior'
      - id: DQ-02
        name: Realistic Context
        score: 7
        max: 7
        passed: true
        comment: Server response times is a real, neutral scenario perfectly suited
          for large dataset visualization
      - id: DQ-03
        name: Appropriate Scale
        score: 5
        max: 5
        passed: true
        comment: Response times in realistic ranges (8-500ms base with occasional
          slow requests up to 1400ms)
    code_quality:
      score: 7
      max: 10
      items:
      - id: CQ-01
        name: KISS Structure
        score: 0
        max: 3
        passed: false
        comment: Uses a function compute_letter_values() which violates KISS principle
      - id: CQ-02
        name: Reproducibility
        score: 3
        max: 3
        passed: true
        comment: Uses np.random.seed(42) for reproducibility
      - id: CQ-03
        name: Clean Imports
        score: 2
        max: 2
        passed: true
        comment: All imports are used
      - id: CQ-04
        name: No Deprecated API
        score: 1
        max: 1
        passed: true
        comment: Modern lets-plot API
      - id: CQ-05
        name: Output Correct
        score: 1
        max: 1
        passed: true
        comment: Saves as plot.png and plot.html
    library_features:
      score: 3
      max: 5
      items:
      - id: LF-01
        name: Distinctive Features
        score: 3
        max: 5
        passed: true
        comment: Uses ggplot2 grammar with geom_rect, geom_segment, geom_point. Manual
          construction necessary as lets-plot has no native boxen geom.
  verdict: APPROVED
impl_tags:
  dependencies: []
  techniques:
  - layer-composition
  - manual-ticks
  - html-export
  patterns:
  - data-generation
  - iteration-over-groups
  dataprep:
  - binning
  styling:
  - alpha-blending
  - edge-highlighting
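
The review marks VQ-07 as failed because the legend lists 50% first and 99.6% last. One possible adjustment, sketched below with the standard library only, is to keep the name-to-color mapping unchanged and build a reversed list of level names for the legend. The name `legend_breaks` is hypothetical. It is also an untested assumption that passing this list as `breaks=` (or `limits=`) to `scale_fill_manual` reorders the lets-plot legend.

```python
# Level names and colors copied from the implementation file.
level_names = ["50%", "75%", "87.5%", "93.75%", "96.875%", "98.4%", "99.2%", "99.6%"]
level_colors = ["#306998", "#4A7FA8", "#6490B8", "#7EA1C8", "#98B2D8", "#B2C3E8", "#CCD4F8", "#E6E5FF"]
max_k = 8  # the value the script reaches for 2000 points per group

# The mapping from level name to color stays exactly as in the script.
fill_values = dict(zip(level_names[:max_k], level_colors[:max_k]))

# Hypothetical legend order: outermost quantile first, median level last.
legend_breaks = list(reversed(level_names[:max_k]))

# Assumed usage (not verified against lets-plot 4.8.2):
#   scale_fill_manual(values=fill_values, breaks=legend_breaks, name="Quantile Range")
print(legend_breaks)
```

Because only the legend order changes, each box keeps the color it has in the committed plot.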
