Commit e82efb1

feat(plotnine): implement parallel-categories-basic (#2852)
## Implementation: `parallel-categories-basic` - plotnine

Implements the **plotnine** version of `parallel-categories-basic`.

**File:** `plots/parallel-categories-basic/implementations/plotnine.py`

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/20606634555)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 36aa1d7 commit e82efb1

2 files changed

Lines changed: 324 additions & 0 deletions

Lines changed: 297 additions & 0 deletions
@@ -0,0 +1,297 @@
""" pyplots.ai
parallel-categories-basic: Basic Parallel Categories Plot
Library: plotnine 0.15.2 | Python 3.13.11
Quality: 90/100 | Created: 2025-12-30
"""

import sys


# Prevent current directory from shadowing the plotnine package
sys.path = [p for p in sys.path if not p.endswith("implementations")]

import numpy as np  # noqa: E402
import pandas as pd  # noqa: E402
from plotnine import (  # noqa: E402
    aes,
    annotate,
    coord_cartesian,
    element_blank,
    element_text,
    geom_polygon,
    geom_rect,
    geom_text,
    ggplot,
    labs,
    scale_fill_manual,
    theme,
    theme_minimal,
)

# Data - Customer journey data with multiple categorical dimensions
# Each row represents aggregated counts for a specific path through dimensions
np.random.seed(42)

# Define category combinations and realistic counts
path_data = [
    # Channel -> Product Category -> Customer Type -> Outcome
    ("Online", "Electronics", "New", "Purchased", 145),
    ("Online", "Electronics", "New", "Abandoned", 98),
    ("Online", "Electronics", "Returning", "Purchased", 187),
    ("Online", "Electronics", "Returning", "Abandoned", 42),
    ("Online", "Clothing", "New", "Purchased", 112),
    ("Online", "Clothing", "New", "Abandoned", 76),
    ("Online", "Clothing", "Returning", "Purchased", 156),
    ("Online", "Clothing", "Returning", "Abandoned", 38),
    ("Online", "Home", "New", "Purchased", 67),
    ("Online", "Home", "New", "Abandoned", 54),
    ("Online", "Home", "Returning", "Purchased", 89),
    ("Online", "Home", "Returning", "Abandoned", 23),
    ("Store", "Electronics", "New", "Purchased", 78),
    ("Store", "Electronics", "New", "Abandoned", 32),
    ("Store", "Electronics", "Returning", "Purchased", 124),
    ("Store", "Electronics", "Returning", "Abandoned", 18),
    ("Store", "Clothing", "New", "Purchased", 95),
    ("Store", "Clothing", "New", "Abandoned", 28),
    ("Store", "Clothing", "Returning", "Purchased", 142),
    ("Store", "Clothing", "Returning", "Abandoned", 15),
    ("Store", "Home", "New", "Purchased", 56),
    ("Store", "Home", "New", "Abandoned", 21),
    ("Store", "Home", "Returning", "Purchased", 78),
    ("Store", "Home", "Returning", "Abandoned", 12),
    ("Mobile", "Electronics", "New", "Purchased", 89),
    ("Mobile", "Electronics", "New", "Abandoned", 112),
    ("Mobile", "Electronics", "Returning", "Purchased", 134),
    ("Mobile", "Electronics", "Returning", "Abandoned", 67),
    ("Mobile", "Clothing", "New", "Purchased", 76),
    ("Mobile", "Clothing", "New", "Abandoned", 94),
    ("Mobile", "Clothing", "Returning", "Purchased", 118),
    ("Mobile", "Clothing", "Returning", "Abandoned", 52),
    ("Mobile", "Home", "New", "Purchased", 45),
    ("Mobile", "Home", "New", "Abandoned", 58),
    ("Mobile", "Home", "Returning", "Purchased", 67),
    ("Mobile", "Home", "Returning", "Abandoned", 34),
]

path_counts = pd.DataFrame(path_data, columns=["channel", "product", "customer_type", "outcome", "count"])

# Define dimensions and their category orders (ordered to minimize ribbon crossings)
dimensions = [
    {"name": "channel", "label": "Channel", "categories": ["Online", "Store", "Mobile"]},
    {"name": "product", "label": "Product", "categories": ["Electronics", "Clothing", "Home"]},
    {"name": "customer_type", "label": "Customer", "categories": ["Returning", "New"]},
    {"name": "outcome", "label": "Outcome", "categories": ["Purchased", "Abandoned"]},
]

# Color by outcome - Python Blue for abandoned, Yellow for purchased
outcome_colors = {"Purchased": "#FFD43B", "Abandoned": "#306998"}

# Layout parameters
n_dims = len(dimensions)
x_positions = np.linspace(0.1, 0.9, n_dims)
node_width = 0.04
node_gap = 0.03
total_height = 0.82
y_start = 0.92

# Calculate node positions for each dimension
node_positions = {}
for dim_idx, dim in enumerate(dimensions):
    x_pos = x_positions[dim_idx]
    categories = dim["categories"]
    col_name = dim["name"]

    # Calculate totals for this dimension (same aggregation for every dimension)
    totals = path_counts.groupby(col_name)["count"].sum()

    grand_total = totals.sum()
    current_y = y_start

    for cat in categories:
        count = totals.get(cat, 0)
        height = (count / grand_total) * total_height if grand_total > 0 else 0

        node_positions[(dim_idx, cat)] = {
            "x": x_pos,
            "y_top": current_y,
            "y_bottom": current_y - height,
            "height": height,
            "count": count,
            "flow_offset_out": 0,  # For outgoing flows (right side)
            "flow_offset_in": 0,  # For incoming flows (left side)
        }
        current_y = current_y - height - node_gap

# Build node rectangles dataframe
node_data = []
for (dim_idx, cat), pos in node_positions.items():
    node_data.append(
        {
            "dim_idx": dim_idx,
            "category": cat,
            "xmin": pos["x"] - node_width / 2,
            "xmax": pos["x"] + node_width / 2,
            "ymin": pos["y_bottom"],
            "ymax": pos["y_top"],
            "label_y": (pos["y_top"] + pos["y_bottom"]) / 2,
            "count": pos["count"],
            "display_label": str(cat),
            "fill_color": outcome_colors.get(cat, "#888888"),
        }
    )
nodes_df = pd.DataFrame(node_data)

# Build flow polygons between adjacent dimensions
flow_polygons = []
flow_id_counter = 0

for _, path_row in path_counts.iterrows():
    path_values = [path_row["channel"], path_row["product"], path_row["customer_type"], path_row["outcome"]]
    count = path_row["count"]
    outcome = path_row["outcome"]

    # Draw flows between each adjacent pair of dimensions
    for dim_idx in range(n_dims - 1):
        from_cat = path_values[dim_idx]
        to_cat = path_values[dim_idx + 1]

        src_pos = node_positions[(dim_idx, from_cat)]
        tgt_pos = node_positions[(dim_idx + 1, to_cat)]

        # Calculate flow height proportional to count at source and target
        src_total = sum(path_counts[path_counts[dimensions[dim_idx]["name"]] == from_cat]["count"])
        flow_height_src = (count / src_total) * src_pos["height"] if src_total > 0 else 0

        tgt_total = sum(path_counts[path_counts[dimensions[dim_idx + 1]["name"]] == to_cat]["count"])
        flow_height_tgt = (count / tgt_total) * tgt_pos["height"] if tgt_total > 0 else 0

        # Source connection point (right side of node)
        src_y_top = src_pos["y_top"] - src_pos["flow_offset_out"]
        src_y_bottom = src_y_top - flow_height_src
        src_pos["flow_offset_out"] += flow_height_src

        # Target connection point (left side of node)
        tgt_y_top = tgt_pos["y_top"] - tgt_pos["flow_offset_in"]
        tgt_y_bottom = tgt_y_top - flow_height_tgt
        tgt_pos["flow_offset_in"] += flow_height_tgt

        # Create curved flow polygon using cubic interpolation
        flow_x_left = x_positions[dim_idx] + node_width / 2
        flow_x_right = x_positions[dim_idx + 1] - node_width / 2
        n_points = 30

        t_param = np.linspace(0, 1, n_points)
        # Smooth cubic easing for natural flow appearance
        x_top = flow_x_left + (flow_x_right - flow_x_left) * t_param
        y_top = src_y_top + (tgt_y_top - src_y_top) * (3 * t_param**2 - 2 * t_param**3)

        x_bottom = flow_x_right + (flow_x_left - flow_x_right) * t_param
        y_bottom = tgt_y_bottom + (src_y_bottom - tgt_y_bottom) * (3 * t_param**2 - 2 * t_param**3)

        # Combine into polygon
        x_polygon = np.concatenate([x_top, x_bottom])
        y_polygon = np.concatenate([y_top, y_bottom])

        flow_id = f"flow_{flow_id_counter}"
        flow_id_counter += 1

        for i in range(len(x_polygon)):
            flow_polygons.append({"x": x_polygon[i], "y": y_polygon[i], "flow_id": flow_id, "outcome": outcome})

flows_df = pd.DataFrame(flow_polygons)

# Create the plot
plot = (
    ggplot()
    # Flow polygons with transparency - colored by outcome
    + geom_polygon(flows_df, aes(x="x", y="y", group="flow_id", fill="outcome"), alpha=0.5)
    # Node rectangles - use neutral gray for all nodes
    + geom_rect(
        nodes_df, aes(xmin="xmin", xmax="xmax", ymin="ymin", ymax="ymax"), fill="#555555", color="white", size=0.8
    )
    # Count labels inside nodes (only nodes with a count of at least 20)
    + geom_text(
        nodes_df[nodes_df["count"] >= 20],
        # Compute x from the layer's own filtered data, not from the unfiltered frame
        aes(x="(xmin + xmax) / 2", y="label_y", label="count"),
        ha="center",
        va="center",
        size=10,
        color="white",
        fontweight="bold",
    )
    + scale_fill_manual(values=outcome_colors, name="Outcome", breaks=["Purchased", "Abandoned"])
    + labs(title="parallel-categories-basic · plotnine · pyplots.ai", x="", y="")
    + coord_cartesian(xlim=(0, 1), ylim=(-0.02, 1.02))
    + theme_minimal()
    + theme(
        figure_size=(16, 9),
        plot_title=element_text(size=24, ha="center", weight="bold"),
        axis_text=element_blank(),
        axis_ticks=element_blank(),
        panel_grid=element_blank(),
        legend_title=element_text(size=16, weight="bold"),
        legend_text=element_text(size=14),
        legend_position="right",
    )
)

# Add dimension labels at top
for dim_idx, dim in enumerate(dimensions):
    plot = plot + annotate(
        "text",
        x=x_positions[dim_idx],
        y=0.98,
        label=dim["label"],
        size=14,
        color="#333333",
        fontweight="bold",
        ha="center",
    )

# Add category labels beside each node (all dimensions)
for (dim_idx, cat), pos in node_positions.items():
    label = str(cat)
    label_y = (pos["y_top"] + pos["y_bottom"]) / 2

    # For first dimension, place label on left side of node
    if dim_idx == 0:
        plot = plot + annotate(
            "text",
            x=x_positions[dim_idx] - node_width / 2 - 0.01,
            y=label_y,
            label=label,
            size=10,
            color="#333333",
            ha="right",
            va="center",
        )
    # For last dimension, place label on right side of node
    elif dim_idx == n_dims - 1:
        plot = plot + annotate(
            "text",
            x=x_positions[dim_idx] + node_width / 2 + 0.01,
            y=label_y,
            label=label,
            size=10,
            color="#333333",
            ha="left",
            va="center",
        )
    # For middle dimensions, place label below the node
    else:
        plot = plot + annotate(
            "text",
            x=x_positions[dim_idx],
            y=pos["y_bottom"] - 0.015,
            label=label,
            size=9,
            color="#333333",
            ha="center",
            va="top",
        )

plot.save("plot.png", dpi=300, verbose=False)
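
The node layout step in this file stacks each dimension's categories from the top down, giving each node a height proportional to its share of the total and leaving a fixed gap between nodes. A minimal standalone sketch of that idea follows. The helper name `stack_nodes` and the plain-dict input are assumptions for illustration and do not appear in the commit; the default values mirror the layout parameters above.

```python
def stack_nodes(totals, order, y_start=0.92, total_height=0.82, gap=0.03):
    """Return {category: (y_top, y_bottom)} stacked downward from y_start.

    Hypothetical helper, not part of the committed file.
    """
    grand_total = sum(totals.get(cat, 0) for cat in order)
    positions = {}
    current_y = y_start
    for cat in order:
        # Each node's height is its share of the grand total, scaled to total_height
        share = totals.get(cat, 0) / grand_total if grand_total > 0 else 0.0
        height = share * total_height
        positions[cat] = (current_y, current_y - height)
        # Move down past this node plus the fixed gap
        current_y -= height + gap
    return positions


# Illustrative totals only (not taken from the dataset in the commit)
layout = stack_nodes({"A": 30, "B": 10}, order=["A", "B"])
```

Because the gaps are added on top of `total_height`, a dimension with more categories extends further down; with the values above, three categories end near y = 0.04, which still fits inside the plot's `ylim`.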
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
library: plotnine
specification_id: parallel-categories-basic
created: '2025-12-30T21:54:08Z'
updated: '2025-12-30T22:02:46Z'
generated_by: claude-opus-4-5-20251101
workflow_run: 20606634555
issue: 0
python_version: 3.13.11
library_version: 0.15.2
preview_url: https://storage.googleapis.com/pyplots-images/plots/parallel-categories-basic/plotnine/plot.png
preview_thumb: https://storage.googleapis.com/pyplots-images/plots/parallel-categories-basic/plotnine/plot_thumb.png
preview_html: null
quality_score: 90
review:
  strengths:
  - Creative implementation of parallel categories using plotnine basic geoms (geom_polygon,
    geom_rect)
  - Smooth cubic interpolation for ribbon curves creates professional appearance
  - Clear visual distinction between outcomes with yellow/blue color scheme
  - Effective use of transparency (alpha=0.5) for overlapping ribbons
  - Well-organized data structure with explicit path counts
  - Category labels positioned intelligently based on dimension position
  - Counts displayed inside nodes for quantitative reference
  weaknesses:
  - Middle dimension category labels positioned below nodes are slightly smaller (9pt)
    and could be harder to read
  - Some imports may be unused (coord_cartesian could be replaced with theme settings)
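
The "smooth cubic interpolation" listed under strengths corresponds to the easing polynomial 3t² − 2t³ (often called smoothstep) applied to the ribbon edges in the plot script. The sketch below traces one ribbon outline with that easing. It is written in plain Python rather than NumPy so it runs standalone, and the names `smoothstep` and `ribbon_outline` are assumptions for illustration, not identifiers from the commit.

```python
def smoothstep(t):
    """Cubic easing 3t^2 - 2t^3: 0 at t=0, 1 at t=1, zero slope at both ends."""
    return 3 * t**2 - 2 * t**3


def ribbon_outline(x_left, x_right, src_top, src_bottom, tgt_top, tgt_bottom, n_points=30):
    """Return a list of (x, y) vertices tracing one ribbon polygon.

    Hypothetical helper, not part of the committed file.
    """
    ts = [i / (n_points - 1) for i in range(n_points)]
    # Top edge runs left to right, easing from the source top to the target top
    top = [
        (x_left + (x_right - x_left) * t, src_top + (tgt_top - src_top) * smoothstep(t))
        for t in ts
    ]
    # Bottom edge runs right to left so the vertices follow the outline in order
    bottom = [
        (x_right + (x_left - x_right) * t, tgt_bottom + (src_bottom - tgt_bottom) * smoothstep(t))
        for t in ts
    ]
    return top + bottom


# Illustrative coordinates only
outline = ribbon_outline(0.12, 0.35, 0.92, 0.80, 0.70, 0.58)
```

The zero slope at both ends is what makes each ribbon leave and enter its nodes horizontally, and the easing's symmetry keeps the right-to-left bottom edge consistent with the top edge.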
