Commit 6f652c0

feat(plotnine): implement ma-differential-expression (#5076)
## Implementation: `ma-differential-expression` - plotnine

Implements the **plotnine** version of `ma-differential-expression`.

**File:** `plots/ma-differential-expression/implementations/plotnine.py`

**Parent Issue:** #4420

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/23339077857)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent b4e8d31 commit 6f652c0

2 files changed

Lines changed: 387 additions & 0 deletions

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
""" pyplots.ai
ma-differential-expression: MA Plot for Differential Expression
Library: plotnine 0.15.3 | Python 3.14.3
Quality: 90/100 | Created: 2026-03-20
"""

import numpy as np
import pandas as pd
from plotnine import (
    aes,
    annotate,
    element_blank,
    element_line,
    element_rect,
    element_text,
    geom_hline,
    geom_point,
    geom_text,
    ggplot,
    guide_legend,
    guides,
    labs,
    scale_alpha_manual,
    scale_color_manual,
    scale_size_manual,
    stat_smooth,
    theme,
    theme_minimal,
)


# Data
np.random.seed(42)
n_genes = 15000

mean_expression = np.random.uniform(0, 15, n_genes)
log_fold_change = np.random.normal(0, 0.5, n_genes)
log_fold_change += 0.15 * np.sin(mean_expression * 0.3)

n_sig = int(n_genes * 0.08)
sig_indices = np.random.choice(n_genes, n_sig, replace=False)
log_fold_change[sig_indices] *= np.random.uniform(2.5, 5.0, n_sig)

significant = np.abs(log_fold_change) > 1.0
significant[sig_indices[: n_sig // 2]] = True

# Categorize into up/down/not significant for richer storytelling
category = np.where(~significant, "Not significant", np.where(log_fold_change > 0, "Upregulated", "Downregulated"))

gene_names = [f"Gene{i}" for i in range(n_genes)]
top_genes = ["BRCA1", "TP53", "MYC", "EGFR", "KRAS", "PTEN", "RB1", "APC"]
top_idx = np.argsort(np.abs(log_fold_change))[-len(top_genes) :]
for i, idx in enumerate(top_idx):
    gene_names[idx] = top_genes[i]

df = pd.DataFrame(
    {
        "mean_expression": mean_expression,
        "log_fold_change": log_fold_change,
        "significant": significant,
        "gene_name": gene_names,
        "category": pd.Categorical(
            category, categories=["Downregulated", "Not significant", "Upregulated"], ordered=True
        ),
    }
)

df_labels = df.loc[top_idx].copy()
# Position labels offset from data points to avoid overlap
nudge = np.where(df_labels["log_fold_change"] > 0, 0.8, -0.8)

# Detect close labels within same direction and stagger them
for direction in [1, -1]:
    mask = (nudge * direction) > 0
    subset = df_labels[mask].sort_values("mean_expression")
    for j in range(1, len(subset)):
        prev_x = subset.iloc[j - 1]["mean_expression"]
        curr_x = subset.iloc[j]["mean_expression"]
        if abs(curr_x - prev_x) < 2.0:
            idx_curr = subset.index[j]
            nudge[df_labels.index.get_loc(idx_curr)] *= 2.0

df_labels["label_y"] = df_labels["log_fold_change"] + nudge

# Split labels by direction for separate geom_text layers
df_labels_up = df_labels[df_labels["log_fold_change"] > 0].copy()
df_labels_down = df_labels[df_labels["log_fold_change"] < 0].copy()

# Plot
plot = (
    ggplot(df, aes(x="mean_expression", y="log_fold_change", color="category"))
    + geom_point(aes(alpha="category", size="category"), stroke=0)
    + geom_hline(yintercept=0, color="#2C3E50", size=1.0, alpha=0.8)
    + geom_hline(yintercept=1, linetype="dashed", color="#95A5A6", size=0.6)
    + geom_hline(yintercept=-1, linetype="dashed", color="#95A5A6", size=0.6)
    + annotate(
        "label",
        x=14.5,
        y=1.0,
        label=" ±2-fold threshold ",
        size=11,
        color="#555555",
        fill="#FAFAFA",
        alpha=0.9,
        label_size=0,
        ha="right",
        va="center",
    )
    + stat_smooth(aes(group=1), method="lowess", color="#306998", size=1.4, se=False, span=0.3, linetype="solid")
    + geom_text(
        aes(x="mean_expression", y="label_y", label="gene_name"),
        data=df_labels_up,
        color="#1A1A2E",
        size=14,
        fontstyle="italic",
        alpha=1,
        inherit_aes=False,
        show_legend=False,
    )
    + geom_text(
        aes(x="mean_expression", y="label_y", label="gene_name"),
        data=df_labels_down,
        color="#1A1A2E",
        size=14,
        fontstyle="italic",
        alpha=1,
        inherit_aes=False,
        show_legend=False,
    )
    + scale_color_manual(values={"Upregulated": "#D45E00", "Not significant": "#C0C0C0", "Downregulated": "#306998"})
    + scale_alpha_manual(values={"Upregulated": 0.8, "Not significant": 0.15, "Downregulated": 0.8})
    + scale_size_manual(values={"Upregulated": 2.0, "Not significant": 1.0, "Downregulated": 2.0})
    + labs(
        x="Mean Expression (A)",
        y="Log₂ Fold Change (M)",
        title="ma-differential-expression · plotnine · pyplots.ai",
        color="",
    )
    + guides(color=guide_legend(override_aes={"alpha": 1, "size": 4}), alpha="none", size="none")
    + theme_minimal()
    + theme(
        figure_size=(16, 9),
        plot_title=element_text(size=24, weight="bold", color="#1A1A2E"),
        plot_subtitle=element_text(size=16, color="#555555"),
        axis_title=element_text(size=20, color="#2C3E50"),
        axis_text=element_text(size=16, color="#555555"),
        legend_text=element_text(size=16),
        legend_title=element_blank(),
        legend_position="top",
        legend_background=element_rect(fill="white", alpha=0.8),
        panel_grid_major_x=element_blank(),
        panel_grid_major_y=element_line(color="#ECECEC", size=0.3),
        panel_grid_minor=element_blank(),
        plot_background=element_rect(fill="#FAFAFA", color="none"),
        panel_background=element_rect(fill="#FAFAFA", color="none"),
    )
)

# Save
plot.save("plot.png", dpi=300, verbose=False)
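The label stagger step in the file above (nudge each label away from zero, then double the nudge when a label sits within 2.0 x-units of its left-hand neighbour on the same side) can be isolated as a small helper. The following is only a sketch of that idea in plain Python, not part of the commit: the function name `stagger_nudges` and its parameters `base`, `min_gap` and `factor` are hypothetical, and it takes a list of `(x, y)` tuples instead of the DataFrame used in the file.

```python
def stagger_nudges(points, base=0.8, min_gap=2.0, factor=2.0):
    """Return one vertical label offset per (x, y) point.

    Hypothetical helper mirroring the loop in the file above: labels go
    above points with y > 0 and below all others; within each side, a
    label closer than min_gap (in x) to its left-hand neighbour has its
    offset multiplied by factor.
    """
    nudges = [base if y > 0 else -base for _, y in points]
    for sign in (1, -1):
        # Indices of the labels on this side, ordered left to right.
        side = sorted(
            (i for i, n in enumerate(nudges) if n * sign > 0),
            key=lambda i: points[i][0],
        )
        # Compare each label with the one immediately to its left.
        for prev, curr in zip(side, side[1:]):
            if abs(points[curr][0] - points[prev][0]) < min_gap:
                nudges[curr] *= factor
    return nudges


# Made-up coordinates for illustration only (not taken from the plot data).
example_points = [(1.0, 3.0), (2.5, 4.0), (9.0, 5.0), (3.0, -3.5), (4.0, -4.2)]
example_nudges = stagger_nudges(example_points)
print(example_nudges)
```

As in the original loop, each label is compared only with its immediate left-hand neighbour, so a run of several tightly packed labels would all receive the same doubled offset and could still collide with each other.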
Lines changed: 227 additions & 0 deletions
@@ -0,0 +1,227 @@
library: plotnine
specification_id: ma-differential-expression
created: '2026-03-20T10:38:20Z'
updated: '2026-03-20T10:59:43Z'
generated_by: claude-opus-4-5-20251101
workflow_run: 23339077857
issue: 4420
python_version: 3.14.3
library_version: 0.15.3
preview_url: https://storage.googleapis.com/pyplots-images/plots/ma-differential-expression/plotnine/plot.png
preview_thumb: https://storage.googleapis.com/pyplots-images/plots/ma-differential-expression/plotnine/plot_thumb.png
preview_html: null
quality_score: 90
review:
  strengths:
  - Excellent data storytelling through three-category color/size/alpha mapping that
    immediately conveys up/down regulation
  - All spec requirements implemented comprehensively (reference lines, LOESS, gene
    labels, transparency)
  - Realistic genomics context with real gene names and appropriate data scales
  - Clean idiomatic plotnine code leveraging grammar of graphics layering
  weaknesses:
  - Label nudge logic is verbose and could be simplified
  - The ±2-fold threshold annotation text (size=11) is slightly small relative to
    other text elements
  image_description: 'The plot displays an MA plot for differential gene expression
    with Mean Expression (A) on the x-axis (range 0-15) and Log2 Fold Change (M) on
    the y-axis (range approximately -7 to +7). Three categories are shown via a top
    legend: Downregulated (blue), Not significant (light gray, faded), and Upregulated
    (orange). A solid dark horizontal reference line sits at M=0, with dashed gray
    lines at M=+/-1 labeled "±2-fold threshold" on the right. A dark blue LOESS smoothing
    curve traces near the zero line showing minimal systematic bias. Eight top differentially
    expressed genes are labeled in italic text (APC, BRCA1, RB1, TP53, MYC, KRAS,
    EGFR, PTEN) with labels nudged away from their data points. The background is
    light gray (#FAFAFA) with subtle horizontal-only gridlines. Significant genes
    (orange/blue) are larger and more opaque, while non-significant genes are small
    and nearly transparent.'
  criteria_checklist:
    visual_quality:
      score: 28
      max: 30
      items:
      - id: VQ-01
        name: Text Legibility
        score: 8
        max: 8
        passed: true
        comment: 'All font sizes explicitly set: title 24pt, axis titles 20pt, axis/tick
          text 16pt, legend text 16pt'
      - id: VQ-02
        name: No Overlap
        score: 5
        max: 6
        passed: true
        comment: Gene labels use nudge logic; threshold annotation slightly small
          and at canvas edge
      - id: VQ-03
        name: Element Visibility
        score: 5
        max: 6
        passed: true
        comment: 15k points handled well with alpha=0.15 for non-significant and alpha=0.8
          + larger size for significant
      - id: VQ-04
        name: Color Accessibility
        score: 4
        max: 4
        passed: true
        comment: Orange and blue are colorblind-safe with gray non-significant providing
          good contrast
      - id: VQ-05
        name: Layout & Canvas
        score: 4
        max: 4
        passed: true
        comment: 16:9 aspect ratio with good canvas utilization and balanced margins
      - id: VQ-06
        name: Axis Labels & Title
        score: 2
        max: 2
        passed: true
        comment: 'Descriptive labels with parenthetical notation: Mean Expression
          (A), Log2 Fold Change (M)'
    design_excellence:
      score: 15
      max: 20
      items:
      - id: DE-01
        name: Aesthetic Sophistication
        score: 6
        max: 8
        passed: true
        comment: Thoughtful warm/cool color scheme, intentional visual hierarchy through
          alpha/size mapping, custom background
      - id: DE-02
        name: Visual Refinement
        score: 4
        max: 6
        passed: true
        comment: X-grid removed, subtle y-grid, minor grid removed, custom background,
          legend refined
      - id: DE-03
        name: Data Storytelling
        score: 5
        max: 6
        passed: true
        comment: Strong visual hierarchy with three-category scheme, gene labels identify
          key players, LOESS adds analytical depth
    spec_compliance:
      score: 15
      max: 15
      items:
      - id: SC-01
        name: Plot Type
        score: 5
        max: 5
        passed: true
        comment: Correct MA plot type
      - id: SC-02
        name: Required Features
        score: 4
        max: 4
        passed: true
        comment: 'All spec features present: highlighting, reference lines, LOESS,
          transparency, gene labels'
      - id: SC-03
        name: Data Mapping
        score: 3
        max: 3
        passed: true
        comment: X=mean_expression, Y=log_fold_change correctly mapped
      - id: SC-04
        name: Title & Legend
        score: 3
        max: 3
        passed: true
        comment: Correct title format and clear legend labels
    data_quality:
      score: 15
      max: 15
      items:
      - id: DQ-01
        name: Feature Coverage
        score: 6
        max: 6
        passed: true
        comment: Shows upregulated, downregulated, non-significant, labeled outlier
          genes, full expression range
      - id: DQ-02
        name: Realistic Context
        score: 5
        max: 5
        passed: true
        comment: Real gene names (BRCA1, TP53, MYC, etc.), realistic RNA-seq genomics
          context
      - id: DQ-03
        name: Appropriate Scale
        score: 4
        max: 4
        passed: true
        comment: 15,000 genes realistic for whole-transcriptome, appropriate expression
          and fold-change ranges
    code_quality:
      score: 9
      max: 10
      items:
      - id: CQ-01
        name: KISS Structure
        score: 3
        max: 3
        passed: true
        comment: Clean Imports-Data-Plot-Save structure, no functions or classes
      - id: CQ-02
        name: Reproducibility
        score: 2
        max: 2
        passed: true
        comment: np.random.seed(42) set
      - id: CQ-03
        name: Clean Imports
        score: 2
        max: 2
        passed: true
        comment: All imports used
      - id: CQ-04
        name: Code Elegance
        score: 1
        max: 2
        passed: false
        comment: Label nudge/stagger logic is verbose (~18 lines of manual iteration)
      - id: CQ-05
        name: Output & API
        score: 1
        max: 1
        passed: true
        comment: Saves as plot.png, current plotnine API
    library_mastery:
      score: 8
      max: 10
      items:
      - id: LM-01
        name: Idiomatic Usage
        score: 5
        max: 5
        passed: true
        comment: 'Expert grammar of graphics usage: ggplot+aes, layered geoms, scale_*_manual,
          guides, theme composition'
      - id: LM-02
        name: Distinctive Features
        score: 3
        max: 5
        passed: true
        comment: Uses stat_smooth (LOESS), scale_alpha_manual/scale_size_manual, guide_legend
          with override_aes
  verdict: APPROVED
impl_tags:
  dependencies: []
  techniques:
  - annotations
  - layer-composition
  - custom-legend
  patterns:
  - data-generation
  dataprep: []
  styling:
  - alpha-blending
  - grid-styling
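The scores in `criteria_checklist` can be cross-checked with a few lines of arithmetic. The sketch below assumes that `quality_score` is the plain sum of the six category scores and that each category score is the sum of its items. The metadata file does not state this rule, but the numbers are consistent with it. The figures are copied by hand from the checklist, and the variable names are illustrative only.

```python
# (score, max) per category, copied from criteria_checklist.
categories = {
    "visual_quality": (28, 30),
    "design_excellence": (15, 20),
    "spec_compliance": (15, 15),
    "data_quality": (15, 15),
    "code_quality": (9, 10),
    "library_mastery": (8, 10),
}

# Item scores per category, in the order they appear in the checklist.
item_scores = {
    "visual_quality": [8, 5, 5, 4, 4, 2],
    "design_excellence": [6, 4, 5],
    "spec_compliance": [5, 4, 3, 3],
    "data_quality": [6, 5, 4],
    "code_quality": [3, 2, 2, 1, 1],
    "library_mastery": [5, 3],
}

# Assumed rule: the overall score is the sum of the category scores.
total = sum(score for score, _ in categories.values())
maximum = sum(cap for _, cap in categories.values())

# Assumed rule: each category score is the sum of its item scores.
consistent = all(
    sum(item_scores[name]) == score for name, (score, _) in categories.items()
)

print(f"{total}/{maximum}, item sums consistent: {consistent}")
```

Under these assumptions the tally matches the `quality_score: 90` recorded in the metadata and the `Quality: 90/100` line in the implementation's docstring.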
