Commit 92a7b9b

feat(plotnine): implement biplot-pca (#3478)
## Implementation: `biplot-pca` - plotnine

Implements the **plotnine** version of `biplot-pca`.

**File:** `plots/biplot-pca/implementations/plotnine.py`

**Parent Issue:** #3417

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/20853039303)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent cc8996c commit 92a7b9b

2 files changed

Lines changed: 341 additions & 0 deletions

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
""" pyplots.ai
biplot-pca: PCA Biplot with Scores and Loading Vectors
Library: plotnine 0.15.2 | Python 3.13.11
Quality: 91/100 | Created: 2026-01-09
"""

import numpy as np
import pandas as pd
from plotnine import (
    aes,
    arrow,
    element_text,
    geom_path,
    geom_point,
    geom_segment,
    geom_text,
    ggplot,
    labs,
    scale_color_manual,
    theme,
    theme_minimal,
)
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
species_names = [iris.target_names[i] for i in y]

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Perform PCA
pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)
loadings = pca.components_.T  # Shape: (n_features, n_components)
var_explained = pca.explained_variance_ratio_ * 100

# Create scores DataFrame
df_scores = pd.DataFrame({"PC1": scores[:, 0], "PC2": scores[:, 1], "Species": species_names})

# Scale loadings to be visible alongside scores
# Use a scaling factor based on the score range
score_scale = np.max(np.abs(scores)) * 0.8
loading_scale = np.max(np.abs(loadings))
scale_factor = score_scale / loading_scale

# Create loadings DataFrame for arrows
xend = loadings[:, 0] * scale_factor
yend = loadings[:, 1] * scale_factor

df_loadings = pd.DataFrame(
    {
        "x": [0] * len(feature_names),
        "y": [0] * len(feature_names),
        "xend": xend,
        "yend": yend,
        "variable": [name.replace(" (cm)", "") for name in feature_names],
    }
)

# Create label positions with smart offsets to avoid overlap
label_offset = 0.25
df_labels = pd.DataFrame(
    {
        "x": xend + np.sign(xend) * label_offset,
        "y": yend + np.sign(yend) * label_offset * 0.8,
        "variable": [name.replace(" (cm)", "") for name in feature_names],
    }
)
# Manually adjust overlapping labels (petal length and petal width)
df_labels.loc[2, "y"] -= 0.15  # petal length - move down
df_labels.loc[3, "y"] += 0.15  # petal width - move up

# Create unit circle reference (scaled)
theta = np.linspace(0, 2 * np.pi, 100)
df_circle = pd.DataFrame({"x": np.cos(theta) * scale_factor, "y": np.sin(theta) * scale_factor})

# Colors - Python Blue variants for species
colors = ["#306998", "#FFD43B", "#E67E22"]

# Create biplot
plot = (
    ggplot()
    # Unit circle reference
    + geom_path(df_circle, aes(x="x", y="y"), color="gray", linetype="dashed", size=0.8, alpha=0.5)
    # Observation scores as points
    + geom_point(df_scores, aes(x="PC1", y="PC2", color="Species"), size=4, alpha=0.7)
    # Loading arrows
    + geom_segment(
        df_loadings,
        aes(x="x", y="y", xend="xend", yend="yend"),
        color="#1a1a1a",
        size=1.2,
        arrow=arrow(length=0.15, type="closed"),
    )
    # Loading labels (using separate df with pre-computed positions)
    + geom_text(df_labels, aes(x="x", y="y", label="variable"), size=12, color="#1a1a1a", fontweight="bold")
    # Labels
    + labs(
        x=f"PC1 ({var_explained[0]:.1f}%)",
        y=f"PC2 ({var_explained[1]:.1f}%)",
        title="biplot-pca · plotnine · pyplots.ai",
        color="Species",
    )
    # Theme
    + theme_minimal()
    + theme(
        figure_size=(16, 9),
        text=element_text(size=14),
        axis_title=element_text(size=20),
        axis_text=element_text(size=16),
        plot_title=element_text(size=24),
        legend_text=element_text(size=16),
        legend_title=element_text(size=18),
    )
    + scale_color_manual(values=colors)
)

# Save
plot.save("plot.png", dpi=300, verbose=False)
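The loading-scaling step in the file above can be checked in isolation. The following is a minimal sketch, not part of the commit: it assumes synthetic standardized data in place of Iris and computes PCA with `numpy.linalg.svd` instead of scikit-learn, so every variable name here is illustrative. It applies the same rule (`0.8 * max|score| / max|loading|`) and confirms that every scaled arrow stays inside the scaled reference circle.

```python
import numpy as np

# Synthetic stand-in for the standardized feature matrix (assumption: the
# commit itself uses the Iris data and sklearn's StandardScaler).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: the rows of Vt are the principal axes, which play the role
# of sklearn's `components_` (up to sign).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
scores = U[:, :2] * S[:2]   # observations projected onto PC1 and PC2
loadings = Vt[:2].T         # shape (n_features, 2)

# Same scaling rule as in the diff: the largest loading coordinate is
# stretched to 80% of the largest absolute score.
scale_factor = np.max(np.abs(scores)) * 0.8 / np.max(np.abs(loadings))
arrows = loadings * scale_factor

# Each feature's 2-component loading vector has norm <= 1, so every arrow
# tip lies on or inside a circle of radius `scale_factor`.
arrow_lengths = np.linalg.norm(arrows, axis=1)
print(np.all(arrow_lengths <= scale_factor + 1e-9))
```

Because the circle is drawn with radius `scale_factor`, an arrow that nearly reaches it marks a variable that is almost fully represented by the first two components.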
Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
library: plotnine
specification_id: biplot-pca
created: '2026-01-09T13:20:35Z'
updated: '2026-01-09T13:23:55Z'
generated_by: claude-opus-4-5-20251101
workflow_run: 20853039303
issue: 3417
python_version: 3.13.11
library_version: 0.15.2
preview_url: https://storage.googleapis.com/pyplots-images/plots/biplot-pca/plotnine/plot.png
preview_thumb: https://storage.googleapis.com/pyplots-images/plots/biplot-pca/plotnine/plot_thumb.png
preview_html: null
quality_score: 91
review:
  strengths:
  - 'Excellent implementation of all biplot requirements: scores, loadings with arrows,
    variable labels, and unit circle reference'
  - Smart label positioning with manual adjustments to reduce overlap between petal
    length/width labels
  - Appropriate color palette that is colorblind-safe
  - Well-scaled loadings that are visible alongside score points
  - Proper axis labels with variance explained percentages
  - Clean ggplot2-style layered composition using plotnine grammar of graphics
  weaknesses:
  - Slight overlap between petal length and petal width labels despite manual adjustment
    attempt
  - No random seed set (though not strictly needed since Iris is deterministic data)
  - Could use more distinctive plotnine features like scale_color_brewer instead of
    manual colors
  image_description: 'The plot displays a PCA biplot using the Iris dataset with 150
    observations (50 per species). The three Iris species (setosa, versicolor, virginica)
    are shown as colored scatter points: blue for setosa clustered on the left, golden-yellow
    for versicolor in the middle, and orange for virginica on the right. Four black
    arrows originate from the origin representing variable loadings: "sepal width"
    points upward-left, "sepal length" points to the upper-right, "petal width" and
    "petal length" point to the right (nearly overlapping). A dashed gray unit circle
    provides reference for loading magnitudes. Axis labels show "PC1 (73.0%)" and
    "PC2 (22.9%)" with variance explained percentages. The title correctly follows
    the format "biplot-pca · plotnine · pyplots.ai". The plot uses a 16:9 landscape
    format with a minimal theme and subtle gray grid.'
  criteria_checklist:
    visual_quality:
      score: 37
      max: 40
      items:
      - id: VQ-01
        name: Text Legibility
        score: 10
        max: 10
        passed: true
        comment: Title at 24pt, axis labels at 20pt, tick labels at 16pt, all perfectly
          readable
      - id: VQ-02
        name: No Overlap
        score: 7
        max: 8
        passed: true
        comment: Minor overlap between petal length and petal width labels due to
          similar loading directions; still readable
      - id: VQ-03
        name: Element Visibility
        score: 8
        max: 8
        passed: true
        comment: Point size 4 with alpha 0.7 is well-suited for 150 data points
      - id: VQ-04
        name: Color Accessibility
        score: 5
        max: 5
        passed: true
        comment: Blue, yellow, orange palette is colorblind-safe
      - id: VQ-05
        name: Layout Balance
        score: 4
        max: 5
        passed: true
        comment: Good proportions, unit circle extends close to edges
      - id: VQ-06
        name: Axis Labels
        score: 2
        max: 2
        passed: true
        comment: Descriptive with variance explained percentages
      - id: VQ-07
        name: Grid & Legend
        score: 1
        max: 2
        passed: true
        comment: Grid subtle, legend well-placed; grid could be slightly less prominent
    spec_compliance:
      score: 25
      max: 25
      items:
      - id: SC-01
        name: Plot Type
        score: 8
        max: 8
        passed: true
        comment: Correct biplot with scores as points and loadings as arrows
      - id: SC-02
        name: Data Mapping
        score: 5
        max: 5
        passed: true
        comment: PC1 on X-axis, PC2 on Y-axis correctly assigned
      - id: SC-03
        name: Required Features
        score: 5
        max: 5
        passed: true
        comment: 'All spec features present: scores, loadings, labels, unit circle,
          variance explained'
      - id: SC-04
        name: Data Range
        score: 3
        max: 3
        passed: true
        comment: Axes show all data points and loading arrows
      - id: SC-05
        name: Legend Accuracy
        score: 2
        max: 2
        passed: true
        comment: Species legend labels are correct
      - id: SC-06
        name: Title Format
        score: 2
        max: 2
        passed: true
        comment: Uses exact format biplot-pca · plotnine · pyplots.ai
    data_quality:
      score: 19
      max: 20
      items:
      - id: DQ-01
        name: Feature Coverage
        score: 8
        max: 8
        passed: true
        comment: Shows clear species separation, variable loadings with different
          directions and magnitudes
      - id: DQ-02
        name: Realistic Context
        score: 7
        max: 7
        passed: true
        comment: Iris dataset is a classic, neutral, real-world example for PCA
      - id: DQ-03
        name: Appropriate Scale
        score: 4
        max: 5
        passed: true
        comment: Scaled loadings visible alongside scores; loading scale factor could
          be slightly larger
    code_quality:
      score: 8
      max: 10
      items:
      - id: CQ-01
        name: KISS Structure
        score: 3
        max: 3
        passed: true
        comment: 'Linear flow: imports, data loading, PCA, dataframes, plot, save'
      - id: CQ-02
        name: Reproducibility
        score: 1
        max: 3
        passed: false
        comment: Uses load_iris() which is deterministic, but no explicit np.random.seed()
      - id: CQ-03
        name: Clean Imports
        score: 2
        max: 2
        passed: true
        comment: All imports are used
      - id: CQ-04
        name: No Deprecated API
        score: 1
        max: 1
        passed: true
        comment: Modern plotnine and sklearn APIs
      - id: CQ-05
        name: Output Correct
        score: 1
        max: 1
        passed: true
        comment: Saves as plot.png with dpi=300
    library_features:
      score: 2
      max: 5
      items:
      - id: LF-01
        name: Distinctive Features
        score: 2
        max: 5
        passed: false
        comment: Uses layered composition but no advanced plotnine-specific features
          like faceting or statistical transformations
  verdict: APPROVED
impl_tags:
  dependencies:
  - sklearn
  techniques:
  - annotations
  - layer-composition
  patterns:
  - dataset-loading
  dataprep:
  - pca
  - normalization
  styling:
  - alpha-blending
  - grid-styling
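
The category scores in `criteria_checklist` can be cross-checked against `quality_score`. The snippet below is an illustrative sketch only: the numbers are copied by hand from the metadata above, and the `checklist` dictionary is a hypothetical structure, not something the repository's tooling is known to use.

```python
# (score, max) per category, copied from the criteria_checklist above.
checklist = {
    "visual_quality": (37, 40),
    "spec_compliance": (25, 25),
    "data_quality": (19, 20),
    "code_quality": (8, 10),
    "library_features": (2, 5),
}

total = sum(score for score, _ in checklist.values())
maximum = sum(max_score for _, max_score in checklist.values())
print(f"{total}/{maximum}")  # prints 91/100
```

The total matches the `quality_score: 91` field and the `Quality: 91/100` line in the implementation's docstring.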