
Commit 0fddf45

feat(pygal): implement biplot-pca (#3474)
## Implementation: `biplot-pca` - pygal

Implements the **pygal** version of `biplot-pca`.

**File:** `plots/biplot-pca/implementations/pygal.py`

**Parent Issue:** #3417

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/20853044084)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 7f8acd8 commit 0fddf45

2 files changed

Lines changed: 342 additions & 0 deletions


plots/biplot-pca/implementations/pygal.py

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
```python
""" pyplots.ai
biplot-pca: PCA Biplot with Scores and Loading Vectors
Library: pygal 3.1.0 | Python 3.13.11
Quality: 82/100 | Created: 2026-01-09
"""

import math

import pygal
from pygal.style import Style
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


# Load and prepare data
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names

# Standardize features
X_scaled = StandardScaler().fit_transform(X)

# Perform PCA
pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)
loadings = pca.components_.T  # Shape: (n_features, n_components)
variance_explained = pca.explained_variance_ratio_ * 100

# Eigenvector scaling (not a correlation biplot): stretch loadings to 40% of the widest
# score range; each 2-D loading has length <= 1, so a circle of that radius bounds them
score_range = max(scores[:, 0].max() - scores[:, 0].min(), scores[:, 1].max() - scores[:, 1].min())
loading_scale = score_range * 0.4
unit_circle_radius = loading_scale

# Custom style for 4800x2700 px canvas
species_colors = ["#306998", "#FFD43B", "#2ECC71"]  # Blue, Yellow, Green for species
unit_circle_color = "#AAAAAA"  # Gray for unit circle reference
# Distinct colors for each loading vector (colorblind-friendly)
loading_colors = ["#E41A1C", "#984EA3", "#FF7F00", "#377EB8"]  # Red, Purple, Orange, Blue

custom_style = Style(
    background="white",
    plot_background="white",
    foreground="#333333",
    foreground_strong="#333333",
    foreground_subtle="#666666",
    colors=tuple(species_colors + [unit_circle_color] + loading_colors),  # in add() order
    title_font_size=72,
    label_font_size=48,
    major_label_font_size=42,
    legend_font_size=42,
    tooltip_font_size=36,
    stroke_width=4,
    opacity=0.85,
    opacity_hover=0.95,
)

# Create XY chart for biplot
chart = pygal.XY(
    width=4800,
    height=2700,
    style=custom_style,
    title="biplot-pca · pygal · pyplots.ai",
    x_title=f"PC1 ({variance_explained[0]:.1f}%)",
    y_title=f"PC2 ({variance_explained[1]:.1f}%)",
    show_legend=True,
    legend_at_bottom=True,
    legend_box_size=36,
    dots_size=14,
    stroke=False,
    show_x_guides=True,
    show_y_guides=True,
    truncate_legend=50,
    explicit_size=True,
)

# Add score points for each species (class)
for i, name in enumerate(target_names):
    points = [(float(pc1), float(pc2)) for pc1, pc2 in scores[y == i]]
    chart.add(name.capitalize(), points, stroke=False, dots_size=14)

# Dashed unit circle: loading length 1, drawn at radius loading_scale
circle_points = []
for angle in range(0, 361, 3):
    rad = math.radians(angle)
    circle_points.append((unit_circle_radius * math.cos(rad), unit_circle_radius * math.sin(rad)))
chart.add("Unit Circle", circle_points, stroke=True, show_dots=False, stroke_style={"width": 3, "dasharray": "10,8"})

# Legend labels from the dataset, e.g. "sepal length (cm)" -> "Sepal Length"
full_names = [fname.replace(" (cm)", "").title() for fname in feature_names]

# Add each loading vector as a line arrow with an arrowhead
for i, full_name in enumerate(full_names):
    tip_x = float(loadings[i, 0] * loading_scale)
    tip_y = float(loadings[i, 1] * loading_scale)

    # Unit direction (ux, uy) and its perpendicular (px, py) for the arrowhead
    dx, dy = tip_x, tip_y
    length = math.hypot(dx, dy)
    ux = dx / length if length > 0 else 0
    uy = dy / length if length > 0 else 0
    px, py = -uy, ux  # Perpendicular vector

    # Small arrowhead keeps the vectors thin and uncluttered
    head_len = 0.08 * loading_scale
    head_wid = 0.04 * loading_scale
    hb_x = tip_x - ux * head_len
    hb_y = tip_y - uy * head_len

    # Polyline: origin -> tip, then out to each barb and back, giving an open arrowhead
    arrow_line = [
        (0.0, 0.0),
        (tip_x, tip_y),
        (hb_x + px * head_wid, hb_y + py * head_wid),
        (tip_x, tip_y),
        (hb_x - px * head_wid, hb_y - py * head_wid),
        (tip_x, tip_y),
    ]
    chart.add(full_name, arrow_line, stroke=True, show_dots=False, fill=False, stroke_style={"width": 5})

# Save PNG output only (pygal's render_to_png requires CairoSVG)
chart.render_to_png("plot.png")
```
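The loading arrows in this file are the unit-length eigenvectors from `pca.components_`, stretched by one constant (`loading_scale`). A correlation biplot instead multiplies each eigenvector by the square root of its eigenvalue, so that for standardized data each loading equals the correlation between a feature and a principal component and the unit circle becomes a "circle of correlations". The sketch below illustrates the difference under simplifying assumptions: only two features, so the PCA has a closed form and needs no NumPy, and population (ddof=0) statistics throughout. The helpers `standardize`, `pearson` and `pca_two_features` and the input values are hypothetical and made up for illustration. With scikit-learn the analogous scaling would be roughly `pca.components_.T * np.sqrt(pca.explained_variance_)`, up to the n versus n - 1 normalization.

```python
import math
from statistics import fmean, pstdev


def standardize(values):
    """Center to mean 0 and scale to unit population std (ddof=0)."""
    mean, std = fmean(values), pstdev(values)
    return [(v - mean) / std for v in values]


def pearson(xs, ys):
    """Pearson correlation computed from population z-scores."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(x * y for x, y in zip(zx, zy)) / len(zx)


def pca_two_features(a, b):
    """Closed-form PCA of two standardized features (hypothetical helper).

    The correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 - |r|
    with unit eigenvectors (1, s)/sqrt(2) and (1, -s)/sqrt(2), where s = sign(r).
    Loading matrices are indexed [feature][component].
    """
    za, zb = standardize(a), standardize(b)
    r = pearson(a, b)
    s = 1.0 if r >= 0 else -1.0
    eigvals = [1 + abs(r), 1 - abs(r)]
    h = 1 / math.sqrt(2)
    eigvecs = [[h, s * h], [h, -s * h]]  # [component][feature]

    # Scores: project each standardized observation onto each eigenvector
    scores = [[x * v[0] + y * v[1] for v in eigvecs] for x, y in zip(za, zb)]

    # Eigenvector loadings (analogous to pca.components_.T) vs correlation loadings
    eig_loadings = [[eigvecs[k][j] for k in range(2)] for j in range(2)]
    corr_loadings = [[eigvecs[k][j] * math.sqrt(eigvals[k]) for k in range(2)] for j in range(2)]
    return scores, eig_loadings, corr_loadings


# Made-up illustrative values for two features
x1 = [5.1, 4.9, 6.3, 5.8, 7.1, 6.5]
x2 = [3.5, 3.0, 3.3, 2.7, 3.0, 3.2]
scores, eig_loadings, corr_loadings = pca_two_features(x1, x2)

for j, feature in enumerate([x1, x2]):
    direct = [pearson(feature, [row[k] for row in scores]) for k in range(2)]
    print(f"feature {j}: eigvec {eig_loadings[j]}, corr {corr_loadings[j]}, direct {direct}")
```

With two features and two components every loading vector has length exactly 1 under both scalings, so the difference shows up only in the individual components. With four features and two components, as in the Iris plot, correlation loadings fall inside the circle, and their squared length is the share of that feature's variance the two PCs capture.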
Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
```yaml
library: pygal
specification_id: biplot-pca
created: '2026-01-09T13:18:36Z'
updated: '2026-01-09T13:44:46Z'
generated_by: claude-opus-4-5-20251101
workflow_run: 20853044084
issue: 3417
python_version: 3.13.11
library_version: 3.1.0
preview_url: https://storage.googleapis.com/pyplots-images/plots/biplot-pca/pygal/plot.png
preview_thumb: https://storage.googleapis.com/pyplots-images/plots/biplot-pca/pygal/plot_thumb.png
preview_html: https://storage.googleapis.com/pyplots-images/plots/biplot-pca/pygal/plot.html
quality_score: 82
review:
  strengths:
  - Excellent use of the Iris dataset providing realistic, interpretable context
  - 'Correct variance explained percentages on axis labels (PC1: 73.0%, PC2: 22.9%)'
  - Colorblind-friendly palette distinguishes species and loading vectors effectively
  - Unit circle provides appropriate reference for loading magnitudes
  - Good canvas utilization and overall layout
  weaknesses:
  - Loading vectors rendered as thick filled shapes that obscure underlying data points
    - should be thin line arrows
  - Legend labels are truncated (Sepal L., Petal W.) despite full names defined in
    code
  - Arrow rendering creates visual clutter near the origin where all four vectors
    originate
  - No explicit random seed (though dataset is deterministic)
  image_description: 'The plot displays a PCA biplot of the Iris dataset on a 4800x2700
    white canvas. Three species are shown as scatter points: Setosa (blue, clustered
    on the left), Versicolor (yellow, in the middle), and Virginica (green, on the
    right). A dashed red/pink unit circle is centered at the origin for loading magnitude
    reference. Four loading vectors are rendered as large filled arrow shapes originating
    from the origin: Sepal Width (purple, pointing upward), Sepal Length (brown/red,
    pointing upper-right), Petal Length (orange, pointing right), and Petal Width
    (teal, pointing right). The axis labels correctly show "PC1 (73.0%)" and "PC2
    (22.9%)". The title "biplot-pca · pygal · pyplots.ai" appears at the top. A legend
    appears in the top-left corner showing all series with abbreviated labels.'
  criteria_checklist:
    visual_quality:
      score: 32
      max: 40
      items:
      - id: VQ-01
        name: Text Legibility
        score: 9
        max: 10
        passed: true
        comment: Title and axis labels are clearly readable at full size; tick labels
          are appropriately sized
      - id: VQ-02
        name: No Overlap
        score: 6
        max: 8
        passed: true
        comment: The thick filled arrow shapes overlap with each other near the origin
          and partially obscure data points
      - id: VQ-03
        name: Element Visibility
        score: 6
        max: 8
        passed: true
        comment: Scatter points are visible but the loading arrows are excessively
          thick and filled, obscuring some data
      - id: VQ-04
        name: Color Accessibility
        score: 5
        max: 5
        passed: true
        comment: Good colorblind-safe palette with blue, yellow, green for species
          and distinct colors for loadings
      - id: VQ-05
        name: Layout Balance
        score: 4
        max: 5
        passed: true
        comment: Good use of canvas space; legend positioning is reasonable but slightly
          crowds the top-left
      - id: VQ-06
        name: Axis Labels
        score: 2
        max: 2
        passed: true
        comment: 'Descriptive labels with variance percentages: PC1 (73.0%) and PC2
          (22.9%)'
      - id: VQ-07
        name: Grid & Legend
        score: 0
        max: 2
        passed: false
        comment: Legend uses abbreviated labels (Sepal L.) instead of full names;
          too many legend entries
    spec_compliance:
      score: 23
      max: 25
      items:
      - id: SC-01
        name: Plot Type
        score: 8
        max: 8
        passed: true
        comment: Correct PCA biplot showing scores and loadings
      - id: SC-02
        name: Data Mapping
        score: 5
        max: 5
        passed: true
        comment: PC1 on X-axis, PC2 on Y-axis, correct mapping
      - id: SC-03
        name: Required Features
        score: 4
        max: 5
        passed: true
        comment: Has scores, loadings, unit circle, variance labels; loading labels
          are truncated
      - id: SC-04
        name: Data Range
        score: 3
        max: 3
        passed: true
        comment: All data points visible within axes
      - id: SC-05
        name: Legend Accuracy
        score: 2
        max: 2
        passed: true
        comment: Legend entries correctly identify species and loading vectors
      - id: SC-06
        name: Title Format
        score: 1
        max: 2
        passed: true
        comment: Title format is correct but uses hyphen instead of middle dot separator
    data_quality:
      score: 18
      max: 20
      items:
      - id: DQ-01
        name: Feature Coverage
        score: 7
        max: 8
        passed: true
        comment: Shows clear separation of species along PC1; all loading directions
          visible
      - id: DQ-02
        name: Realistic Context
        score: 7
        max: 7
        passed: true
        comment: Uses the classic Iris dataset, a real and widely understood scenario
      - id: DQ-03
        name: Appropriate Scale
        score: 4
        max: 5
        passed: true
        comment: Standardized PCA scores with appropriate scaling; unit circle provides
          good reference
    code_quality:
      score: 6
      max: 10
      items:
      - id: CQ-01
        name: KISS Structure
        score: 3
        max: 3
        passed: true
        comment: 'Linear flow: imports, data, PCA, chart setup, rendering'
      - id: CQ-02
        name: Reproducibility
        score: 0
        max: 3
        passed: false
        comment: No random seed set; uses deterministic dataset but no explicit seed
      - id: CQ-03
        name: Clean Imports
        score: 2
        max: 2
        passed: true
        comment: All imports are used
      - id: CQ-04
        name: No Deprecated API
        score: 1
        max: 1
        passed: true
        comment: Current API usage
      - id: CQ-05
        name: Output Correct
        score: 0
        max: 1
        passed: false
        comment: Saves as plot.png correctly
    library_features:
      score: 3
      max: 5
      items:
      - id: LF-01
        name: Distinctive Features
        score: 3
        max: 5
        passed: true
        comment: Uses XY chart, custom Style, dashed stroke for unit circle, legend_at_bottom
          - demonstrates pygal capabilities but arrows are not rendered elegantly
  verdict: APPROVED
impl_tags:
  dependencies:
  - sklearn
  techniques:
  - annotations
  - layer-composition
  patterns:
  - dataset-loading
  - iteration-over-groups
  dataprep:
  - pca
  - normalization
  styling:
  - alpha-blending
```
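Assuming `quality_score` is meant to equal the sum of the five category subtotals in `criteria_checklist` (the metadata does not state this explicitly), the rubric can be cross-checked mechanically. This sketch hard-codes the numbers transcribed from the review above instead of parsing the YAML, so it needs no third-party parser; the names `CRITERIA`, `QUALITY_SCORE` and `check_rubric` are hypothetical.

```python
QUALITY_SCORE = 82  # top-level quality_score

# category -> (score, max, per-item scores, per-item maxima), transcribed from criteria_checklist
CRITERIA = {
    "visual_quality": (32, 40, [9, 6, 6, 5, 4, 2, 0], [10, 8, 8, 5, 5, 2, 2]),
    "spec_compliance": (23, 25, [8, 5, 4, 3, 2, 1], [8, 5, 5, 3, 2, 2]),
    "data_quality": (18, 20, [7, 7, 4], [8, 7, 5]),
    "code_quality": (6, 10, [3, 0, 2, 1, 0], [3, 3, 2, 1, 1]),
    "library_features": (3, 5, [3], [5]),
}


def check_rubric(criteria, quality_score):
    """Return (total, max_total) after checking item sums against each subtotal."""
    for name, (score, cap, items, item_caps) in criteria.items():
        if sum(items) != score or sum(item_caps) != cap:
            raise ValueError(f"{name}: item totals do not match the category subtotal")
    total = sum(score for score, _, _, _ in criteria.values())
    max_total = sum(cap for _, cap, _, _ in criteria.values())
    if total != quality_score:
        raise ValueError(f"categories sum to {total}, not {quality_score}")
    return total, max_total


total, max_total = check_rubric(CRITERIA, QUALITY_SCORE)
print(f"{total}/{max_total}")  # 82/100
```

Raising on the first mismatch keeps the check strict; a looser variant could collect every inconsistency before reporting.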
