Commit 7f8acd8

feat(letsplot): implement biplot-pca (#3480)
## Implementation: `biplot-pca` - letsplot

Implements the **letsplot** version of `biplot-pca`.

**File:** `plots/biplot-pca/implementations/letsplot.py`

**Parent Issue:** #3417

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/20853044167)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 92a7b9b commit 7f8acd8

2 files changed

Lines changed: 349 additions & 0 deletions

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
""" pyplots.ai
biplot-pca: PCA Biplot with Scores and Loading Vectors
Library: letsplot 4.8.2 | Python 3.13.11
Quality: 91/100 | Created: 2026-01-09
"""

import numpy as np
import pandas as pd
from lets_plot import *  # noqa: F403
from lets_plot.export import ggsave as export_ggsave
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


LetsPlot.setup_html()  # noqa: F405

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Perform PCA
pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)
loadings = pca.components_.T  # Variables x Components

# Variance explained
var_explained = pca.explained_variance_ratio_ * 100

# Create dataframe for scores
scores_df = pd.DataFrame({"PC1": scores[:, 0], "PC2": scores[:, 1], "Species": [target_names[i] for i in y]})

# Scale loadings for visibility alongside scores
score_range = max(np.abs(scores).max(), 1)
loading_scale = score_range * 1.5

# Create dataframe for loading arrows
clean_names = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]
loadings_df = pd.DataFrame(
    {
        "x_start": [0] * len(feature_names),
        "y_start": [0] * len(feature_names),
        "x_end": loadings[:, 0] * loading_scale,
        "y_end": loadings[:, 1] * loading_scale,
        "variable": clean_names,
    }
)

# Label positions with smart offset to avoid overlap
# Petal Length and Petal Width have similar directions, so we offset them differently
label_offsets = []
for i, name in enumerate(clean_names):
    x_end = loadings_df["x_end"].iloc[i]
    y_end = loadings_df["y_end"].iloc[i]
    if name == "Petal Width":
        # Offset Petal Width label upward to avoid overlap with Petal Length
        label_offsets.append((x_end * 1.15, y_end * 1.15 + 0.4))
    elif name == "Petal Length":
        # Offset Petal Length label downward
        label_offsets.append((x_end * 1.15, y_end * 1.15 - 0.3))
    else:
        # Default offset for other labels
        label_offsets.append((x_end * 1.15, y_end * 1.15))

loadings_df["label_x"] = [offset[0] for offset in label_offsets]
loadings_df["label_y"] = [offset[1] for offset in label_offsets]

# Colorblind-safe palette (blue, orange, purple - distinguishable for all color vision types)
colors = ["#0077BB", "#EE7733", "#AA3377"]

# Build the plot
plot = (
    ggplot()  # noqa: F405
    + geom_point(  # noqa: F405
        data=scores_df,
        mapping=aes(x="PC1", y="PC2", color="Species"),  # noqa: F405
        size=5,
        alpha=0.8,
    )
    + geom_segment(  # noqa: F405
        data=loadings_df,
        mapping=aes(x="x_start", y="y_start", xend="x_end", yend="y_end"),  # noqa: F405
        color="#333333",
        size=1.8,
        arrow=arrow(length=15, type="open"),  # noqa: F405
    )
    + geom_text(  # noqa: F405
        data=loadings_df,
        mapping=aes(x="label_x", y="label_y", label="variable"),  # noqa: F405
        size=14,
        color="#333333",
    )
    + geom_hline(yintercept=0, color="gray", size=0.5, linetype="dashed", alpha=0.5)  # noqa: F405
    + geom_vline(xintercept=0, color="gray", size=0.5, linetype="dashed", alpha=0.5)  # noqa: F405
    + labs(  # noqa: F405
        x=f"PC1 ({var_explained[0]:.1f}%)",
        y=f"PC2 ({var_explained[1]:.1f}%)",
        title="biplot-pca · letsplot · pyplots.ai",
        color="Species",
    )
    + scale_color_manual(values=colors)  # noqa: F405
    + scale_x_continuous(expand=[0.15, 0.15])  # noqa: F405
    + scale_y_continuous(expand=[0.15, 0.15])  # noqa: F405
    + theme_minimal()  # noqa: F405
    + theme(  # noqa: F405
        plot_title=element_text(size=24),  # noqa: F405
        axis_title=element_text(size=20),  # noqa: F405
        axis_text=element_text(size=16),  # noqa: F405
        legend_title=element_text(size=18),  # noqa: F405
        legend_text=element_text(size=16),  # noqa: F405
    )
    + ggsize(1600, 900)  # noqa: F405
)

# Save PNG (scale 3x for 4800x2700)
export_ggsave(plot, filename="plot.png", path=".", scale=3)

# Save HTML for interactivity
export_ggsave(plot, filename="plot.html", path=".")
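The standardize, project, and scale-loadings steps in the script can be cross-checked without scikit-learn. The following is a minimal NumPy-only sketch, not part of the commit: the function name `pca_biplot_arrays` and its `stretch` parameter are hypothetical, and PCA is computed here through a plain SVD rather than `sklearn.decomposition.PCA`. The signs of individual components may therefore differ from scikit-learn's output, which mirrors the arrows and points but does not change the geometry of the biplot.

```python
import numpy as np


def pca_biplot_arrays(X, n_components=2, stretch=1.5):
    """Return (scores, scaled_loadings, explained_ratio) for a PCA biplot.

    Hypothetical helper for illustration; it mirrors the steps of the script
    but uses an SVD instead of sklearn's PCA class.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column (population std, ddof=0)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Thin SVD of the standardized data: Z = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # observations x components
    loadings = Vt[:n_components].T  # variables x components
    explained_ratio = (S**2 / np.sum(S**2))[:n_components]
    # Same rule as the script: stretch arrows relative to the largest score
    loading_scale = max(np.abs(scores).max(), 1) * stretch
    return scores, loadings * loading_scale, explained_ratio
```

Calling `pca_biplot_arrays(iris.data)` should give arrays that can be dropped into `scores_df` and `loadings_df` in place of the scikit-learn results, up to the sign convention noted above.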
Lines changed: 223 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,223 @@
library: letsplot
specification_id: biplot-pca
created: '2026-01-09T13:21:55Z'
updated: '2026-01-09T13:34:33Z'
generated_by: claude-opus-4-5-20251101
workflow_run: 20853044167
issue: 3417
python_version: 3.13.11
library_version: 4.8.2
preview_url: https://storage.googleapis.com/pyplots-images/plots/biplot-pca/letsplot/plot.png
preview_thumb: https://storage.googleapis.com/pyplots-images/plots/biplot-pca/letsplot/plot_thumb.png
preview_html: https://storage.googleapis.com/pyplots-images/plots/biplot-pca/letsplot/plot.html
quality_score: 91
review:
  strengths:
  - Excellent label positioning with smart offsets to prevent Petal Length/Petal Width
    overlap
  - Clean colorblind-safe palette that clearly distinguishes all three species
  - 'Proper variance explained percentages in axis labels (PC1: 73.0%, PC2: 22.9%)'
  - Good use of reference lines at origin to help interpret loading directions
  - Clean feature names (Sepal Length instead of sepal length cm)
  - Appropriate loading scaling relative to score range
  - Correct title format following pyplots.ai conventions
  - Good canvas utilization with ggsize(1600, 900) and scale=3 for high-resolution
    output
  weaknesses:
  - Missing subtle grid lines for easier value reading (only origin reference lines
    present)
  - Unit circle mentioned in spec Notes as consider adding is not included (minor,
    as it says consider)
  image_description: 'The plot displays a PCA biplot using the Iris dataset. Observation
    scores are shown as colored scatter points representing three species: setosa
    (blue), versicolor (orange), and virginica (purple/magenta). Four loading vectors
    (arrows) originate from the origin, representing Sepal Width (pointing upward),
    Sepal Length (pointing right and slightly up), Petal Length (pointing right and
    slightly down), and Petal Width (pointing right and slightly up, between the petal
    and sepal length arrows). The axes are labeled with PC1 (73.0%) on the x-axis
    and PC2 (22.9%) on the y-axis, showing variance explained. Reference dashed lines
    at x=0 and y=0 are visible. The title "biplot-pca · letsplot · pyplots.ai" appears
    at the top. The three species clusters are well-separated, with setosa on the
    left and versicolor/virginica on the right. A legend on the right identifies the
    species colors.'
  criteria_checklist:
    visual_quality:
      score: 36
      max: 40
      items:
      - id: VQ-01
        name: Text Legibility
        score: 10
        max: 10
        passed: true
        comment: Title, axis labels, tick labels, loading labels, and legend text
          are all clearly readable at full size
      - id: VQ-02
        name: No Overlap
        score: 8
        max: 8
        passed: true
        comment: Label offsets successfully prevent overlap between Petal Length and
          Petal Width labels; all text elements are clearly separated
      - id: VQ-03
        name: Element Visibility
        score: 6
        max: 8
        passed: true
        comment: Points are well-sized and visible with good alpha; arrows are clearly
          visible but could be slightly thicker for better visibility
      - id: VQ-04
        name: Color Accessibility
        score: 5
        max: 5
        passed: true
        comment: 'Uses colorblind-safe palette (blue #0077BB, orange #EE7733, purple
          #AA3377) with good distinction'
      - id: VQ-05
        name: Layout Balance
        score: 5
        max: 5
        passed: true
        comment: Good use of canvas space, plot fills majority of area, balanced margins
          with expand settings
      - id: VQ-06
        name: Axis Labels
        score: 2
        max: 2
        passed: true
        comment: 'Descriptive labels with variance percentages: PC1 (73.0%) and PC2
          (22.9%)'
      - id: VQ-07
        name: Grid & Legend
        score: 0
        max: 2
        passed: false
        comment: No visible grid lines (only reference lines at origin), legend is
          well-placed but grid is absent
    spec_compliance:
      score: 25
      max: 25
      items:
      - id: SC-01
        name: Plot Type
        score: 8
        max: 8
        passed: true
        comment: Correct biplot showing both observation scores and loading vectors
      - id: SC-02
        name: Data Mapping
        score: 5
        max: 5
        passed: true
        comment: PC1 on x-axis, PC2 on y-axis, correct score and loading projections
      - id: SC-03
        name: Required Features
        score: 5
        max: 5
        passed: true
        comment: 'All spec features present: scores as points, loadings as arrows,
          labels, variance percentages, group coloring'
      - id: SC-04
        name: Data Range
        score: 3
        max: 3
        passed: true
        comment: All data points and arrows visible within axes with proper padding
      - id: SC-05
        name: Legend Accuracy
        score: 2
        max: 2
        passed: true
        comment: Legend correctly shows species names matching point colors
      - id: SC-06
        name: Title Format
        score: 2
        max: 2
        passed: true
        comment: 'Correct format: biplot-pca · letsplot · pyplots.ai'
    data_quality:
      score: 18
      max: 20
      items:
      - id: DQ-01
        name: Feature Coverage
        score: 6
        max: 8
        passed: true
        comment: Shows clear species separation, loading directions; could benefit
          from showing unit circle as mentioned in spec Notes
      - id: DQ-02
        name: Realistic Context
        score: 7
        max: 7
        passed: true
        comment: Classic Iris dataset, widely used in PCA tutorials, perfect neutral
          scientific context
      - id: DQ-03
        name: Appropriate Scale
        score: 5
        max: 5
        passed: true
        comment: Standardized data with realistic PC score ranges, proper loading
          scaling
    code_quality:
      score: 9
      max: 10
      items:
      - id: CQ-01
        name: KISS Structure
        score: 3
        max: 3
        passed: true
        comment: 'Linear script: imports → data → PCA → plot → save'
      - id: CQ-02
        name: Reproducibility
        score: 2
        max: 3
        passed: true
        comment: Uses deterministic dataset (Iris), no random data, StandardScaler
          is deterministic
      - id: CQ-03
        name: Clean Imports
        score: 2
        max: 2
        passed: true
        comment: All imports are used (numpy, pandas, lets_plot, sklearn components)
      - id: CQ-04
        name: No Deprecated API
        score: 1
        max: 1
        passed: true
        comment: Current lets-plot and sklearn APIs
      - id: CQ-05
        name: Output Correct
        score: 1
        max: 1
        passed: true
        comment: Saves as plot.png and plot.html
    library_features:
      score: 3
      max: 5
      items:
      - id: LF-01
        name: Distinctive Features
        score: 3
        max: 5
        passed: true
        comment: Uses ggplot grammar with geom_point, geom_segment, geom_text, arrow(),
          theme customization; could leverage more interactive features
  verdict: APPROVED
impl_tags:
  dependencies:
  - sklearn
  techniques:
  - layer-composition
  - annotations
  - html-export
  patterns:
  - dataset-loading
  dataprep:
  - pca
  - normalization
  styling:
  - alpha-blending
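The review lists the missing unit circle as a weakness (and under DQ-01). One possible way to address it is sketched below, as an illustration only and not part of this commit: the helper name `unit_circle_points` is hypothetical. It builds the coordinates of a closed circle that could be handed to a path layer. Because each variable's loading vector has unit length across all four components, its projection onto PC1/PC2 has length at most 1, so a circle whose radius equals the script's `loading_scale` would bound the scaled arrows.

```python
import math


def unit_circle_points(radius=1.0, n=100):
    """Return x/y lists tracing a closed circle around the origin.

    Hypothetical helper; `radius` would be the script's `loading_scale`
    so the circle matches the stretched loading arrows.
    """
    # n + 1 angles so the last point repeats the first and the path closes
    angles = [2 * math.pi * i / n for i in range(n + 1)]
    return {
        "x": [radius * math.cos(a) for a in angles],
        "y": [radius * math.sin(a) for a in angles],
    }


# Assumed usage inside the plot expression of the script (not executed here):
#   circle = unit_circle_points(radius=loading_scale)
#   + geom_path(data=circle, mapping=aes(x="x", y="y"), color="gray", linetype="dotted")
```

The dictionary form is chosen because lets-plot layers accept a plain dict of columns as `data`; converting it to a `pandas.DataFrame` would work equally well.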