Commit 29f7ab1

feat(letsplot): implement sequence-logo-basic (#4612)
## Implementation: `sequence-logo-basic` - letsplot

Implements the **letsplot** version of `sequence-logo-basic`.

**File:** `plots/sequence-logo-basic/implementations/letsplot.py`

**Parent Issue:** #4421

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/22780525013)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 05b5927 commit 29f7ab1

2 files changed

Lines changed: 374 additions & 0 deletions

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
""" pyplots.ai
sequence-logo-basic: Sequence Logo for Motif Visualization
Library: letsplot 4.8.2 | Python 3.14.3
Quality: 79/100 | Created: 2026-03-06
"""

import numpy as np
import pandas as pd
from lets_plot import *


LetsPlot.setup_html()

# Data: 10-position DNA transcription factor binding site motif
positions = list(range(1, 11))

# Realistic motif frequencies (resembling a TATA-box-like binding site)
frequencies = {
    1: {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    2: {"A": 0.10, "C": 0.05, "G": 0.05, "T": 0.80},
    3: {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    4: {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    5: {"A": 0.90, "C": 0.02, "G": 0.02, "T": 0.06},
    6: {"A": 0.60, "C": 0.05, "G": 0.05, "T": 0.30},
    7: {"A": 0.15, "C": 0.05, "G": 0.70, "T": 0.10},
    8: {"A": 0.05, "C": 0.80, "G": 0.10, "T": 0.05},
    9: {"A": 0.30, "C": 0.30, "G": 0.20, "T": 0.20},
    10: {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

# Color scheme: A=green, C=blue, G=orange, T=red
color_map = {"A": "#2CA02C", "C": "#1F77B4", "G": "#FF7F0E", "T": "#D62728"}

# Calculate information content and build letter data
rows = []
max_info = 0.0
for pos in positions:
    freqs = frequencies[pos]
    entropy = -sum(f * np.log2(f) for f in freqs.values() if f > 0)
    info_content = 2.0 - entropy

    # Sort by frequency (least frequent at bottom, most frequent on top)
    sorted_letters = sorted(freqs.items(), key=lambda x: x[1])

    y_bottom = 0.0
    for letter, freq in sorted_letters:
        height = freq * info_content
        if height < 0.04:
            y_bottom += height
            continue
        rows.append(
            {
                "position": pos,
                "xmin": pos - 0.45,
                "xmax": pos + 0.45,
                "ymin": y_bottom,
                "ymax": y_bottom + height,
                "ymid": y_bottom + height / 2,
                "height": height,
                "letter": letter,
                "frequency": freq,
                "info_bits": round(info_content, 3),
            }
        )
        y_bottom += height
    if info_content > max_info:
        max_info = info_content

df = pd.DataFrame(rows)

# Legend data: invisible points to create a proper legend with square symbols
legend_df = pd.DataFrame({"x": [0] * 4, "y": [0] * 4, "letter": ["A", "C", "G", "T"]})

# Y-axis upper limit: round up to nearest 0.2 with small padding
y_max = np.ceil(max_info * 5) / 5 + 0.05

# Build plot with colored letters as the primary visual element
plot = (
    ggplot()
    # Subtle background rectangles for structure
    + geom_rect(
        aes(xmin="xmin", xmax="xmax", ymin="ymin", ymax="ymax", fill="letter"),
        data=df,
        alpha=0.15,
        color="rgba(0,0,0,0)",
        size=0,
        show_legend=False,
    )
    # Colored letter glyphs: the primary visual element
    + geom_text(
        aes(x="position", y="ymid", label="letter", color="letter", size="height"),
        data=df,
        fontface="bold",
        show_legend=False,
        tooltips=layer_tooltips()
        .format("@frequency", ".0%")
        .format("@info_bits", ".3f")
        .line("@letter")
        .line("Frequency: @frequency")
        .line("Info content: @info_bits bits"),
    )
    # Invisible points for proper legend (square shape shows color blocks)
    + geom_point(
        aes(x="x", y="y", fill="letter"),
        data=legend_df,
        size=6,
        shape=22,
        color="rgba(0,0,0,0)",
        alpha=0,
        tooltips="none",
    )
    # Manual color scales
    + scale_fill_manual(values=color_map, name="Nucleotide", breaks=["A", "C", "G", "T"])
    + scale_color_manual(values=color_map)
    + scale_size(range=[6, 36], guide="none")
    + scale_x_continuous(breaks=positions, limits=[0.3, 10.7])
    + scale_y_continuous(limits=[0, y_max], breaks=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4])
    + guides(fill=guide_legend(override_aes={"size": 12, "alpha": 1.0}))
    + labs(x="Position", y="Information content (bits)", title="sequence-logo-basic \u00b7 letsplot \u00b7 pyplots.ai")
    + theme_minimal()
    + theme(
        plot_title=element_text(size=28, face="bold"),
        axis_title=element_text(size=22),
        axis_text=element_text(size=18),
        legend_title=element_text(size=20, face="bold"),
        legend_text=element_text(size=18),
        panel_grid_major_x=element_blank(),
        panel_grid_minor=element_blank(),
        panel_grid_major_y=element_line(color="#E0E0E0", size=0.5),
        plot_background=element_rect(color="white", fill="white"),
    )
    + ggsize(1600, 900)
)

# Save
ggsave(plot, "plot.png", scale=3, path=".")
ggsave(plot, "plot.html", path=".")
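The information-content step in the script above can be isolated as a small sketch. The helper names `information_content` and `visible_stack` are hypothetical (they do not appear in the commit), and the snippet uses `math` in place of `numpy` so that it runs on the standard library alone. The 0.04-bit cutoff and the ascending sort are taken from the script.

```python
import math

MAX_BITS = 2.0     # log2(4): maximum information for a 4-letter DNA alphabet
MIN_HEIGHT = 0.04  # cutoff used in the script: shorter letters are not drawn


def information_content(freqs):
    """Return 2 - H in bits, where H is the Shannon entropy of one position."""
    entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    return MAX_BITS - entropy


def visible_stack(freqs):
    """Return (letter, ymin, ymax) for drawn letters, least frequent at the bottom."""
    info = information_content(freqs)
    stack, y_bottom = [], 0.0
    for letter, freq in sorted(freqs.items(), key=lambda kv: kv[1]):
        height = freq * info
        if height >= MIN_HEIGHT:
            stack.append((letter, y_bottom, y_bottom + height))
        y_bottom += height  # skipped letters still occupy their share of the stack
    return stack


pos5 = {"A": 0.90, "C": 0.02, "G": 0.02, "T": 0.06}
pos9 = {"A": 0.30, "C": 0.30, "G": 0.20, "T": 0.20}

print(round(information_content(pos5), 3))               # 1.394
print([letter for letter, _, _ in visible_stack(pos5)])  # ['T', 'A']
print(visible_stack(pos9))                               # []
```

Position 9 is close to uniform, so its total is only about 0.03 bits and every letter falls under the cutoff, which is why that column ends up empty in the plot.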
Lines changed: 237 additions & 0 deletions
@@ -0,0 +1,237 @@
library: letsplot
specification_id: sequence-logo-basic
created: '2026-03-06T20:26:04Z'
updated: '2026-03-06T20:56:37Z'
generated_by: claude-opus-4-5-20251101
workflow_run: 22780525013
issue: 4421
python_version: 3.14.3
library_version: 4.8.2
preview_url: https://storage.googleapis.com/pyplots-images/plots/sequence-logo-basic/letsplot/plot.png
preview_thumb: https://storage.googleapis.com/pyplots-images/plots/sequence-logo-basic/letsplot/plot_thumb.png
preview_html: https://storage.googleapis.com/pyplots-images/plots/sequence-logo-basic/letsplot/plot.html
quality_score: 79
review:
  strengths:
  - Excellent realistic data choice (TATA-box motif) that immediately communicates
    biological meaning
  - Semi-transparent background rectangles are a creative solution that adds structure
    and helps visualize letter boundaries
  - Clean, well-organized code with proper information content calculation
  - Good use of lets-plot-specific features (layer_tooltips with formatted biological
    data, HTML export)
  - 'Strong visual refinement: subtle grid, removed x-grid lines, explicit font sizing'
  weaknesses:
  - Letters are not stretched to fill their allocated height as true sequence logos
    require (a fundamental lets-plot limitation with text rendering)
  - Small-frequency letters at the base of stacks are very small and hard to read
  - Color scheme uses green+red which is challenging for colorblind users (though
    spec-mandated for DNA)
  image_description: 'The plot displays a sequence logo for a 10-position DNA transcription
    factor binding site (TATA-box-like motif). At each position, nucleotide letters
    (A, C, G, T) are stacked vertically with heights proportional to information content
    in bits. Standard DNA colors are used: A=green, C=blue, G=orange, T=red. Semi-transparent
    background rectangles behind each letter stack add visual depth. Positions 2-5
    show the conserved TATA core with large dominant letters (T, A, T, A). Positions
    1, 9, and 10 have no visible letters (uniform or near-uniform distribution, about
    0 bits). Positions 7 (G) and 8 (C) show moderate conservation. The y-axis ranges
    from 0 to 1.4 bits with subtle horizontal gridlines. A "Nucleotide" legend with
    colored squares appears on the right. The title reads "sequence-logo-basic · letsplot
    · pyplots.ai". Overall layout is clean with good canvas utilization.'
  criteria_checklist:
    visual_quality:
      score: 23
      max: 30
      items:
      - id: VQ-01
        name: Text Legibility
        score: 6
        max: 8
        passed: true
        comment: Font sizes explicitly set. Main letters readable; small-frequency
          letters at stack bases are very small due to size scaling.
      - id: VQ-02
        name: No Overlap
        score: 5
        max: 6
        passed: true
        comment: Small letters at bottom of stacks slightly crowded at positions 2-4,
          but no significant overlap.
      - id: VQ-03
        name: Element Visibility
        score: 4
        max: 6
        passed: true
        comment: Dominant letters clearly visible. Background rectangles aid visibility.
          Small-frequency letters at base are quite tiny.
      - id: VQ-04
        name: Color Accessibility
        score: 3
        max: 4
        passed: true
        comment: Standard DNA colors (green/blue/orange/red) as spec-mandated. Green+red
          can challenge colorblind users.
      - id: VQ-05
        name: Layout & Canvas
        score: 3
        max: 4
        passed: true
        comment: Good canvas utilization (~60%). Empty positions 1, 9, 10 create some
          whitespace reflecting the data.
      - id: VQ-06
        name: Axis Labels & Title
        score: 2
        max: 2
        passed: true
        comment: Position and Information content (bits), descriptive with units.
    design_excellence:
      score: 13
      max: 20
      items:
      - id: DE-01
        name: Aesthetic Sophistication
        score: 5
        max: 8
        passed: true
        comment: Custom DNA palette, semi-transparent background rectangles, bold
          letters, minimal theme. Above defaults but not publication-level.
      - id: DE-02
        name: Visual Refinement
        score: 4
        max: 6
        passed: true
        comment: theme_minimal(), x-grid removed, subtle y-grid only, white background,
          generous spacing.
      - id: DE-03
        name: Data Storytelling
        score: 4
        max: 6
        passed: true
        comment: TATA-box motif creates natural visual hierarchy. Conserved core dominates
          visually.
    spec_compliance:
      score: 12
      max: 15
      items:
      - id: SC-01
        name: Plot Type
        score: 3
        max: 5
        passed: false
        comment: Sequence logo with stacked letters scaled by info content, but letters
          not stretched to fill rectangles as spec requires (lets-plot limitation).
      - id: SC-02
        name: Required Features
        score: 3
        max: 4
        passed: true
        comment: Vertical stacking, info content scaling, standard colors, frequency
          ordering, axis labels all present. Missing stretched glyph rendering.
      - id: SC-03
        name: Data Mapping
        score: 3
        max: 3
        passed: true
        comment: X=position, Y=information content. Correct mapping with proper calculation.
      - id: SC-04
        name: Title & Legend
        score: 3
        max: 3
        passed: true
        comment: Title format correct. Legend shows all four nucleotides with correct
          color-coded squares.
    data_quality:
      score: 14
      max: 15
      items:
      - id: DQ-01
        name: Feature Coverage
        score: 5
        max: 6
        passed: true
        comment: Shows high conservation (pos 2-5, 7-8), low conservation (pos 1,
          9-10), and mixed (pos 6).
      - id: DQ-02
        name: Realistic Context
        score: 5
        max: 5
        passed: true
        comment: TATA-box-like transcription factor binding site, a real, well-known
          biological motif.
      - id: DQ-03
        name: Appropriate Scale
        score: 4
        max: 4
        passed: true
        comment: Information content 0 to ~1.4 bits, realistic for 4-letter DNA alphabet
          (max 2 bits).
    code_quality:
      score: 10
      max: 10
      items:
      - id: CQ-01
        name: KISS Structure
        score: 3
        max: 3
        passed: true
        comment: 'Linear flow: imports, data, calculation, plot, save. No functions
          or classes.'
      - id: CQ-02
        name: Reproducibility
        score: 2
        max: 2
        passed: true
        comment: Deterministic hardcoded frequency data.
      - id: CQ-03
        name: Clean Imports
        score: 2
        max: 2
        passed: true
        comment: All imports (numpy, pandas, lets_plot) are used.
      - id: CQ-04
        name: Code Elegance
        score: 2
        max: 2
        passed: true
        comment: Clean, well-organized code. Invisible legend points technique is
          a reasonable workaround.
      - id: CQ-05
        name: Output & API
        score: 1
        max: 1
        passed: true
        comment: Saves as plot.png with scale=3 for 4800x2700. Current API.
    library_mastery:
      score: 7
      max: 10
      items:
      - id: LM-01
        name: Idiomatic Usage
        score: 4
        max: 5
        passed: true
        comment: 'Good ggplot grammar: aes(), multiple geom layers, scale_*_manual(),
          theme customization, guides().'
      - id: LM-02
        name: Distinctive Features
        score: 3
        max: 5
        passed: true
        comment: Uses layer_tooltips() with custom formatting, a distinctive lets-plot
          interactive feature. Also exports HTML.
  verdict: REJECTED
impl_tags:
  dependencies: []
  techniques:
  - layer-composition
  - custom-legend
  - hover-tooltips
  - html-export
  patterns:
  - data-generation
  - iteration-over-groups
  dataprep:
  - normalization
  styling:
  - grid-styling
  - alpha-blending
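
The first weakness in the review (letters not stretched to fill their allocated height) comes down to applying a non-uniform scale to each glyph outline. The sketch below is not lets-plot code and does not come from the commit. It only works out the affine transform that maps a glyph's bounding box onto the `xmin`/`xmax`/`ymin`/`ymax` rectangle the script already computes. `Box`, `fit_transform`, `apply`, and the sample glyph bounds are hypothetical; a path-based renderer would still be needed to supply real outlines and draw the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self):
        return self.x1 - self.x0

    @property
    def height(self):
        return self.y1 - self.y0


def fit_transform(glyph, target):
    """Return (sx, sy, tx, ty) with x' = sx*x + tx, y' = sy*y + ty mapping glyph onto target."""
    sx = target.width / glyph.width    # horizontal stretch
    sy = target.height / glyph.height  # vertical stretch (independent of sx)
    return sx, sy, target.x0 - sx * glyph.x0, target.y0 - sy * glyph.y0


def apply(transform, x, y):
    """Apply the scale-then-translate transform to one point."""
    sx, sy, tx, ty = transform
    return sx * x + tx, sy * y + ty


# Hypothetical bounding box of a letter outline in font units.
glyph = Box(0.05, 0.0, 0.65, 0.72)
# Illustrative rectangle for a letter at position 3 spanning 0.1 to 1.2 bits.
target = Box(3 - 0.45, 0.1, 3 + 0.45, 1.2)

t = fit_transform(glyph, target)
print(apply(t, glyph.x0, glyph.y0))  # lower-left corner lands on (xmin, ymin)
print(apply(t, glyph.x1, glyph.y1))  # upper-right corner lands on (xmax, ymax)
```

Because `sx` and `sy` are computed separately, a short, wide rectangle squashes the letter and a tall one elongates it, which is the behavior a font-size mapping such as `scale_size` cannot reproduce.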

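As a quick arithmetic check on the review metadata, the six category scores add up to the reported `quality_score`. The dictionary below is transcribed by hand from `criteria_checklist`; the variable names are illustrative only.

```python
# (score, max) per category, copied from criteria_checklist in the metadata
categories = {
    "visual_quality": (23, 30),
    "design_excellence": (13, 20),
    "spec_compliance": (12, 15),
    "data_quality": (14, 15),
    "code_quality": (10, 10),
    "library_mastery": (7, 10),
}

total = sum(score for score, _ in categories.values())
maximum = sum(cap for _, cap in categories.values())
print(f"{total}/{maximum}")  # 79/100
```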