Commit 05b5927

feat(pygal): implement sequence-logo-basic (#4610)
## Implementation: `sequence-logo-basic` - pygal

Implements the **pygal** version of `sequence-logo-basic`.

**File:** `plots/sequence-logo-basic/implementations/pygal.py`

**Parent Issue:** #4421

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/22780525053)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

1 parent 779dd58 commit 05b5927

2 files changed

Lines changed: 367 additions & 0 deletions

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
+""" pyplots.ai
+sequence-logo-basic: Sequence Logo for Motif Visualization
+Library: pygal 3.1.0 | Python 3.14.3
+Quality: 81/100 | Created: 2026-03-06
+"""
+
+import numpy as np
+import pygal
+from pygal.style import Style
+
+
+# Data — ETS transcription factor binding site motif (10 positions)
+# Positions 2-7 form the conserved GGAATT core
+frequencies = {
+    1: {"A": 0.40, "C": 0.20, "G": 0.25, "T": 0.15},
+    2: {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.05},
+    3: {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
+    4: {"A": 0.90, "C": 0.03, "G": 0.04, "T": 0.03},
+    5: {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
+    6: {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
+    7: {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
+    8: {"A": 0.25, "C": 0.30, "G": 0.20, "T": 0.25},
+    9: {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
+    10: {"A": 0.30, "C": 0.20, "G": 0.30, "T": 0.20},
+}
+
+nucleotides = ["A", "C", "G", "T"]
+max_entropy = 2.0  # bits for DNA
+
+# Calculate information content at each position
+info_content = {}
+for pos, freqs in frequencies.items():
+    entropy = 0
+    for nt in nucleotides:
+        f = freqs[nt]
+        if f > 0:
+            entropy -= f * np.log2(f)
+    info_content[pos] = max_entropy - entropy
+
+# Identify conserved core (IC > 0.5 bits) for visual emphasis
+core_positions = {pos for pos, ic in info_content.items() if ic > 0.5}
+
+# Scale each nucleotide height by frequency * information content
+# Build per-nucleotide series with letter labels inside bars
+stacked_data = {nt: [] for nt in nucleotides}
+for pos in sorted(frequencies.keys()):
+    ic = info_content[pos]
+    for nt in nucleotides:
+        height = round(frequencies[pos][nt] * ic, 4)
+        show_letter = height >= 0.10
+        is_core = pos in core_positions
+        stacked_data[nt].append(
+            {
+                "value": height,
+                "label": (
+                    f"{'[core] ' if is_core else ''}Pos {pos}: {nt} = {frequencies[pos][nt]:.0%} x {ic:.2f} bits"
+                ),
+                "formatter": (lambda x, letter=nt, show=show_letter: letter if show else ""),
+            }
+        )
+
+# Colorblind-safe DNA palette: teal A, blue C, amber G, purple T
+# Avoids red-green confusion while maintaining visual distinctiveness
+custom_style = Style(
+    background="white",
+    plot_background="#f8f9fa",
+    foreground="#2d2d2d",
+    foreground_strong="#111111",
+    foreground_subtle="#e0e0e0",
+    colors=("#0f766e", "#1d4ed8", "#d97706", "#7c3aed"),
+    opacity=0.92,
+    opacity_hover=1.0,
+    title_font_size=36,
+    label_font_size=22,
+    major_label_font_size=20,
+    legend_font_size=22,
+    value_font_size=24,
+    title_font_family="sans-serif",
+    label_font_family="sans-serif",
+    major_label_font_family="sans-serif",
+    legend_font_family="sans-serif",
+    value_font_family="monospace",
+    tooltip_font_size=18,
+    tooltip_font_family="monospace",
+)
+
+# X-labels: mark conserved core positions for emphasis
+x_labels = []
+for pos in sorted(frequencies.keys()):
+    if pos in core_positions:
+        label = f"*{pos}*"
+    else:
+        label = str(pos)
+    x_labels.append(label)
+
+# Plot
+chart = pygal.StackedBar(
+    width=4800,
+    height=2700,
+    style=custom_style,
+    title="sequence-logo-basic · pygal · pyplots.ai",
+    x_title="Position (* = conserved core, IC > 0.5 bits)",
+    y_title="Information content (bits)",
+    show_x_guides=False,
+    show_y_guides=True,
+    show_minor_y_labels=False,
+    margin=60,
+    margin_bottom=120,
+    spacing=6,
+    legend_at_bottom=True,
+    legend_at_bottom_columns=4,
+    legend_box_size=28,
+    print_values=True,
+    print_values_position="center",
+    rounded_bars=3,
+    y_labels_major_count=6,
+    truncate_legend=-1,
+    tooltip_border_radius=10,
+    tooltip_fancy_mode=True,
+    min_scale=0,
+    range=(0, 1.6),
+    x_label_rotation=0,
+    secondary_style=custom_style,
+    inner_radius=0,
+    js=[],
+)
+
+chart.x_labels = x_labels
+
+for nt in nucleotides:
+    chart.add(nt, stacked_data[nt])
+
+# Save
+chart.render_to_png("plot.png")
+chart.render_to_file("plot.html")
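
The information-content step in the script above (Shannon entropy per position, then 2 bits minus that entropy) can also be written in vectorized form. The following is a minimal sketch and not part of this commit. It assumes a hypothetical `freq_matrix` array holding two rows copied from the `frequencies` dict (positions 4 and 8), with columns ordered A, C, G, T; the function name `information_content` is likewise an illustration.

```python
import numpy as np

# Hypothetical array form of two rows from the commit's `frequencies` dict;
# columns are A, C, G, T.
freq_matrix = np.array([
    [0.90, 0.03, 0.04, 0.03],   # position 4 (conserved)
    [0.25, 0.30, 0.20, 0.25],   # position 8 (near-uniform)
])


def information_content(freqs, max_entropy=2.0):
    """Return max_entropy minus Shannon entropy (in bits) for each row."""
    freqs = np.asarray(freqs, dtype=float)
    # Treat 0 * log2(0) as 0 by substituting 1 where f == 0 (log2(1) == 0).
    safe = np.where(freqs > 0, freqs, 1.0)
    entropy = -(freqs * np.log2(safe)).sum(axis=1)
    return max_entropy - entropy


ic = information_content(freq_matrix)
# Per-letter stack heights: frequency * information content, as in the script.
heights = freq_matrix * ic[:, None]
```

Because each row of frequencies sums to 1, the letter heights at a position add up to that position's information content, which is why the stacked bar totals can be read directly in bits.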
Lines changed: 232 additions & 0 deletions
@@ -0,0 +1,232 @@
+library: pygal
+specification_id: sequence-logo-basic
+created: '2026-03-06T20:25:16Z'
+updated: '2026-03-06T20:51:27Z'
+generated_by: claude-opus-4-5-20251101
+workflow_run: 22780525053
+issue: 4421
+python_version: 3.14.3
+library_version: 3.1.0
+preview_url: https://storage.googleapis.com/pyplots-images/plots/sequence-logo-basic/pygal/plot.png
+preview_thumb: https://storage.googleapis.com/pyplots-images/plots/sequence-logo-basic/pygal/plot_thumb.png
+preview_html: https://storage.googleapis.com/pyplots-images/plots/sequence-logo-basic/pygal/plot.html
+quality_score: 81
+review:
+  strengths:
+  - 'Excellent data quality: real ETS transcription factor binding site with biologically
+    accurate frequencies'
+  - Colorblind-safe palette (teal/blue/amber/purple) with deliberate avoidance of
+    red-green confusion
+  - Clean, well-structured code with appropriate complexity
+  - Good use of pygal-specific features (formatters, tooltips, rounded bars, dual
+    output)
+  - Core position marking on x-axis adds context to the visualization
+  weaknesses:
+  - 'Not a true sequence logo: letters are small text labels inside bars rather than
+    scaled glyphs filling the bar height'
+  - Fixed stacking order (A, C, G, T) rather than frequency-sorted per position as
+    spec requires
+  - Y-axis range (0-1.6) creates wasted vertical space above the tallest bar (~1.35)
+  image_description: 'The plot is a stacked bar chart with 10 positions along the
+    x-axis and "Information content (bits)" on the y-axis (ranging from 0 to ~1.5).
+    Four nucleotide series are displayed: A (teal/green), C (blue), G (amber/orange),
+    T (red/purple). Positions 2-7 feature tall stacked bars representing a conserved
+    GGAATT core — position 4 is the tallest (~1.35 bits, dominated by green A). Positions
+    1, 8, 9, 10 have very short bars indicating low information content. Letters (G,
+    A, T) are printed as text labels inside the larger bar segments. The title reads
+    "sequence-logo-basic · pygal · pyplots.ai" at the top. A legend at the bottom
+    shows all four nucleotides in a 4-column layout. The plot background is light
+    gray (#f8f9fa) on a white canvas, with rounded bar corners and subtle styling.'
+  criteria_checklist:
+    visual_quality:
+      score: 27
+      max: 30
+      items:
+      - id: VQ-01
+        name: Text Legibility
+        score: 7
+        max: 8
+        passed: true
+        comment: Font sizes explicitly set (title=36, labels=22, major_labels=20,
+          value=24). All text readable. Title slightly compact relative to canvas.
+      - id: VQ-02
+        name: No Overlap
+        score: 6
+        max: 6
+        passed: true
+        comment: No overlapping text or elements. X-labels well-spaced, legend at
+          bottom in 4 columns.
+      - id: VQ-03
+        name: Element Visibility
+        score: 5
+        max: 6
+        passed: true
+        comment: Conserved positions prominent. Positions 8-10 have very short bars
+          where segments are hard to distinguish.
+      - id: VQ-04
+        name: Color Accessibility
+        score: 4
+        max: 4
+        passed: true
+        comment: Teal/blue/amber/purple palette avoids red-green confusion with good
+          contrast.
+      - id: VQ-05
+        name: Layout & Canvas
+        score: 3
+        max: 4
+        passed: true
+        comment: Plot fills ~50-60% of canvas. Some wasted vertical space above tallest
+          bar.
+      - id: VQ-06
+        name: Axis Labels & Title
+        score: 2
+        max: 2
+        passed: true
+        comment: Y-axis with units (bits), X-axis descriptive with core position context.
+    design_excellence:
+      score: 12
+      max: 20
+      items:
+      - id: DE-01
+        name: Aesthetic Sophistication
+        score: 5
+        max: 8
+        passed: true
+        comment: Custom palette, light gray background, rounded bars, custom opacity.
+          Above defaults but not publication-ready.
+      - id: DE-02
+        name: Visual Refinement
+        score: 4
+        max: 6
+        passed: true
+        comment: No x-guides, subtle foreground colors, custom margins and spacing,
+          organized legend.
+      - id: DE-03
+        name: Data Storytelling
+        score: 3
+        max: 6
+        passed: true
+        comment: Conserved core visually stands out. Core positions marked on x-axis.
+          Letters identify dominant nucleotides.
+    spec_compliance:
+      score: 11
+      max: 15
+      items:
+      - id: SC-01
+        name: Plot Type
+        score: 3
+        max: 5
+        passed: false
+        comment: Stacked bar chart approximation rather than true sequence logo with
+          scaled letter glyphs.
+      - id: SC-02
+        name: Required Features
+        score: 2
+        max: 4
+        passed: false
+        comment: Missing frequency-based stacking order and scaled letter glyphs.
+          IC-scaled heights and DNA colors present.
+      - id: SC-03
+        name: Data Mapping
+        score: 3
+        max: 3
+        passed: true
+        comment: X-axis positions 1-10, Y-axis information content in bits. Stack
+          heights correctly reflect IC.
+      - id: SC-04
+        name: Title & Legend
+        score: 3
+        max: 3
+        passed: true
+        comment: Title follows exact format. Legend correctly labels all four nucleotides.
+    data_quality:
+      score: 14
+      max: 15
+      items:
+      - id: DQ-01
+        name: Feature Coverage
+        score: 5
+        max: 6
+        passed: true
+        comment: '10-position motif with good variation: highly conserved, mixed,
+          and near-uniform positions.'
+      - id: DQ-02
+        name: Realistic Context
+        score: 5
+        max: 5
+        passed: true
+        comment: ETS transcription factor binding site with GGAATT core — real, well-known
+          biological motif.
+      - id: DQ-03
+        name: Appropriate Scale
+        score: 4
+        max: 4
+        passed: true
+        comment: Frequencies sum to 1.0 per position. IC ranges 0.02-1.35 bits, within
+          0-2 bits for DNA.
+    code_quality:
+      score: 10
+      max: 10
+      items:
+      - id: CQ-01
+        name: KISS Structure
+        score: 3
+        max: 3
+        passed: true
+        comment: 'Linear flow: imports, data, IC calculation, style, chart config,
+          save. No functions or classes.'
+      - id: CQ-02
+        name: Reproducibility
+        score: 2
+        max: 2
+        passed: true
+        comment: Fully deterministic data, no random generation.
+      - id: CQ-03
+        name: Clean Imports
+        score: 2
+        max: 2
+        passed: true
+        comment: numpy, pygal, pygal.style.Style — all used.
+      - id: CQ-04
+        name: Code Elegance
+        score: 2
+        max: 2
+        passed: true
+        comment: Clean, well-organized. Lambda with default arguments for formatters
+          is appropriate.
+      - id: CQ-05
+        name: Output & API
+        score: 1
+        max: 1
+        passed: true
+        comment: Saves as plot.png via render_to_png. Current API.
+    library_mastery:
+      score: 7
+      max: 10
+      items:
+      - id: LM-01
+        name: Idiomatic Usage
+        score: 4
+        max: 5
+        passed: true
+        comment: StackedBar, Style class, structured data dicts with value/label/formatter,
+          idiomatic pygal patterns.
+      - id: LM-02
+        name: Distinctive Features
+        score: 3
+        max: 5
+        passed: true
+        comment: print_values with custom formatters, tooltip config, rounded_bars,
+          legend_at_bottom_columns, dual PNG+HTML render.
+  verdict: REJECTED
+impl_tags:
+  dependencies: []
+  techniques:
+  - html-export
+  patterns:
+  - data-generation
+  - iteration-over-groups
+  dataprep:
+  - normalization
+  styling:
+  - alpha-blending
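
Two of the weaknesses recorded in the review concern how each stack is assembled: the fixed A, C, G, T order and the hard-coded `range=(0, 1.6)`. The sketch below is not part of this commit and only illustrates the data preparation that might address them. The helper names `sorted_stack` and `y_upper` are hypothetical, and the ordering convention (smallest letter at the bottom, most frequent on top) is an assumption based on how sequence logos are conventionally drawn. Whether `pygal.StackedBar` can render a different series order at each position is not settled here.

```python
import math


def sorted_stack(freqs, ic):
    """Return (letter, height) pairs for one position, smallest height first.

    Assumed convention: the most frequent letter ends up on top of the stack.
    """
    heights = {nt: f * ic for nt, f in freqs.items()}
    # sorted() is stable, so letters with equal heights keep their input order.
    return sorted(heights.items(), key=lambda item: item[1])


def y_upper(total_heights, step=0.1):
    """Round the tallest stack up to the next multiple of `step`."""
    return round(math.ceil(max(total_heights) / step) * step, 10)


# Position 4 from the commit's data; 1.3739 bits is its information content
# as computed by the script's entropy formula (rounded).
pos4 = {"A": 0.90, "C": 0.03, "G": 0.04, "T": 0.03}
stack = sorted_stack(pos4, ic=1.3739)
upper = y_upper([1.3739, 0.0145])
```

With the tallest stack at roughly 1.37 bits, rounding up to the next tenth gives an upper bound of 1.4 instead of 1.6, which would reduce the empty space the review mentions.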
