Commit 0774983

feat(seaborn): implement sequence-logo-basic (#4614)
## Implementation: `sequence-logo-basic` - seaborn

Implements the **seaborn** version of `sequence-logo-basic`.

**File:** `plots/sequence-logo-basic/implementations/seaborn.py`

**Parent Issue:** #4421

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/22780524945)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent fc0ff4b commit 0774983

2 files changed

Lines changed: 383 additions & 0 deletions

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
""" pyplots.ai
sequence-logo-basic: Sequence Logo for Motif Visualization
Library: seaborn 0.13.2 | Python 3.14.3
Quality: 92/100 | Created: 2026-03-06
"""

import matplotlib.pyplot as plt
import matplotlib.transforms as mtransforms
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.font_manager import FontProperties
from matplotlib.patches import PathPatch
from matplotlib.textpath import TextPath


# Data - DNA transcription factor binding site motif (10 positions)
bases = ["A", "C", "G", "T"]

frequencies = np.array(
    [
        [0.05, 0.80, 0.10, 0.05],  # pos 1: strong C
        [0.70, 0.10, 0.10, 0.10],  # pos 2: strong A
        [0.05, 0.05, 0.85, 0.05],  # pos 3: strong G
        [0.10, 0.10, 0.10, 0.70],  # pos 4: strong T
        [0.25, 0.25, 0.25, 0.25],  # pos 5: no preference
        [0.60, 0.15, 0.15, 0.10],  # pos 6: moderate A
        [0.05, 0.05, 0.05, 0.85],  # pos 7: strong T
        [0.90, 0.03, 0.04, 0.03],  # pos 8: very strong A
        [0.10, 0.60, 0.20, 0.10],  # pos 9: moderate C
        [0.05, 0.05, 0.80, 0.10],  # pos 10: strong G
    ]
)

n_positions = frequencies.shape[0]

# Calculate information content (bits) per position
# IC = log2(4) + sum(f * log2(f)) = 2 + sum(f * log2(f))
info_content = np.zeros(n_positions)
for i in range(n_positions):
    entropy = sum(f * np.log2(f) for f in frequencies[i] if f > 0)
    info_content[i] = 2.0 + entropy

# Standard DNA color scheme via seaborn palette management
base_color_list = ["#3AA655", "#4169E1", "#F5A623", "#E74C3C"]
base_palette = sns.color_palette(base_color_list)
base_colors = dict(zip(bases, base_palette, strict=True))

# Build frequency DataFrame for heatmap
freq_df = pd.DataFrame(frequencies.T, index=bases, columns=range(1, n_positions + 1))

# Plot setup with seaborn context and style
sns.set_context("talk", font_scale=1.0)
sns.set_style("white")
fig, (ax_logo, ax_heat) = plt.subplots(2, 1, figsize=(16, 9), height_ratios=[3.5, 1], gridspec_kw={"hspace": 0.45})

# --- Top panel: Sequence Logo ---
fp = FontProperties(family="monospace", weight="bold")
letter_width = 0.78

for pos in range(n_positions):
    ic = info_content[pos]
    letter_heights = frequencies[pos] * ic
    sorted_indices = np.argsort(letter_heights)
    y_offset = 0.0

    for idx in sorted_indices:
        height = letter_heights[idx]
        if height < 0.01:
            continue

        letter = bases[idx]
        color = base_colors[letter]
        x_center = pos
        x_left = x_center - letter_width / 2

        tp = TextPath((0, 0), letter, size=1, prop=fp)
        bbox = tp.get_extents()
        if bbox.width == 0 or bbox.height == 0:
            continue

        scale_x = letter_width / bbox.width
        scale_y = height / bbox.height
        tx = x_left - bbox.x0 * scale_x
        ty = y_offset - bbox.y0 * scale_y

        transform = mtransforms.Affine2D().scale(scale_x, scale_y).translate(tx, ty) + ax_logo.transData
        patch = PathPatch(tp, facecolor=color, edgecolor="none", transform=transform)
        ax_logo.add_patch(patch)
        y_offset += height

# Logo axis styling
ax_logo.set_xlim(-0.6, n_positions - 0.4)
ax_logo.set_ylim(0, 2.1)
ax_logo.set_xticks(range(n_positions))
ax_logo.set_xticklabels(range(1, n_positions + 1))
ax_logo.set_xlabel("Position", fontsize=20)
ax_logo.set_ylabel("Information content (bits)", fontsize=20)
ax_logo.set_title("sequence-logo-basic \u00b7 seaborn \u00b7 pyplots.ai", fontsize=24, fontweight="medium", pad=15)
ax_logo.tick_params(axis="both", labelsize=16)
sns.despine(ax=ax_logo, top=True, right=True)
ax_logo.yaxis.grid(True, alpha=0.15, linewidth=0.5, color="#cccccc")
ax_logo.set_axisbelow(True)

# Highlight the most conserved position
max_ic_pos = int(np.argmax(info_content))
ax_logo.axvspan(max_ic_pos - 0.42, max_ic_pos + 0.42, color="#ffd700", alpha=0.12, zorder=0)

# Conservation annotation
ax_logo.annotate(
    f"Most conserved\n({info_content[max_ic_pos]:.1f} bits)",
    xy=(max_ic_pos, info_content[max_ic_pos]),
    xytext=(max_ic_pos - 2.5, 1.90),
    fontsize=13,
    fontstyle="italic",
    color="#444444",
    arrowprops={"arrowstyle": "->", "color": "#888888", "lw": 1.3, "connectionstyle": "arc3,rad=-0.2"},
    ha="center",
    va="center",
)

# --- Bottom panel: Frequency heatmap using seaborn ---
sns.heatmap(
    freq_df,
    ax=ax_heat,
    cmap=sns.light_palette("#306998", as_cmap=True),
    annot=True,
    fmt=".2f",
    annot_kws={"fontsize": 12, "fontweight": "medium"},
    linewidths=1.5,
    linecolor="white",
    cbar_kws={"label": "Frequency", "shrink": 0.8, "aspect": 15, "pad": 0.02},
    vmin=0,
    vmax=1,
    square=False,
)
ax_heat.set_xlabel("Position", fontsize=16)
ax_heat.set_ylabel("", fontsize=16)
ax_heat.tick_params(axis="both", labelsize=14)
ax_heat.tick_params(axis="y", rotation=0)

# Color the y-axis base labels to match the logo colors
for tick_label in ax_heat.get_yticklabels():
    base = tick_label.get_text()
    if base in base_colors:
        tick_label.set_color(base_colors[base])
        tick_label.set_fontweight("bold")

plt.savefig("plot.png", dpi=300, bbox_inches="tight")
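The implementation computes information content with a per-position loop, using IC = 2 + sum(f * log2(f)). Below is a minimal vectorized sketch of the same calculation, assuming a four-letter alphabet and rows that each sum to 1. The helper name `information_content` is hypothetical and not part of this commit; the frequency matrix is copied from the file above.

```python
import numpy as np

# Frequency matrix copied from the implementation (rows = positions, columns = A, C, G, T).
frequencies = np.array(
    [
        [0.05, 0.80, 0.10, 0.05],
        [0.70, 0.10, 0.10, 0.10],
        [0.05, 0.05, 0.85, 0.05],
        [0.10, 0.10, 0.10, 0.70],
        [0.25, 0.25, 0.25, 0.25],
        [0.60, 0.15, 0.15, 0.10],
        [0.05, 0.05, 0.05, 0.85],
        [0.90, 0.03, 0.04, 0.03],
        [0.10, 0.60, 0.20, 0.10],
        [0.05, 0.05, 0.80, 0.10],
    ]
)


def information_content(freqs):
    """Hypothetical helper: per-position information content in bits."""
    # Replace zeros with 1.0 before the log so that 0 * log2(0) contributes 0.
    safe = np.where(freqs > 0, freqs, 1.0)
    neg_entropy = np.sum(freqs * np.log2(safe), axis=1)
    # log2(alphabet size) is 2 bits for DNA.
    return np.log2(freqs.shape[1]) + neg_entropy


info_content = information_content(frequencies)

# Each letter's height is its frequency times the position's information content,
# so the stack height at a position equals that position's information content.
letter_heights = frequencies * info_content[:, np.newaxis]

print(np.round(info_content, 2))
```

The uniform position (pos 5) comes out at 0 bits and pos 8 at about 1.4 bits, which matches the annotation described in the review metadata.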
Lines changed: 234 additions & 0 deletions
@@ -0,0 +1,234 @@
library: seaborn
specification_id: sequence-logo-basic
created: '2026-03-06T20:26:36Z'
updated: '2026-03-06T20:58:49Z'
generated_by: claude-opus-4-5-20251101
workflow_run: 22780524945
issue: 4421
python_version: 3.14.3
library_version: 0.13.2
preview_url: https://storage.googleapis.com/pyplots-images/plots/sequence-logo-basic/seaborn/plot.png
preview_thumb: https://storage.googleapis.com/pyplots-images/plots/sequence-logo-basic/seaborn/plot_thumb.png
preview_html: null
quality_score: 92
review:
  strengths:
  - Excellent two-panel design combining sequence logo with frequency heatmap for
    complementary views
  - Properly rendered letter glyphs using TextPath/PathPatch (correct technique for
    sequence logos)
  - Strong data storytelling with highlighted most-conserved position and annotation
    arrow
  - Colored heatmap y-axis labels matching logo colors — thoughtful design touch
  - Well-chosen data showing full range of conservation levels including uniform position
  weaknesses:
  - Minor colorblind concern with green-red pairing (though this is the standard bioinformatics
    convention)
  image_description: 'The plot features a two-panel layout on a white background.
    The top panel is a sequence logo showing 10 positions of a DNA transcription factor
    binding site. Letters (A in green, C in blue, G in orange, T in red) are stacked
    vertically at each position, with height proportional to information content in
    bits (y-axis, 0–2). Position 8 has the tallest stack (~1.4 bits, dominated by
    a large green "A") and is highlighted with a pale yellow vertical band and an
    italic annotation "Most conserved (1.4 bits)" with a curved arrow. Position 5
    is nearly empty (uniform distribution). The title reads "sequence-logo-basic ·
    seaborn · pyplots.ai" in medium weight at the top. Top and right spines are removed;
    a very subtle horizontal grid is visible. The bottom panel is a seaborn heatmap
    showing the raw frequency matrix (4 rows: A, C, G, T × 10 columns) with annotated
    values (.2f format), colored from white to dark blue. Y-axis labels are colored
    to match their respective base colors (green A, blue C, orange G, red T). A "Frequency"
    colorbar appears on the right. Both panels share "Position" as the x-axis label.'
  criteria_checklist:
    visual_quality:
      score: 29
      max: 30
      items:
      - id: VQ-01
        name: Text Legibility
        score: 8
        max: 8
        passed: true
        comment: 'All font sizes explicitly set: title 24pt, labels 20pt, ticks 16pt,
          heatmap annotations 12pt'
      - id: VQ-02
        name: No Overlap
        score: 6
        max: 6
        passed: true
        comment: No overlapping text or elements; annotation well-positioned
      - id: VQ-03
        name: Element Visibility
        score: 6
        max: 6
        passed: true
        comment: Letter glyphs properly scaled and clearly visible at all positions
      - id: VQ-04
        name: Color Accessibility
        score: 3
        max: 4
        passed: true
        comment: Standard DNA colors; green-red pairing is mild colorblind concern
          but established convention
      - id: VQ-05
        name: Layout & Canvas
        score: 4
        max: 4
        passed: true
        comment: Two-panel layout with 3.5:1 ratio gives proper emphasis; good use
          of 16x9 canvas
      - id: VQ-06
        name: Axis Labels & Title
        score: 2
        max: 2
        passed: true
        comment: Position and Information content (bits) with units
    design_excellence:
      score: 16
      max: 20
      items:
      - id: DE-01
        name: Aesthetic Sophistication
        score: 6
        max: 8
        passed: true
        comment: Custom DNA palette, two-panel layout, colored heatmap y-labels, annotation
          with curved arrow
      - id: DE-02
        name: Visual Refinement
        score: 5
        max: 6
        passed: true
        comment: Spines removed, subtle grid alpha=0.15, white linecolor in heatmap,
          generous panel spacing
      - id: DE-03
        name: Data Storytelling
        score: 5
        max: 6
        passed: true
        comment: Yellow highlight on most conserved position with annotation; heatmap
          provides complementary detail
    spec_compliance:
      score: 15
      max: 15
      items:
      - id: SC-01
        name: Plot Type
        score: 5
        max: 5
        passed: true
        comment: Correct sequence logo with vertically stacked letters scaled by information
          content
      - id: SC-02
        name: Required Features
        score: 4
        max: 4
        passed: true
        comment: 'All spec features: stacked letters, IC scaling, standard colors,
          scaled glyphs'
      - id: SC-03
        name: Data Mapping
        score: 3
        max: 3
        passed: true
        comment: X-axis positions 1-10, Y-axis IC in bits 0-2 range
      - id: SC-04
        name: Title & Legend
        score: 3
        max: 3
        passed: true
        comment: Correct title format; letters self-identify by color and shape
    data_quality:
      score: 15
      max: 15
      items:
      - id: DQ-01
        name: Feature Coverage
        score: 6
        max: 6
        passed: true
        comment: 'Full range: strong conservation (pos 3,7,8), moderate (pos 2,6,9),
          none (pos 5)'
      - id: DQ-02
        name: Realistic Context
        score: 5
        max: 5
        passed: true
        comment: DNA transcription factor binding site motif — real bioinformatics
          context
      - id: DQ-03
        name: Appropriate Scale
        score: 4
        max: 4
        passed: true
        comment: IC values 0-1.4 bits realistic for DNA; frequencies sum to 1 per
          position
    code_quality:
      score: 10
      max: 10
      items:
      - id: CQ-01
        name: KISS Structure
        score: 3
        max: 3
        passed: true
        comment: 'Linear flow: imports, data, IC calculation, logo rendering, heatmap,
          save'
      - id: CQ-02
        name: Reproducibility
        score: 2
        max: 2
        passed: true
        comment: Fully deterministic hardcoded frequency data
      - id: CQ-03
        name: Clean Imports
        score: 2
        max: 2
        passed: true
        comment: All imports used
      - id: CQ-04
        name: Code Elegance
        score: 2
        max: 2
        passed: true
        comment: Clean, well-structured; TextPath/PathPatch is correct technique
      - id: CQ-05
        name: Output & API
        score: 1
        max: 1
        passed: true
        comment: Saves as plot.png, dpi=300, bbox_inches=tight, no deprecated API
    library_mastery:
      score: 7
      max: 10
      items:
      - id: LM-01
        name: Idiomatic Usage
        score: 4
        max: 5
        passed: true
        comment: Good use of sns.set_context, set_style, despine, heatmap, color_palette,
          light_palette
      - id: LM-02
        name: Distinctive Features
        score: 3
        max: 5
        passed: true
        comment: sns.heatmap with annotations, sns.light_palette for custom cmap,
          sns.despine
  verdict: APPROVED
impl_tags:
  dependencies: []
  techniques:
  - subplots
  - annotations
  - patches
  - manual-ticks
  - colorbar
  patterns:
  - data-generation
  - iteration-over-groups
  dataprep: []
  styling:
  - grid-styling
  - custom-colormap
  - edge-highlighting
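The six category scores in `criteria_checklist` add up to the reported `quality_score`. The metadata does not say how the overall score is derived, so the sketch below assumes it is the plain sum of the category scores; the dictionary name `category_scores` is hypothetical, and the values are copied from the review above.

```python
# (score, max) pairs copied from the criteria_checklist in the metadata file.
category_scores = {
    "visual_quality": (29, 30),
    "design_excellence": (16, 20),
    "spec_compliance": (15, 15),
    "data_quality": (15, 15),
    "code_quality": (10, 10),
    "library_mastery": (7, 10),
}
quality_score = 92  # top-level quality_score field

# Assumption: the overall score is an unweighted sum of the category scores.
total = sum(score for score, _ in category_scores.values())
maximum = sum(max_score for _, max_score in category_scores.values())

print(f"{total}/{maximum}, matches quality_score: {total == quality_score}")
```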

0 commit comments
