Commit 736571d
evo2 dashboard: ColoredSequence + GeneUMAPView + 500-gene precompute
Two new visualizations for the SAE interpretability dashboard, plus the
offline pipeline that produces the gene-UMAP precompute bundle.
scripts/generate_fake_genes.py
500-row genes.tsv stand-in (gene_symbol, species, sequence) until a real
curated catalog lands. Realistic-ish distributions across 7 species.
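
A minimal sketch of what such a stand-in generator could look like, using only the standard library. The three column names come from the message above; the species labels, symbol format, length range, and uniform base sampling are placeholders assumed for illustration, not what generate_fake_genes.py actually does.

```python
import csv
import io
import random

# Placeholder labels: the commit only says "7 species", not which ones.
SPECIES = [f"species_{i}" for i in range(1, 8)]
COLUMNS = ["gene_symbol", "species", "sequence"]


def generate_fake_genes(n_rows=500, seed=0, min_len=300, max_len=3000):
    """Return n_rows synthetic gene records (hypothetical distributions)."""
    rng = random.Random(seed)  # seeded so the stand-in file is reproducible
    rows = []
    for i in range(n_rows):
        length = rng.randint(min_len, max_len)
        rows.append({
            "gene_symbol": f"FAKE{i:04d}",
            "species": rng.choice(SPECIES),
            "sequence": "".join(rng.choice("ACGT") for _ in range(length)),
        })
    return rows


def write_genes_tsv(rows, handle):
    """Write records as tab-separated text with a header row."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
write_genes_tsv(generate_fake_genes(), buf)
```

Seeding the generator keeps downstream precompute outputs stable between runs until a curated list replaces the synthetic one.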
scripts/gene_umap_precompute.py
End-to-end offline pipeline: genes.tsv -> Evo2 1B layer-20 -> TopK SAE
encode -> mean per gene -> UMAP (cosine) -> HDBSCAN clusters -> per-feature
firing stats. Writes G.npz, genes_umap.parquet, feature_stats.parquet,
manifest.json. Reuses predict_evo2 via torchrun subprocess; aggregates
.pt files by seq_idx + pad_mask. Idempotent (skips predict if .pt
files exist).
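
The "mean per gene" and "per-feature firing stats" steps can be sketched in plain Python as below. This is an illustration under assumptions, not the script's code: the chunk layout (dicts with `seq_idx`, `acts`, `pad_mask`), the convention that a mask value of 1 marks a real token, and the definition of "firing" as a positive mean activation are all hypothetical.

```python
from collections import defaultdict


def mean_per_gene(chunks, n_features):
    """Average SAE activations over non-padded tokens, grouped by seq_idx.

    chunks: iterable of dicts with
      seq_idx  -> int, index of the gene the chunk belongs to
      acts     -> list of per-token activation lists (length n_features)
      pad_mask -> list of 0/1 flags; 1 = real token (assumed convention)
    """
    sums = defaultdict(lambda: [0.0] * n_features)
    counts = defaultdict(int)
    for chunk in chunks:
        idx = chunk["seq_idx"]
        for token_acts, keep in zip(chunk["acts"], chunk["pad_mask"]):
            if not keep:
                continue  # padded position: contributes nothing
            row = sums[idx]
            for j, value in enumerate(token_acts):
                row[j] += value
            counts[idx] += 1
    return {idx: [s / counts[idx] for s in total]
            for idx, total in sums.items()}


def firing_stats(gene_means, n_features):
    """Count, per feature, how many genes have a positive mean activation."""
    stats = []
    for j in range(n_features):
        firing = [vec[j] for vec in gene_means.values() if vec[j] > 0]
        stats.append({
            "feature": j,
            "n_firing": len(firing),
            "mean_when_firing": sum(firing) / len(firing) if firing else 0.0,
        })
    return stats
```

Grouping by `seq_idx` rather than by file lets a gene that was split across several .pt chunks be pooled into a single vector.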
src/ColoredSequence.jsx
React component: paste a DNA sequence -> each base background-colored
by its top-firing SAE feature, opacity scaled by activation strength.
Two modes: top-feature (default), single-feature lookup. Builds mock
activations internally when no `analysis` prop is supplied so the
component works standalone before the /analyze backend is wired.
Tableau-10 colorblind palette, hover tooltip with top-5 features,
legend sorted by per-color position count.
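
The top-feature mode described above reduces to a small per-base computation. It is sketched here in Python for consistency with the pipeline scripts, although the component itself is JSX. The input shape (one `{feature_id: activation}` dict per base), the minimum-opacity floor, normalising by the sequence-wide peak, and the hex values in `PALETTE` are assumptions for illustration; the actual component may differ on each point.

```python
from collections import Counter

# Ten stand-in hex colours; not necessarily the palette the component ships.
PALETTE = ["#4e79a7", "#f28e2b", "#e15759", "#76b7b2", "#59a14f",
           "#edc948", "#b07aa1", "#ff9da7", "#9c755f", "#bab0ac"]


def color_bases(sequence, activations, palette=PALETTE, min_opacity=0.15):
    """Assign each base the colour of its strongest feature.

    activations: one dict {feature_id: activation} per base (assumed shape).
    Opacity scales linearly from min_opacity up to 1.0 at the peak activation.
    """
    peak = max((max(a.values()) for a in activations if a), default=0.0)
    feature_color = {}
    colored = []
    for base, acts in zip(sequence, activations):
        if not acts or peak <= 0:
            colored.append({"base": base, "feature": None,
                            "color": None, "opacity": 0.0})
            continue
        feature = max(acts, key=acts.get)  # top-firing feature at this base
        if feature not in feature_color:
            # Colours are handed out in order of first appearance and wrap.
            feature_color[feature] = palette[len(feature_color) % len(palette)]
        opacity = min_opacity + (1.0 - min_opacity) * (acts[feature] / peak)
        colored.append({"base": base, "feature": feature,
                        "color": feature_color[feature], "opacity": opacity})
    return colored


def legend(colored):
    """Return (color, position_count) pairs, most frequent colour first."""
    counts = Counter(c["color"] for c in colored if c["color"] is not None)
    return counts.most_common()
```

The opacity floor keeps weakly activated bases visibly tinted instead of fading into the background.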
src/GeneUMAPView.jsx
Renders the 500-gene UMAP via canvas. Loads G.bin (raw float32),
genes_meta.json, feature_stats.json from public/gene_umap/. Click a
feature in the sidebar -> instant recolor by activation strength
(no recompute). Click Reorganize -> re-runs UMAP client-side with
feature-weighted vectors (umap-js, ~2-5s at N=500), animates the
transition with ease-in-out cubic. Hover shows gene metadata + top 5
firing features.
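
The animated transition can be illustrated with the standard ease-in-out cubic curve applied to a linear blend between the old and new layouts. The sketch is in Python for consistency with the pipeline scripts, although the view itself is JSX. The message does not say how the "feature-weighted vectors" are built, so `weight_feature` below (scaling one column by a `boost` factor before re-embedding) is a purely hypothetical scheme.

```python
def ease_in_out_cubic(t):
    """Standard ease-in-out cubic: slow start, fast middle, slow end."""
    t = min(1.0, max(0.0, t))  # clamp so overshooting frames stay in range
    if t < 0.5:
        return 4.0 * t ** 3
    return 1.0 - ((-2.0 * t + 2.0) ** 3) / 2.0


def interpolate_layout(start, end, t):
    """Blend two lists of (x, y) points at animation progress t in [0, 1]."""
    k = ease_in_out_cubic(t)
    return [(x0 + (x1 - x0) * k, y0 + (y1 - y0) * k)
            for (x0, y0), (x1, y1) in zip(start, end)]


def weight_feature(vectors, feature_idx, boost=5.0):
    """Hypothetical weighting: scale one feature column before re-embedding."""
    return [[v * boost if j == feature_idx else v for j, v in enumerate(row)]
            for row in vectors]
```

Calling `interpolate_layout(old_xy, new_xy, elapsed / duration)` once per frame moves every point from its old position to its new one without a visible jump at either end.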
src/Preview.jsx + src/index.jsx
Tabbed entry at /#preview: "Main" (the existing dashboard, untouched),
"ColoredSequence", "Gene UMAP". Hash-gated so / still goes to the
unchanged production layout. The ColoredSequence tab includes a paste
textarea so users can drop their own sequences in.
public/gene_umap/
Precomputed bundle for the GeneUMAPView (G.bin 30 MB, plus small JSON
metadata + per-feature stats filtered to n_firing >= 10).
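
For illustration, a raw float32 matrix and the `n_firing >= 10` filter can be handled as sketched below. The message does not describe how G.npz is converted to G.bin, so the little-endian, row-major layout and the shape of the stats records are assumptions.

```python
import struct


def pack_float32_matrix(rows):
    """Serialise a list of equal-length rows as little-endian float32 bytes."""
    flat = [value for row in rows for value in row]
    return struct.pack(f"<{len(flat)}f", *flat)


def unpack_float32_matrix(blob, n_cols):
    """Inverse of pack_float32_matrix; n_cols must come from metadata."""
    n_values = len(blob) // 4  # 4 bytes per float32
    flat = struct.unpack(f"<{n_values}f", blob)
    return [list(flat[i:i + n_cols]) for i in range(0, n_values, n_cols)]


def filter_feature_stats(stats, min_firing=10):
    """Keep only features that fire in at least min_firing genes."""
    return [s for s in stats if s["n_firing"] >= min_firing]
```

Because a headerless binary carries no shape information, the reader has to take the row and column counts from the accompanying JSON metadata.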
Dep change: umap-js for client-side reorganize. Generated genes are
synthetic; replace fake_genes.tsv with a real curated 500-gene list and
re-run the precompute when one is available.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c36ed2f commit 736571d
9 files changed
Lines changed: 1613 additions & 2 deletions
File tree
- bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2
- evo2_dashboard_mockup
- public/gene_umap
- src
- scripts
Lines changed: 2 additions & 1 deletion
Lines changed: 1 addition & 0 deletions
Lines changed: 1 addition & 0 deletions