Commit 736571d

polinabinder1 and claude committed
evo2 dashboard: ColoredSequence + GeneUMAPView + 500-gene precompute
Two new visualizations for the SAE interpretability dashboard, plus the offline pipeline that produces the gene-UMAP precompute bundle.

scripts/generate_fake_genes.py
500-row genes.tsv stand-in (gene_symbol, species, sequence) until a real curated catalog lands. Realistic-ish distributions across 7 species.

scripts/gene_umap_precompute.py
End-to-end offline pipeline: genes.tsv -> Evo2 1B layer-20 -> TopK SAE encode -> mean per gene -> UMAP (cosine) -> HDBSCAN clusters -> per-feature firing stats. Writes G.npz, genes_umap.parquet, feature_stats.parquet and manifest.json. Reuses predict_evo2 via a torchrun subprocess and aggregates the .pt files by seq_idx + pad_mask. Idempotent (skips predict if the .pt files already exist).

src/ColoredSequence.jsx
React component: paste a DNA sequence -> each base is background-colored by its top-firing SAE feature, with opacity scaled by activation strength. Two modes: top-feature (default) and single-feature lookup. Builds mock activations internally when no `analysis` prop is supplied, so the component works standalone before the /analyze backend is wired. Tableau-10 colorblind palette, hover tooltip with the top-5 features, legend sorted by per-color position count.

src/GeneUMAPView.jsx
Renders the 500-gene UMAP on a canvas. Loads G.bin (raw float32), genes_meta.json and feature_stats.json from public/gene_umap/. Click a feature in the sidebar -> instant recolor by activation strength (no recompute). Click Reorganize -> re-runs UMAP client-side with feature-weighted vectors (umap-js, ~2-5 s at N=500) and animates the transition with ease-in-out cubic easing. Hover shows gene metadata + the top-5 firing features.

src/Preview.jsx + src/index.jsx
Tabbed entry at /#preview: "Main" (the existing dashboard, untouched), "ColoredSequence" and "Gene UMAP". Hash-gated, so / still goes to the unchanged production layout. The ColoredSequence tab includes a paste textarea so users can drop in their own sequences.
public/gene_umap/
Precomputed bundle for the GeneUMAPView (G.bin, 30 MB, plus small JSON metadata and per-feature stats filtered to n_firing >= 10).

Dep change: umap-js for the client-side reorganize.

Generated genes are synthetic; replace fake_genes.tsv with a real curated 500-gene list and re-run the precompute when one is available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
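The "aggregates .pt files by seq_idx + pad_mask -> mean per gene" step can be sketched in plain Python. This is an illustration, not the script's code: the real pipeline presumably works on torch tensors, and the function name `mean_per_gene` and the pad-mask convention (True = real token) are assumptions.

```python
from collections import defaultdict


def mean_per_gene(seq_idx, pad_mask, acts):
    """Average token-level SAE activations into one vector per gene.

    seq_idx[i]:  index of the gene (row of genes.tsv) that token i belongs to
    pad_mask[i]: True for a real token, False for padding (assumed convention)
    acts[i]:     list of SAE feature activations for token i
    """
    sums = {}
    counts = defaultdict(int)
    for gene, keep, vec in zip(seq_idx, pad_mask, acts):
        if not keep:
            continue  # padded positions must not dilute the per-gene mean
        acc = sums.setdefault(gene, [0.0] * len(vec))
        for j, v in enumerate(vec):
            acc[j] += v
        counts[gene] += 1
    # Divide each accumulated sum by the number of real tokens for that gene
    return {gene: [s / counts[gene] for s in acc] for gene, acc in sums.items()}
```

Masking before averaging matters because sequences of different lengths are padded to a common length; counting pad positions would pull short genes toward zero.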
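The ColoredSequence coloring rule (top-firing feature per base, opacity scaled by activation, optional single-feature lookup, legend sorted by position count) reduces to a few lines of logic. It is shown here in Python rather than JSX. The sparse `{feature_id: activation}` input shape, the first-appearance palette assignment and the normalization by the strongest base are assumptions, and the palette is passed in rather than hard-coding Tableau-10 values.

```python
from collections import Counter


def color_bases(acts, palette, feature=None):
    """Return (feature_id, color, opacity) for every base of a sequence.

    acts[i]: dict {feature_id: activation} for base i (sparse, as a TopK SAE yields)
    feature: None for top-feature mode, or a feature id for single-feature lookup
    """
    if feature is None:
        # Top-feature mode: the strongest-firing feature at each position
        picks = [max(a.items(), key=lambda kv: kv[1]) if a else None for a in acts]
    else:
        # Single-feature mode: only the requested feature, where it fires
        picks = [(feature, a[feature]) if feature in a else None for a in acts]
    peak = max((p[1] for p in picks if p), default=0.0)
    slots = {}  # hypothetical: palette slots handed out in order of first appearance
    out = []
    for p in picks:
        if p is None or peak <= 0:
            out.append((None, None, 0.0))
            continue
        fid, value = p
        if fid not in slots:
            slots[fid] = palette[len(slots) % len(palette)]
        out.append((fid, slots[fid], value / peak))  # opacity relative to the strongest base
    return out


def legend(colored):
    """Colors sorted by how many positions they cover, most common first."""
    return Counter(color for _, color, _ in colored if color).most_common()
```

With more distinct features than palette entries, colors repeat, which is one reason to build the legend per color rather than per feature.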
1 parent c36ed2f commit 736571d

9 files changed

Lines changed: 1613 additions & 2 deletions

bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/evo2_dashboard_mockup/package.json

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@
     "embedding-atlas": "^0.16.1",
     "lucide-react": "^0.577.0",
     "react": "^18.2.0",
-    "react-dom": "^18.2.0"
+    "react-dom": "^18.2.0",
+    "umap-js": "^1.4.0"
   },
   "devDependencies": {
     "@vitejs/plugin-react": "^4.2.0",
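umap-js handles the actual UMAP re-run in the browser; the arithmetic around it (weighting the gene vectors toward the selected feature, then animating from the old to the new layout) is sketched below in Python. The commit does not say how vectors are weighted, so `feature_weighted` and its `boost` factor are hypothetical; `ease_in_out_cubic` is the standard ease-in-out cubic curve.

```python
def feature_weighted(G, feature, boost=10.0):
    """Scale one feature column so it dominates the distance metric.

    G: per-gene vectors (list of lists); boost is a hypothetical weighting factor.
    """
    return [[v * boost if j == feature else v for j, v in enumerate(row)] for row in G]


def ease_in_out_cubic(t):
    """Standard ease-in-out cubic: slow start, fast middle, slow finish."""
    return 4 * t ** 3 if t < 0.5 else 1 - (-2 * t + 2) ** 3 / 2


def interpolate(old_xy, new_xy, t):
    """Point positions at animation progress t, clamped to [0, 1]."""
    e = ease_in_out_cubic(min(max(t, 0.0), 1.0))
    return [(x0 + (x1 - x0) * e, y0 + (y1 - y0) * e)
            for (x0, y0), (x1, y1) in zip(old_xy, new_xy)]
```

Each animation frame would call `interpolate` with the elapsed fraction of the transition and redraw the canvas with the returned coordinates.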

bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/evo2_dashboard_mockup/public/gene_umap/feature_stats.json

Lines changed: 1 addition & 0 deletions
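feature_stats.json carries per-feature firing stats filtered to n_firing >= 10. A minimal sketch of that computation over the per-gene matrix follows; only `n_firing` and the threshold come from the commit message, while the "fires when mean activation > 0" definition and the other field names are assumptions.

```python
def feature_stats(G, min_firing=10):
    """Per-feature firing statistics over the per-gene matrix G (genes x features).

    A feature "fires" on a gene when its mean activation is > 0 (assumed definition).
    Features firing on fewer than min_firing genes are dropped, mirroring the
    n_firing >= 10 filter applied to the shipped bundle.
    """
    n_features = len(G[0]) if G else 0
    stats = []
    for j in range(n_features):
        firing = [row[j] for row in G if row[j] > 0]
        if len(firing) < min_firing:
            continue  # too rare across the 500 genes to be worth listing
        stats.append({
            "feature": j,
            "n_firing": len(firing),
            "mean_when_firing": sum(firing) / len(firing),
            "max": max(firing),
        })
    return stats
```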

bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/evo2_dashboard_mockup/public/gene_umap/genes_meta.json

Lines changed: 1 addition & 0 deletions
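GeneUMAPView pairs genes_meta.json with G.bin, which the commit describes only as raw float32. The sketch below decodes such a file and picks the top-5 features shown on hover. The row-major genes x features layout, the little-endian byte order and where `n_features` comes from are all assumptions; in the browser the component presumably does the equivalent with a Float32Array view over the fetched buffer.

```python
import struct


def load_g_bin(data, n_features):
    """Decode raw float32 bytes into rows of n_features values (assumed layout)."""
    if len(data) % 4 or (len(data) // 4) % n_features:
        raise ValueError("byte length does not match a genes x features float32 matrix")
    n = len(data) // 4
    flat = struct.unpack(f"<{n}f", data)  # '<' = little-endian (assumed)
    return [list(flat[i:i + n_features]) for i in range(0, n, n_features)]


def top_k_features(row, k=5):
    """The k strongest firing features for one gene, as (feature_id, activation)."""
    firing = [(j, v) for j, v in enumerate(row) if v > 0]
    # sorted() is stable, so ties keep ascending feature-id order
    return sorted(firing, key=lambda jv: jv[1], reverse=True)[:k]
```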
