[DRAFT] Evo 2 SAE feature explorer — visualization mockup by polinabinder1 · Pull Request #1582 · NVIDIA/bionemo-framework

polinabinder1 · 2026-05-26T23:51:04Z

⚠️ Mockup — synthetic data, not a real SAE result

This PR ships a demo-only visualization shell at bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/evo2_dashboard_mockup/. There is no real SAE inference involved. Everything you see in the dashboard is generated by scripts/make_mockup_features.py from a fixed seed.

A yellow `MOCKUP — synthetic data, not from a real SAE run` banner is rendered at the top of every page so nobody mistakes it for real model output.

The point of this v1 is to lock in the data contract that the future real eval pipeline will need to target. Schemas and column names are listed below.

What's in the visualization

Three panels, in the order they appear:

1. UMAP scatter (left)

One point per SAE feature (20 features in this mockup: 11 labeled + 9 unlabeled).
X/Y are synthetic UMAP coords arranged into four visible blobs:
- Eukaryotic regulatory (top-left): TATA box, polyadenylation signal, CpG island, splice donor site, splice acceptor site.
- Bacterial regulatory (top-right): bacterial promoter -10 box, bacterial promoter -35 box, Shine-Dalgarno sequence.
- Codon context (bottom): start codon (ATG) context, stop codon (TAA) context, stop codon (TAG) context.
- Uninterpreted (center, diffuse): 9 unlabeled features, mimicking the realistic case where most SAE features don't yet have known biology mapped to them.
Color-by default is cluster_id. Other columns also work: db_source, log_frequency, max_activation.
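To make the four-blob layout concrete, here is a minimal sketch of how seeded synthetic UMAP coordinates could be produced. The cluster centres, spreads, feature ordering, and the `make_umap_points` helper are all hypothetical; they are not taken from `scripts/make_mockup_features.py`.

```python
import random
from dataclasses import dataclass

# Hypothetical blob centres matching the layout described above.
CLUSTER_CENTRES = {
    0: (-5.0, 5.0),  # eukaryotic regulatory, top-left
    1: (5.0, 5.0),   # bacterial regulatory, top-right
    2: (0.0, -5.0),  # codon context, bottom
    3: (0.0, 0.0),   # uninterpreted, center
}
# Larger sigma makes the uninterpreted cluster diffuse.
CLUSTER_SPREAD = {0: 0.6, 1: 0.6, 2: 0.6, 3: 2.0}


@dataclass
class UmapPoint:
    feature_id: int
    x: float
    y: float
    cluster_id: int


def make_umap_points(cluster_sizes: dict[int, int], seed: int = 42) -> list[UmapPoint]:
    """Scatter each cluster's features around its centre with Gaussian noise."""
    rng = random.Random(seed)  # fixed seed -> identical coordinates on every run
    points = []
    feature_id = 0
    for cluster_id, size in sorted(cluster_sizes.items()):
        cx, cy = CLUSTER_CENTRES[cluster_id]
        sigma = CLUSTER_SPREAD[cluster_id]
        for _ in range(size):
            points.append(UmapPoint(feature_id, rng.gauss(cx, sigma), rng.gauss(cy, sigma), cluster_id))
            feature_id += 1
    return points


# 5 eukaryotic + 3 bacterial + 3 codon-context labeled features, then 9 unlabeled.
points = make_umap_points({0: 5, 1: 3, 2: 3, 3: 9})
```

Because the RNG is seeded, regenerating with the same cluster sizes yields byte-identical coordinates, which is what lets fixtures be committed and regenerated without diff noise.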

2. Histograms (top)

Two crossfilter histograms over log_frequency and max_activation.
Brushing either one filters the feature list and highlights the matching points on the UMAP.
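The selection logic behind the crossfilter can be sketched in a few lines. This is a Python illustration, not the dashboard's React code, and it assumes the two brushes combine with AND (a feature must fall inside every active brush), which is the usual crossfilter behavior. `FeatureStats` and `crossfilter` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureStats:
    feature_id: int
    log_frequency: float
    max_activation: float


# A brush is an inclusive (low, high) range; None means that histogram is not brushed.
Range = Optional[tuple[float, float]]


def crossfilter(features: list[FeatureStats], log_freq_brush: Range = None, max_act_brush: Range = None) -> list[int]:
    """Return IDs of features inside every active brush (brushes are ANDed)."""

    def inside(value: float, brush: Range) -> bool:
        return brush is None or brush[0] <= value <= brush[1]

    return [
        f.feature_id
        for f in features
        if inside(f.log_frequency, log_freq_brush) and inside(f.max_activation, max_act_brush)
    ]


features = [FeatureStats(0, -1.0, 12.0), FeatureStats(1, -2.5, 28.0), FeatureStats(2, -1.5, 6.0)]
selected = crossfilter(features, log_freq_brush=(-2.0, -1.0), max_act_brush=(10.0, 30.0))
```

The returned IDs drive both the feature list and the highlighted UMAP points, so both views stay consistent with a single selection.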

3. Feature list (right)

Each card shows:

Feature ID and label (or Feature N for unlabeled).
Freq: fraction of tokens where this feature fires (synthetic, log-uniform between 0.001 and 0.1).
Max: peak activation observed (synthetic, 5–30).
Three v2 roadmap placeholders that render as greyed em-dashes with hover tooltips:
- Annotation — top annotation-database match (RefSeq / Rfam / JASPAR). Coming in v2.
- Sensitivity — recall against annotation database. Coming in v2.
- Recon Δ — reconstruction loss change from ablating this feature. Coming in v2.
Expand to reveal top activating sequences: 6 of the 30 stored synthetic windows, each a 200bp string with per-base activation coloring (warmer = higher). The central ~20bp region of each window carries the motif spliced in for that feature (e.g., TATAAA for TATA box, AATAAA for polyA signal).
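A minimal sketch of how the synthetic card data could be drawn: a log-uniform `Freq` in [0.001, 0.1], a `Max` in 5 to 30, and a 200bp window with the feature's motif spliced into the center. The helper names and the background-noise level are assumptions, and here the high-activation region is exactly the motif length rather than the ~20bp region the mockup uses.

```python
import math
import random
from typing import Optional

WINDOW_BP = 200


def log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Sample so that log10(value) is uniform in [log10(low), log10(high)]."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def make_window(rng: random.Random, motif: Optional[str], peak: float) -> tuple[str, list[float]]:
    """Random ACGT window, optionally with `motif` spliced into the center.

    Background activations stay below 10% of `peak`; motif bases get `peak`,
    so a warmer-is-higher color scale highlights the motif.
    """
    bases = [rng.choice("ACGT") for _ in range(WINDOW_BP)]
    activations = [rng.uniform(0.0, 0.1 * peak) for _ in range(WINDOW_BP)]
    if motif:  # unlabeled features pass None and get no motif
        start = (WINDOW_BP - len(motif)) // 2
        bases[start:start + len(motif)] = motif
        for i in range(start, start + len(motif)):
            activations[i] = peak
    return "".join(bases), activations


rng = random.Random(42)
freq = log_uniform(rng, 0.001, 0.1)  # card "Freq"
peak = rng.uniform(5.0, 30.0)        # card "Max"
seq, acts = make_window(rng, "TATAAA", peak)
```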

Clicking Full analysis opens a detail page with the same sequences (showing up to 30 examples), plus two more v2 placeholders:

Annotations — annotation overlay (RefSeq, Rfam, JASPAR) — coming in v2
Conservation — phyloP-style conservation track — coming in v2

Data contract (what the real eval pipeline will need to write)

Three parquet fixtures live in evo2_dashboard_mockup/public/. Their columns are the spec the future real eval pipeline must match.

`features_atlas.parquet` (one row per feature)

| Column | Type | Meaning |
| --- | --- | --- |
| `feature_id` | int | Unique SAE feature index |
| `x`, `y` | float | UMAP coordinates |
| `label` | string \| null | Biological-sounding feature name; null = uninterpreted |
| `db_source` | string \| null | Annotation database the label came from (RefSeq / JASPAR / etc.); null when unlabeled |
| `activation_freq` | float | Fraction of tokens where this feature fires |
| `log_frequency` | float | log10(activation_freq), pre-computed for histogram binning |
| `max_activation` | float | Peak activation observed |
| `cluster_id` | int | UMAP cluster assignment (used for color-by) |
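The row-level rules implied by the atlas table can be captured in a small sketch. `AtlasRow` and `problems` are hypothetical names; deriving `log_frequency` from `activation_freq` keeps the two from drifting apart. Rejecting `activation_freq == 0` is an assumption of this sketch (log10 is undefined there); how the real pipeline handles dead features is not specified in the PR.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class AtlasRow:
    feature_id: int
    x: float
    y: float
    label: Optional[str]      # None = uninterpreted feature
    db_source: Optional[str]  # must be None whenever label is None
    activation_freq: float    # fraction of tokens where the feature fires
    max_activation: float
    cluster_id: int

    @property
    def log_frequency(self) -> float:
        # Stored pre-computed in the parquet; derived here from activation_freq.
        return math.log10(self.activation_freq)

    def problems(self) -> list[str]:
        """Return human-readable contract violations (empty list = OK)."""
        found = []
        if not 0.0 < self.activation_freq <= 1.0:
            found.append("activation_freq must be in (0, 1] so log10 is defined")
        if self.label is None and self.db_source is not None:
            found.append("db_source must be null when label is null")
        return found


tata = AtlasRow(0, -5.1, 4.9, "TATA box", "RefSeq", 0.01, 12.5, 0)
```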

`feature_metadata.parquet`

Same schema as features_atlas.parquet in the mockup. Kept separate so the real eval pipeline can add metadata columns (e.g. sensitivity / faithfulness numbers) without touching the atlas table.

`feature_examples.parquet` (long-form, one row per top-activating example)

| Column | Type | Meaning |
| --- | --- | --- |
| `feature_id` | int | Which feature this example belongs to |
| `example_rank` | int | Rank among the feature's top activators (0 = highest) |
| `sequence_id` | string | Source sequence accession (e.g. `NC_000913.3`, `chr1`) |
| `start`, `end` | int | Genomic coordinates within `sequence_id` |
| `sequence` | string | 200bp DNA window (raw `ACGT`) |
| `activations` | list[float] | Per-base activation values; length matches `sequence` |
| `max_activation` | float | Max of the activations array for this example |
| `max_activation_position` | int | Argmax within the 200bp window |
| `best_annotation` | string \| null | Optional per-example annotation; mirrors `db_source` in the mockup |
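Several columns in `feature_examples.parquet` are redundant by design (`max_activation`, `max_activation_position`, `example_rank`), so a writer can check them for consistency. Below is a minimal sketch of such checks; `ExampleRow`, `example_problems`, and `ranks_ok` are hypothetical, and `ranks_ok` assumes it is given the rows of a single feature with ranks numbered contiguously from 0. Float comparisons use `math.isclose` in case values pass through float32 storage.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleRow:
    feature_id: int
    example_rank: int             # 0 = highest-activating example
    sequence_id: str
    start: int
    end: int
    sequence: str                 # raw ACGT window
    activations: list[float]      # one value per base
    max_activation: float
    max_activation_position: int  # argmax within the window
    best_annotation: Optional[str] = None


def example_problems(row: ExampleRow) -> list[str]:
    """Check one example row against the per-row rules in the table."""
    found = []
    if set(row.sequence) - set("ACGT"):
        found.append("sequence contains non-ACGT characters")
    if len(row.activations) != len(row.sequence):
        found.append("activations length != sequence length")
        return found  # the remaining checks index into activations
    if not row.activations:
        return found
    peak = max(row.activations)
    if not math.isclose(row.max_activation, peak):
        found.append("max_activation != max(activations)")
    pos = row.max_activation_position
    # Compare values rather than indices so ties between equal peaks are accepted.
    if not (0 <= pos < len(row.activations)) or not math.isclose(row.activations[pos], peak):
        found.append("max_activation_position is not an argmax")
    return found


def ranks_ok(rows: list[ExampleRow]) -> bool:
    """For one feature's rows: ranks are 0..n-1 and activation never increases with rank."""
    ordered = sorted(rows, key=lambda r: r.example_rank)
    if [r.example_rank for r in ordered] != list(range(len(ordered))):
        return False
    peaks = [r.max_activation for r in ordered]
    return all(a >= b for a, b in zip(peaks, peaks[1:]))
```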

How to run locally

```bash
cd bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/evo2_dashboard_mockup
npm install
npm run dev   # serves on localhost:5173 (or first free port above)
```

The three parquet fixtures are committed in public/ so npm run dev works without first running the Python generator. To regenerate them (e.g. with a different --n-unlabeled count or a different seed):

```bash
python ../scripts/make_mockup_features.py --n-unlabeled 9 --seed 42
```

Out of scope (deferred to follow-up PRs)

Real SAE inference / real activation pass
Annotation overlays (RefSeq / Rfam / JASPAR) — sensitivity / specificity scores
Reconstruction-faithfulness ablation column
Decomposability metric column
Auto-interp / LLM-generated feature descriptions
Conservation tracks (phyloP)
Strand handling, codon framing, annotation tracks, chromosome ideograms
External link-outs (UCSC, Ensembl)
sae.launch_dashboard() Python wiring / FastAPI / Lepton deployment

Test plan

npm install && npm run dev from evo2_dashboard_mockup/ opens a working dashboard at http://localhost:5173.
MOCKUP banner visible at top.
UMAP shows the four blobs described above; brushing a histogram crossfilters the feature list.
Clicking a labeled feature shows top sequences with the expected central motif (e.g. TATAAA for TATA box).
Unlabeled features render as Feature N and have no spliced motif in their sequences.
No console errors related to missing columns (cluster_id, etc.).
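The last test-plan item can also be checked before the dashboard loads. A minimal sketch, with the required column sets copied from the data-contract tables; `missing_columns` is a hypothetical helper. `best_annotation` is treated as optional here, and extra columns are allowed so the real pipeline can add metadata. In practice the column names could come from `pyarrow.parquet.read_schema(path).names`.

```python
# Minimum column sets from the data-contract tables.
ATLAS_COLUMNS = frozenset({
    "feature_id", "x", "y", "label", "db_source",
    "activation_freq", "log_frequency", "max_activation", "cluster_id",
})

REQUIRED_COLUMNS = {
    "features_atlas.parquet": ATLAS_COLUMNS,
    "feature_metadata.parquet": ATLAS_COLUMNS,  # same schema as the atlas in the mockup
    "feature_examples.parquet": frozenset({
        "feature_id", "example_rank", "sequence_id", "start", "end",
        "sequence", "activations", "max_activation", "max_activation_position",
    }),
}


def missing_columns(fixture: str, columns) -> list[str]:
    """Required columns absent from `columns`; extra columns are allowed."""
    return sorted(REQUIRED_COLUMNS[fixture] - set(columns))
```

For example, an atlas that still uses the old `cluster` name reports `["cluster_id"]`, which is the column App.jsx queries.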

🤖 Generated with Claude Code

torch 2.6 changed the default of `weights_only` to True. The Savanna checkpoint pickle includes numpy globals (`numpy.core.multiarray._reconstruct`), which the safer loader rejects. The converter then exits 0 with no output written, and the error gets buried in stderr: a silent failure. The Savanna repos under arcinstitute/* are trusted sources, so load with `weights_only=False`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
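Besides passing `weights_only=False` to `torch.load` for trusted checkpoints, the "exits 0 but writes nothing" failure mode can be caught by whatever drives the converter. A minimal sketch of such a guard; `run_and_require_output` is hypothetical and the converter command line is not shown in this PR, so any command passed to it is an assumption.

```python
import subprocess
from pathlib import Path


def run_and_require_output(cmd: list[str], output_dir: Path) -> None:
    """Run a conversion command and fail loudly if it exits 0 but writes nothing."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited {result.returncode}:\n{result.stderr}")
    # Exit code 0 alone is not trusted: require at least one entry in output_dir.
    if not output_dir.is_dir() or not any(output_dir.iterdir()):
        raise RuntimeError(f"{cmd[0]} exited 0 but wrote nothing to {output_dir}:\n{result.stderr}")
```

Surfacing the captured stderr in the exception is the point: it is where the `weights_only` rejection was being buried.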

Mirrors the existing esm2 / codonfm SAE recipes.

Pipeline: chunk -> convert (Savanna->MBridge) -> predict_evo2 -> pt_to_parquet -> train

Differences from esm2/codonfm are forced by Evo2 specifics:
- Hyena/Megatron-Core model, no HF AutoModel path => reuses the existing `predict_evo2` CLI for inference instead of writing a custom extract.py
- `pt_to_parquet.py` shim bridges predict_evo2's .pt output to the universal `sae.activation_store` parquet contract
- `chunk_fasta.py` preprocessor keeps inputs within the model's trained context length (8192 bp for 1B); Hyena fftconv OOMs on long sequences even at micro-batch=1
- `train.py` is the same as codonfm's, copied verbatim per bionemo-recipes' KISS-over-DRY convention

Validated end-to-end on 100 organelle sequences (Evo2 1B layer 12): loss 0.67 -> 0.045, FVU 0.90 -> 0.10, var_exp 0.10 -> 0.90, 2m14s wall.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
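Two pieces of this commit lend themselves to a short sketch: splitting sequences to the 8192 bp context, and the FVU / var_exp pair in the validation numbers. Both functions are hypothetical; `chunk_sequence` uses non-overlapping chunks, which may differ from what `chunk_fasta.py` actually does, and `fvu` uses the common definition (residual sum of squares over total variance around the per-dimension mean), under which var_exp = 1 - FVU, consistent with 0.10 -> 0.90 mirroring 0.90 -> 0.10. The recipe's own metric code may normalize differently.

```python
def chunk_sequence(seq: str, max_len: int = 8192) -> list[str]:
    """Split one sequence into consecutive, non-overlapping chunks of <= max_len bases."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


def fvu(x: list[list[float]], x_hat: list[list[float]]) -> float:
    """Fraction of variance unexplained: ||x - x_hat||^2 / ||x - mean(x)||^2.

    `x` is a batch of activation vectors; the mean is taken per dimension.
    """
    n, d = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    residual = sum((a - b) ** 2 for row, row_hat in zip(x, x_hat) for a, b in zip(row, row_hat))
    total = sum((row[j] - mean[j]) ** 2 for row in x for j in range(d))
    return residual / total


x = [[1.0, 2.0], [3.0, 4.0]]
var_exp = 1.0 - fvu(x, [[1.0, 2.0], [2.0, 3.0]])
```

A perfect reconstruction gives FVU = 0, and always predicting the mean gives FVU = 1, so an SAE moving from 0.90 to 0.10 is closing most of that gap.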

The recipe currently has no model-specific Python module: the extractor is upstream (`predict_evo2`) and the two scripts are simple CLIs in scripts/. Drop the empty package and adjust pyproject.toml so setuptools doesn't try to discover anything. Will reintroduce when there's actual library code to put there (eval, dashboard, dataloaders).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fork of recipes/codonfm/codon_dashboard adapted for DNA + Evo 2, populated with synthetic data. Demo-able artifact, not a real result.

What's here:
- scripts/make_mockup_features.py: deterministic synthetic data generator (seed 42). Writes features_atlas.parquet, feature_metadata.parquet, feature_examples.parquet to evo2_dashboard_mockup/public/. Fixtures are committed for one-step npm-only setup.
- evo2_dashboard_mockup/: Vite/React SPA forked from codon_dashboard with these swaps:
  * Removed molstar dep + MolstarThumbnail.jsx
  * Renamed ProteinSequence.jsx -> SequenceView.jsx; per-base rendering (no codon framing, no AA translation)
  * Renamed ProteinDetailModal.jsx -> RegionDetailModal.jsx; UniProt content swapped for genomic-region content
  * utils.js: getRegionLabel + parseBases (replacing getAccession/uniprotUrl/parseCodons/codonToAA)
  * MOCKUP banner at top of App
  * "Evo 2 SAE Feature Explorer (Mockup)" title
- v2 roadmap placeholders (greyed em-dashes with hover tooltips):
  * FeatureCard: Annotation, Sensitivity, Recon Δ stats
  * FeatureDetailPage: Annotations, Conservation sections

Quick start: cd evo2_dashboard_mockup && npm install && npm run dev

The synthetic data schema is the contract the future real eval pipeline will need to target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ed features

Three changes on top of the initial mockup commit:

1. Drop codonfm-specific scaffolding from forked components.
   - .gitignore the auto-generated package-lock.json (regenerates on `npm install`)
   - FeatureCard.jsx: 793 -> 508 lines. Removed dead stat tiles (Hi-Score, Variant/Site/Local deltas, ClinVar, PhyloP, GC, Trinuc/Gene entropy), codonfm vocab-logits chart, codonfm GSEA tags, codonfm CSV export sections; all were conditional on fields our synthetic data doesn't provide.
   - FeatureDetailPage.jsx: 522 -> 187 lines. Replaced codonfm-specific VocabLogitChart / CodonAnnotations / FeatureMetrics components with a simpler DNA-friendly detail view.
2. Refine the synthetic feature set.
   - 11 labeled DNA-native features in 3 thematic UMAP clusters:
     * eukaryotic regulatory (TATA box, polyA signal, CpG island, splice donor, splice acceptor)
     * bacterial regulatory (-10 box, -35 box, Shine-Dalgarno)
     * codon context (start ATG, stop TAA, stop TAG)
   - 9 unlabeled features in a 4th diffuse cluster (label=NULL, db_source=NULL), mimicking the realistic case where most SAE features are uninterpreted.
   - New `db_source` column on each feature (RefSeq / JASPAR-ENCODE / bacterial annotation / RefSeq UTR / ENCODE-RefSeq / NULL).
3. Bug fixes for cross-pod port-forward demo:
   - App.jsx defaults: `selectedCategory` and `histMetric3` were hardcoded to codonfm's `mean_variant_1bcdwt` column, which doesn't exist in our atlas and threw Binder errors. Switched to `cluster_id`.
   - Atlas column rename: `cluster` -> `cluster_id` to match what App.jsx queries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

copy-pr-bot · 2026-05-26T23:51:07Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


coderabbitai · 2026-05-26T23:51:11Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


polinabinder1 and others added 6 commits May 21, 2026 00:42

Merge branch 'main' into evo2-sae-recipe

2760eed


polinabinder1 wants to merge 6 commits into NVIDIA:main from polinabinder1:evo2-sae-dashboard
