
[DRAFT] Evo 2 SAE feature explorer — visualization mockup #1582

Draft
polinabinder1 wants to merge 6 commits into NVIDIA:main from polinabinder1:evo2-sae-dashboard

@polinabinder1 (Collaborator)

⚠️ Mockup — synthetic data, not a real SAE result

This PR ships a demo-only visualization shell at bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/evo2_dashboard_mockup/. There is no real SAE inference involved. Everything you see in the dashboard is generated by scripts/make_mockup_features.py from a fixed seed.

A yellow banner reading "MOCKUP — synthetic data, not from a real SAE run" is rendered at the top of every page so nobody mistakes the dashboard for real model output.

The point of this v1 is to lock in the data contract that the future real eval pipeline will need to target. The schemas and column names are listed below.

What's in the visualization

Three panels, in the order they appear:

1. UMAP scatter (left)

  • One point per SAE feature (20 features in this mockup: 11 labeled + 9 unlabeled).
  • X/Y are synthetic UMAP coords arranged into four visible blobs:
    • Eukaryotic regulatory (top-left): TATA box, polyadenylation signal, CpG island, splice donor site, splice acceptor site.
    • Bacterial regulatory (top-right): bacterial promoter -10 box, bacterial promoter -35 box, Shine-Dalgarno sequence.
    • Codon context (bottom): start codon (ATG) context, stop codon (TAA) context, stop codon (TAG) context.
    • Uninterpreted (center, diffuse): 9 unlabeled features — mimics the realistic case where most SAE features don't have known biology mapped to them yet.
  • Color-by default is cluster_id. Other columns also work: db_source, log_frequency, max_activation.
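A minimal sketch of how blob-structured synthetic coordinates like these could be produced from a fixed seed. The blob centers, spreads, and the `make_coords` helper are hypothetical; the actual values used by scripts/make_mockup_features.py are not shown in this PR. Only the per-blob feature counts (5 / 3 / 3 / 9) come from the description above.

```python
import random

# Hypothetical centers and spreads; only the feature counts come from the PR text.
BLOBS = {
    "eukaryotic_regulatory": {"center": (-4.0, 4.0), "spread": 0.5, "n": 5},
    "bacterial_regulatory": {"center": (4.0, 4.0), "spread": 0.5, "n": 3},
    "codon_context": {"center": (0.0, -4.0), "spread": 0.5, "n": 3},
    "uninterpreted": {"center": (0.0, 0.0), "spread": 1.5, "n": 9},  # diffuse
}


def make_coords(seed: int = 42) -> list:
    """Return one dict per feature with synthetic x/y and a cluster_id."""
    rng = random.Random(seed)  # private RNG so the output is reproducible
    rows = []
    feature_id = 0
    for cluster_id, (name, blob) in enumerate(BLOBS.items()):
        cx, cy = blob["center"]
        for _ in range(blob["n"]):
            rows.append({
                "feature_id": feature_id,
                "x": rng.gauss(cx, blob["spread"]),
                "y": rng.gauss(cy, blob["spread"]),
                "cluster_id": cluster_id,
                "blob": name,
            })
            feature_id += 1
    return rows


coords = make_coords()
```

Because the generator owns its own `random.Random` instance, calling `make_coords(42)` twice yields identical rows, which is the property the committed fixtures rely on.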

2. Histograms (top)

  • Two crossfilter histograms over log_frequency and max_activation.
  • Brushing either one filters the feature list and highlights the matching points on the UMAP.
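The brushing behaviour can be sketched as a range filter over both histogram columns. The dashboard itself is a JavaScript app, so this Python version only illustrates the selection logic; the `brush` helper and the sample rows are hypothetical.

```python
def brush(features, log_freq_range=None, max_act_range=None):
    """Return feature_ids inside every active brush.

    A range of None means that histogram has no active brush.
    """
    def inside(value, bounds):
        return bounds is None or bounds[0] <= value <= bounds[1]

    return [
        f["feature_id"]
        for f in features
        if inside(f["log_frequency"], log_freq_range)
        and inside(f["max_activation"], max_act_range)
    ]


# Illustrative rows only; not taken from the committed fixtures.
features = [
    {"feature_id": 0, "log_frequency": -1.2, "max_activation": 25.0},
    {"feature_id": 1, "log_frequency": -2.5, "max_activation": 8.0},
    {"feature_id": 2, "log_frequency": -2.8, "max_activation": 28.0},
]

selected = brush(features, log_freq_range=(-3.0, -2.0))
both = brush(features, log_freq_range=(-3.0, -2.0), max_act_range=(20.0, 30.0))
```

The returned IDs are what would drive both the filtered feature list and the highlighted UMAP points.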

3. Feature list (right)

Each card shows:

  • Feature ID and label (or Feature N for unlabeled).
  • Freq: fraction of tokens where this feature fires (synthetic, log-uniform between 0.001 and 0.1).
  • Max: peak activation observed (synthetic, 5–30).
  • Three v2 roadmap placeholders that render as greyed em-dashes with hover tooltips:
    • Annotation — top annotation-database match (RefSeq / Rfam / JASPAR). Coming in v2.
    • Sensitivity — recall against annotation database. Coming in v2.
    • Recon Δ — reconstruction loss change from ablating this feature. Coming in v2.
  • Expand to reveal top activating sequences: 6 of 30 stored synthetic windows, each a 200bp string with per-base activation coloring (warmer = higher). The central ~20bp is the spliced motif for that feature (e.g., TATAAA for TATA box, AATAAA for polyA signal).
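One way such a window could be built is sketched below: random background, a motif spliced at the center, and per-base activations that peak over the motif. The `make_example` helper, the background activation range, and the peak shape are assumptions for illustration, not the generator's actual logic.

```python
import random

WINDOW = 200  # window length in bp, as stated in the description


def make_example(motif: str, rng: random.Random, peak: float = 20.0):
    """Return (sequence, activations, motif_start) for one synthetic window."""
    bases = [rng.choice("ACGT") for _ in range(WINDOW)]
    start = (WINDOW - len(motif)) // 2
    bases[start:start + len(motif)] = motif  # splice motif into the center
    sequence = "".join(bases)

    # Low background activation everywhere (assumed range).
    activations = [rng.uniform(0.0, 0.5) for _ in range(WINDOW)]
    center = start + len(motif) // 2
    for i in range(start, start + len(motif)):
        # Activation falls off linearly with distance from the motif center.
        activations[i] = peak - abs(i - center)
    return sequence, activations, start


seq, acts, motif_start = make_example("TATAAA", random.Random(42))
```

Passing an empty motif leaves the background untouched, which matches the unlabeled features having no spliced motif.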

Clicking Full analysis opens a detail page with the same sequences (showing up to 30 examples), plus two more v2 placeholders:

  • Annotations — annotation overlay (RefSeq, Rfam, JASPAR) — coming in v2
  • Conservation — phyloP-style conservation track — coming in v2

Data contract (what the real eval pipeline will need to write)

Three parquet fixtures live in evo2_dashboard_mockup/public/. Their columns are the spec the future real eval pipeline must match.

features_atlas.parquet (one row per feature)

| Column | Type | Meaning |
|---|---|---|
| feature_id | int | Unique SAE feature index |
| x, y | float | UMAP coordinates |
| label | string or null | Biological-sounding feature name; null = uninterpreted |
| db_source | string or null | Annotation database the label came from (RefSeq / JASPAR / etc.); null when unlabeled |
| activation_freq | float | Fraction of tokens where this feature fires |
| log_frequency | float | log10(activation_freq), pre-computed for histogram binning |
| max_activation | float | Peak activation observed |
| cluster_id | int | UMAP cluster assignment (used for color-by) |
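As a rough illustration of the atlas contract, the sketch below builds one synthetic row and checks it against the column list. `synth_feature` and `check_atlas_row` are hypothetical helpers, and rows are plain dicts rather than parquet; the sampling ranges (log-uniform frequency between 0.001 and 0.1, max activation 5 to 30) follow the feature-card description.

```python
import math
import random

ATLAS_COLUMNS = {
    "feature_id", "x", "y", "label", "db_source",
    "activation_freq", "log_frequency", "max_activation", "cluster_id",
}


def synth_feature(feature_id, rng, label=None, db_source=None, cluster_id=3):
    """Build one atlas row; x/y are left at 0.0 here for brevity."""
    log_frequency = rng.uniform(-3.0, -1.0)  # log-uniform over [0.001, 0.1]
    return {
        "feature_id": feature_id,
        "x": 0.0,
        "y": 0.0,
        "label": label,
        "db_source": db_source,
        "activation_freq": 10 ** log_frequency,
        "log_frequency": log_frequency,
        "max_activation": rng.uniform(5.0, 30.0),
        "cluster_id": cluster_id,
    }


def check_atlas_row(row):
    """Return a list of contract violations (empty when the row is valid)."""
    problems = []
    if set(row) != ATLAS_COLUMNS:
        problems.append("column set does not match the atlas schema")
        return problems
    if not math.isclose(row["log_frequency"], math.log10(row["activation_freq"])):
        problems.append("log_frequency is not log10(activation_freq)")
    if row["label"] is None and row["db_source"] is not None:
        problems.append("unlabeled feature must have a null db_source")
    return problems


row = synth_feature(11, random.Random(42))
problems = check_atlas_row(row)
```

A check like this is where a missing or renamed column (for example `cluster` instead of `cluster_id`) would surface before the dashboard queries the table.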

feature_metadata.parquet

In the mockup this file has the same schema as features_atlas.parquet. It is kept separate so the real eval pipeline can add metadata columns (e.g. sensitivity / faithfulness numbers) without touching the atlas table.

feature_examples.parquet (long-form, one row per top-activating example)

| Column | Type | Meaning |
|---|---|---|
| feature_id | int | Which feature this example belongs to |
| example_rank | int | Rank among the feature's top activators (0 = highest) |
| sequence_id | string | Source sequence accession (e.g. NC_000913.3, chr1) |
| start, end | int | Genomic coordinates within sequence_id |
| sequence | string | 200bp DNA window (raw ACGT) |
| activations | list[float] | Per-base activation values; length matches sequence |
| max_activation | float | Max of the activations array for this example |
| max_activation_position | int | Argmax within the 200bp window |
| best_annotation | string or null | Optional per-example annotation; mirrors db_source in the mockup |
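The invariants implied by this table (activations aligned to the sequence, `max_activation` and `max_activation_position` derived from the array) can be checked with a few lines. `check_example` is a hypothetical helper operating on a plain dict; it assumes ties in the argmax resolve to the first position, which the contract does not specify.

```python
def check_example(row, window=200):
    """Return a list of contract violations for one feature_examples row."""
    problems = []
    seq, acts = row["sequence"], row["activations"]
    if len(seq) != window:
        problems.append("sequence is not a 200bp window")
    if set(seq) - set("ACGT"):
        problems.append("sequence contains non-ACGT characters")
    if len(acts) != len(seq):
        problems.append("activations length does not match sequence")
    if acts:
        peak = max(acts)
        if row["max_activation"] != peak:
            problems.append("max_activation is not max(activations)")
        # Assumption: first index wins when several bases share the peak.
        if row["max_activation_position"] != acts.index(peak):
            problems.append("max_activation_position is not the argmax")
    return problems


# Illustrative row; the accession is the example given in the table.
activations = [0.0] * 200
activations[100] = 12.5
good = {
    "feature_id": 0,
    "example_rank": 0,
    "sequence_id": "NC_000913.3",
    "start": 1000,
    "end": 1200,
    "sequence": "ACGT" * 50,
    "activations": activations,
    "max_activation": 12.5,
    "max_activation_position": 100,
    "best_annotation": None,
}
good_problems = check_example(good)
```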

How to run locally

cd bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/evo2_dashboard_mockup
npm install
npm run dev   # serves on localhost:5173 (or first free port above)

The three parquet fixtures are committed in public/ so npm run dev works without first running the Python generator. To regenerate them (e.g. with a different --n-unlabeled count or a different seed):

python ../scripts/make_mockup_features.py --n-unlabeled 9 --seed 42
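For orientation, a command-line interface with these two flags could be declared as follows. This is a sketch, not the script's actual parser: the flag names come from the command above, while the defaults (9 and 42) are assumed from the mockup's stated feature count and seed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the two flags used in the command above."""
    parser = argparse.ArgumentParser(
        description="Generate synthetic mockup fixtures (sketch)."
    )
    parser.add_argument(
        "--n-unlabeled", type=int, default=9,  # assumed default
        help="number of unlabeled features to generate",
    )
    parser.add_argument(
        "--seed", type=int, default=42,  # assumed default
        help="RNG seed; the same seed should reproduce the same fixtures",
    )
    return parser


args = build_parser().parse_args(["--n-unlabeled", "12", "--seed", "7"])
defaults = build_parser().parse_args([])
```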

Out of scope (deferred to follow-up PRs)

  • Real SAE inference / real activation pass
  • Annotation overlays (RefSeq / Rfam / JASPAR) and the sensitivity / specificity scores computed against them
  • Reconstruction-faithfulness ablation column
  • Decomposability metric column
  • Auto-interp / LLM-generated feature descriptions
  • Conservation tracks (phyloP)
  • Strand handling, codon framing, annotation tracks, chromosome ideograms
  • External link-outs (UCSC, Ensembl)
  • sae.launch_dashboard() Python wiring / FastAPI / Lepton deployment

Test plan

  • npm install && npm run dev from evo2_dashboard_mockup/ opens a working dashboard at http://localhost:5173.
  • MOCKUP banner visible at top.
  • UMAP shows the four blobs described above; brushing a histogram crossfilters the feature list.
  • Clicking a labeled feature shows top sequences with the expected central motif (e.g. TATAAA for TATA box).
  • Unlabeled features render as Feature N and have no spliced motif in their sequences.
  • No console errors related to missing columns (cluster_id, etc.).

🤖 Generated with Claude Code

polinabinder1 and others added 6 commits May 21, 2026 00:42
torch 2.6 changed the default of `weights_only` to True. The Savanna
checkpoint pickle includes numpy globals (`numpy.core.multiarray._reconstruct`),
which the safer loader rejects. The converter then exits 0 with no output
written and the error gets buried in stderr — silent failure.

The Savanna repos under arcinstitute/* are trusted sources, so load with
weights_only=False.
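The rejection mechanism can be illustrated with the standard library alone. The sketch below is an analogy, not torch's implementation: an unpickler with an allowlist refuses any global it does not recognise, much as the safer loader refuses `numpy.core.multiarray._reconstruct`. `AllowlistUnpickler` and `safe_loads` are hypothetical names, and `fractions.Fraction` stands in for the numpy global.

```python
import io
import pickle
from fractions import Fraction


class AllowlistUnpickler(pickle.Unpickler):
    """Refuse to resolve any global that is not explicitly allowlisted."""

    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(
                f"global {module}.{name} is not allowlisted"
            )
        return super().find_class(module, name)


def safe_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()


# Plain containers and floats need no globals, so they load fine.
plain = pickle.dumps({"weights": [1.0, 2.0]})
loaded = safe_loads(plain)

# An object that pickles by reference to its class is rejected.
with_global = pickle.dumps({"weights": Fraction(1, 2)})
try:
    safe_loads(with_global)
    rejected = None
except pickle.UnpicklingError as err:
    rejected = str(err)  # surface the error instead of swallowing it
```

Catching the error and reporting it, rather than exiting 0, is what avoids the silent failure described above. The fix in this commit sidesteps the restriction by passing `weights_only=False` to `torch.load`, which is only appropriate for trusted checkpoints.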

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the existing esm2 / codonfm SAE recipes. Pipeline:

  chunk -> convert (Savanna->MBridge) -> predict_evo2 -> pt_to_parquet -> train

Differences from esm2/codonfm are forced by Evo2 specifics:
  - Hyena/Megatron-Core model, no HF AutoModel path => reuses the
    existing `predict_evo2` CLI for inference instead of writing
    a custom extract.py
  - `pt_to_parquet.py` shim bridges predict_evo2's .pt output to
    the universal `sae.activation_store` parquet contract
  - `chunk_fasta.py` preprocessor keeps inputs within the model's
    trained context length (8192 bp for 1B); Hyena fftconv OOMs
    on long sequences even at micro-batch=1
  - `train.py` is the same as codonfm's, copied verbatim per
    bionemo-recipes' KISS-over-DRY convention
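A minimal sketch of the chunking idea follows, assuming non-overlapping windows, half-open coordinates, and a hypothetical `id:start-end` naming scheme. None of these details are confirmed by the commit message; only the 8192 bp limit is.

```python
def chunk_sequence(seq_id, sequence, max_len=8192):
    """Yield (chunk_id, start, end, subsequence) with half-open coordinates."""
    for start in range(0, len(sequence), max_len):
        end = min(start + max_len, len(sequence))
        yield f"{seq_id}:{start}-{end}", start, end, sequence[start:end]


# Toy 20,000 bp sequence; splits into 8192 + 8192 + 3616 bp.
toy = "ACGT" * 5000
chunks = list(chunk_sequence("toy", toy))
```

Keeping the start and end offsets with each chunk makes it possible to map activations back to coordinates in the original sequence later.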

Validated end-to-end on 100 organelle sequences (Evo2 1B layer 12):
loss 0.67 -> 0.045, FVU 0.90 -> 0.10, var_exp 0.10 -> 0.90, 2m14s wall.
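The FVU and var_exp figures above are complementary, which is consistent with the usual definition of fraction of variance unexplained: residual sum of squares over total sum of squares, with var_exp = 1 - FVU. A scalar sketch of that definition is below; how the recipe pools over activation dimensions is not shown in this PR, so treat the exact reduction as an assumption.

```python
def fvu(x, x_hat):
    """Fraction of variance unexplained: sum((x - x_hat)^2) / sum((x - mean)^2)."""
    mean = sum(x) / len(x)
    residual = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    total = sum((a - mean) ** 2 for a in x)
    return residual / total


x = [1.0, 2.0, 3.0, 4.0]
perfect = fvu(x, x)             # reconstruction equals input
baseline = fvu(x, [2.5] * 4)    # always predicting the mean
var_exp_baseline = 1.0 - baseline
```

Predicting the mean gives an FVU of 1, so the reported drop from 0.90 to 0.10 means the trained SAE leaves about a tenth of the activation variance unexplained.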

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The recipe currently has no model-specific Python module — the extractor
is upstream (`predict_evo2`) and the two scripts are simple CLIs in
scripts/. Drop the empty package and adjust pyproject.toml so setuptools
doesn't try to discover anything. Will reintroduce when there's actual
library code to put there (eval, dashboard, dataloaders).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fork of recipes/codonfm/codon_dashboard adapted for DNA + Evo 2,
populated with synthetic data. Demo-able artifact, not a real result.

What's here:
  - scripts/make_mockup_features.py: deterministic synthetic data generator
    (seed 42). Writes features_atlas.parquet, feature_metadata.parquet,
    feature_examples.parquet to evo2_dashboard_mockup/public/. Fixtures
    are committed for one-step npm-only setup.
  - evo2_dashboard_mockup/: Vite/React SPA forked from codon_dashboard
    with these swaps:
      * Removed molstar dep + MolstarThumbnail.jsx
      * Renamed ProteinSequence.jsx -> SequenceView.jsx; per-base
        rendering (no codon framing, no AA translation)
      * Renamed ProteinDetailModal.jsx -> RegionDetailModal.jsx;
        UniProt content swapped for genomic-region content
      * utils.js: getRegionLabel + parseBases (replacing
        getAccession/uniprotUrl/parseCodons/codonToAA)
      * MOCKUP banner at top of App
      * "Evo 2 SAE Feature Explorer (Mockup)" title
  - v2 roadmap placeholders (greyed em-dashes with hover tooltips):
      * FeatureCard: Annotation, Sensitivity, Recon Δ stats
      * FeatureDetailPage: Annotations, Conservation sections

Quick start: cd evo2_dashboard_mockup && npm install && npm run dev

The synthetic data schema is the contract the future real eval pipeline
will need to target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed features

Three changes on top of the initial mockup commit:

1. Drop codonfm-specific scaffolding from forked components.
   - .gitignore the auto-generated package-lock.json (regenerates on `npm install`)
   - FeatureCard.jsx: 793 -> 508 lines. Removed dead stat tiles (Hi-Score,
     Variant/Site/Local deltas, ClinVar, PhyloP, GC, Trinuc/Gene entropy),
     codonfm vocab-logits chart, codonfm GSEA tags, codonfm CSV export
     sections — all conditional on fields our synthetic data doesn't provide.
   - FeatureDetailPage.jsx: 522 -> 187 lines. Replaced codonfm-specific
     VocabLogitChart / CodonAnnotations / FeatureMetrics components with a
     simpler DNA-friendly detail view.

2. Refine the synthetic feature set.
   - 11 labeled DNA-native features in 3 thematic UMAP clusters:
     * eukaryotic regulatory (TATA box, polyA signal, CpG island,
       splice donor, splice acceptor)
     * bacterial regulatory (-10 box, -35 box, Shine-Dalgarno)
     * codon context (start ATG, stop TAA, stop TAG)
   - 9 unlabeled features in a 4th diffuse cluster (label=NULL,
     db_source=NULL) — mimics the realistic case where most SAE
     features are uninterpreted.
   - New `db_source` column on each feature (RefSeq / JASPAR-ENCODE /
     bacterial annotation / RefSeq UTR / ENCODE-RefSeq / NULL).

3. Bug fixes for cross-pod port-forward demo:
   - App.jsx defaults: `selectedCategory` and `histMetric3` were
     hardcoded to codonfm's `mean_variant_1bcdwt` column, which doesn't
     exist in our atlas and threw Binder errors. Switched to `cluster_id`.
   - Atlas column rename: `cluster` -> `cluster_id` to match what
     App.jsx queries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

