[DRAFT] Evo 2 SAE feature explorer — visualization mockup #1582
Draft
polinabinder1 wants to merge 6 commits into
torch 2.6 changed the default of `weights_only` to `True`. The Savanna checkpoint pickle includes numpy globals (`numpy.core.multiarray._reconstruct`), which the safer loader rejects. The converter then exits 0 without writing any output, and the error is buried in stderr: a silent failure. The Savanna repos under arcinstitute/* are trusted sources, so load with `weights_only=False`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
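Why the checkpoint is rejected can be illustrated with the standard library alone. The sketch below is a conceptual analogue of an allowlist-based unpickler, not torch's actual implementation: every global the pickle stream references goes through `find_class`, and anything off the list is refused. `AllowlistUnpickler`, `restricted_loads`, and the allowlist contents are hypothetical, and `fractions.Fraction` stands in for `numpy.core.multiarray._reconstruct` so the example runs without numpy.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction


class AllowlistUnpickler(pickle.Unpickler):
    """Resolve only explicitly allowlisted globals (conceptual analogue of weights_only=True)."""

    # Hypothetical allowlist for illustration; torch maintains its own list internally.
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        # Every class/function referenced by the pickle stream is looked up here.
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowlisted")
        return super().find_class(module, name)


def restricted_loads(data: bytes):
    """Unpickle `data`, rejecting any global outside ALLOWED."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    # Plain dicts/lists/ints need no global lookups, so they load fine.
    print(restricted_loads(pickle.dumps({"step": 1, "shape": [2, 3]})))
    # Fraction plays the role of the numpy global the safe loader does not know.
    try:
        restricted_loads(pickle.dumps(Fraction(1, 3)))
    except pickle.UnpicklingError as err:
        print("rejected:", err)
```

Passing `weights_only=False` to `torch.load` skips this kind of restriction entirely, which is why the commit limits it to checkpoints from trusted sources.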
Mirrors the existing esm2 / codonfm SAE recipes. Pipeline:
chunk -> convert (Savanna->MBridge) -> predict_evo2 -> pt_to_parquet -> train
Differences from esm2/codonfm are forced by Evo2 specifics:
- Hyena/Megatron-Core model, no HF AutoModel path => reuses the
existing `predict_evo2` CLI for inference instead of writing
a custom extract.py
- `pt_to_parquet.py` shim bridges predict_evo2's .pt output to
the universal `sae.activation_store` parquet contract
- `chunk_fasta.py` preprocessor keeps inputs within the model's
trained context length (8192 bp for 1B); Hyena fftconv OOMs
on long sequences even at micro-batch=1
- `train.py` is the same as codonfm's, copied verbatim per
bionemo-recipes' KISS-over-DRY convention
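A minimal sketch of the chunking idea, assuming non-overlapping windows and a `record_id:start-end` naming scheme for chunks (both hypothetical; the actual `chunk_fasta.py` may use overlap or different ids). It parses FASTA with the standard library and caps every chunk at `max_len` bases, defaulting to the 8192 bp context of the 1B model.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs; record_id is the first token of the header."""
    record_id, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(parts)
            tokens = line[1:].split()
            record_id, parts = (tokens[0] if tokens else ""), []
        else:
            parts.append(line.upper())
    if record_id is not None:
        yield record_id, "".join(parts)


def chunk_records(records, max_len: int = 8192) -> Iterator[Tuple[str, str]]:
    """Split each sequence into non-overlapping windows of at most max_len bases."""
    for record_id, seq in records:
        for start in range(0, len(seq), max_len):
            end = min(start + max_len, len(seq))
            # Hypothetical naming scheme: keep source coordinates in the chunk id.
            yield f"{record_id}:{start}-{end}", seq[start:end]
```

Non-overlapping windows keep the token count unchanged; an overlapping stride would give features context across chunk boundaries at the cost of duplicated inference.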
Validated end-to-end on 100 organelle sequences (Evo2 1B layer 12):
loss 0.67 -> 0.045, FVU 0.90 -> 0.10, var_exp 0.10 -> 0.90, 2m14s wall.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
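The reported pairs (FVU 0.90 with var_exp 0.10, FVU 0.10 with var_exp 0.90) are consistent with var_exp = 1 - FVU. A minimal pure-Python sketch of that relationship, assuming FVU is the residual sum of squares divided by the total sum of squares about the per-dimension mean (one common convention; the recipe's `train.py` may normalize differently). `fvu` and `variance_explained` are illustrative names.

```python
def fvu(x, x_hat):
    """Fraction of variance unexplained over a batch of activation vectors.

    x and x_hat are equal-shape lists of rows (tokens) by columns (hidden dims).
    """
    n, d = len(x), len(x[0])
    # Per-dimension mean of the original activations.
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    residual = sum((a - b) ** 2 for row, rec in zip(x, x_hat) for a, b in zip(row, rec))
    total = sum((a - m) ** 2 for row in x for a, m in zip(row, mean))
    return residual / total


def variance_explained(x, x_hat):
    """Fraction of variance explained by the reconstruction: 1 - FVU."""
    return 1.0 - fvu(x, x_hat)
```

A perfect reconstruction gives FVU 0; predicting the per-dimension mean for every token gives FVU 1.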
The recipe currently has no model-specific Python module: the extractor is upstream (`predict_evo2`) and the two scripts are simple CLIs in scripts/. Drop the empty package and adjust pyproject.toml so setuptools doesn't try to discover anything. Will reintroduce it when there's actual library code to put there (eval, dashboard, dataloaders).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fork of recipes/codonfm/codon_dashboard adapted for DNA + Evo 2,
populated with synthetic data. Demo-able artifact, not a real result.
What's here:
- scripts/make_mockup_features.py: deterministic synthetic data generator
(seed 42). Writes features_atlas.parquet, feature_metadata.parquet,
feature_examples.parquet to evo2_dashboard_mockup/public/. Fixtures
are committed for one-step npm-only setup.
- evo2_dashboard_mockup/: Vite/React SPA forked from codon_dashboard
with these swaps:
* Removed molstar dep + MolstarThumbnail.jsx
* Renamed ProteinSequence.jsx -> SequenceView.jsx; per-base
rendering (no codon framing, no AA translation)
* Renamed ProteinDetailModal.jsx -> RegionDetailModal.jsx;
UniProt content swapped for genomic-region content
* utils.js: getRegionLabel + parseBases (replacing
getAccession/uniprotUrl/parseCodons/codonToAA)
* MOCKUP banner at top of App
* "Evo 2 SAE Feature Explorer (Mockup)" title
- v2 roadmap placeholders (greyed em-dashes with hover tooltips):
* FeatureCard: Annotation, Sensitivity, Recon Δ stats
* FeatureDetailPage: Annotations, Conservation sections
Quick start: cd evo2_dashboard_mockup && npm install && npm run dev
The synthetic data schema is the contract the future real eval pipeline
will need to target.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
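A minimal sketch of a deterministic atlas generator in the spirit of `make_mockup_features.py`, not its actual code. `make_atlas`, the cluster-center layout, and the use of `log10` for `log_frequency` are assumptions; the seed of 42, log-uniform frequencies between 0.001 and 0.1, and max activations of 5–30 come from this PR.

```python
import math
import random


def make_atlas(n_features=20, n_clusters=4, seed=42):
    """Return synthetic feature-atlas rows (list of dicts); identical output for a given seed."""
    rng = random.Random(seed)  # private RNG so results don't depend on global state
    # Hypothetical 2-D UMAP cluster centers.
    centers = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(n_clusters)]
    rows = []
    for fid in range(n_features):
        cluster_id = fid % n_clusters
        cx, cy = centers[cluster_id]
        # Log-uniform in [0.001, 0.1]: sample the base-10 exponent uniformly.
        freq = 10 ** rng.uniform(-3, -1)
        rows.append({
            "feature_id": fid,
            "x": cx + rng.gauss(0, 1),
            "y": cy + rng.gauss(0, 1),
            "label": None,
            "db_source": None,
            "activation_freq": freq,
            "log_frequency": math.log10(freq),
            "max_activation": rng.uniform(5, 30),
            "cluster_id": cluster_id,
        })
    return rows
```

With pandas and pyarrow installed, `pandas.DataFrame(make_atlas()).to_parquet(path)` would write the rows out; sampling the exponent rather than the frequency is what keeps rare features from being swamped by common ones.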
…ed features
Three changes on top of the initial mockup commit:
1. Drop codonfm-specific scaffolding from forked components.
- .gitignore the auto-generated package-lock.json (regenerates on `npm install`)
- FeatureCard.jsx: 793 -> 508 lines. Removed dead stat tiles (Hi-Score,
Variant/Site/Local deltas, ClinVar, PhyloP, GC, Trinuc/Gene entropy),
codonfm vocab-logits chart, codonfm GSEA tags, codonfm CSV export
sections — all conditional on fields our synthetic data doesn't provide.
- FeatureDetailPage.jsx: 522 -> 187 lines. Replaced codonfm-specific
VocabLogitChart / CodonAnnotations / FeatureMetrics components with a
simpler DNA-friendly detail view.
2. Refine the synthetic feature set.
- 11 labeled DNA-native features in 3 thematic UMAP clusters:
* eukaryotic regulatory (TATA box, polyA signal, CpG island,
splice donor, splice acceptor)
* bacterial regulatory (-10 box, -35 box, Shine-Dalgarno)
* codon context (start ATG, stop TAA, stop TAG)
- 9 unlabeled features in a 4th diffuse cluster (label=NULL,
db_source=NULL) — mimics the realistic case where most SAE
features are uninterpreted.
- New `db_source` column on each feature (RefSeq / JASPAR-ENCODE /
bacterial annotation / RefSeq UTR / ENCODE-RefSeq / NULL).
3. Bug fixes for cross-pod port-forward demo:
- App.jsx defaults: `selectedCategory` and `histMetric3` were
hardcoded to codonfm's `mean_variant_1bcdwt` column, which doesn't
exist in our atlas and threw Binder errors. Switched to `cluster_id`.
- Atlas column rename: `cluster` -> `cluster_id` to match what
App.jsx queries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
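The refined feature set can be written down as plain data. In this sketch the labels and cluster grouping come from the commit message; `build_features`, `display_name`, and the `n_unlabeled` parameter (named after the `--n-unlabeled` option mentioned in the PR description) are hypothetical. `db_source` is omitted because the per-feature mapping to sources isn't spelled out here, and `display_name` assumes the `N` in `Feature N` is the `feature_id`.

```python
# Labeled features per thematic UMAP cluster, as listed in the commit message.
LABELED_CLUSTERS = [
    ("eukaryotic regulatory", ["TATA box", "polyA signal", "CpG island", "splice donor", "splice acceptor"]),
    ("bacterial regulatory", ["-10 box", "-35 box", "Shine-Dalgarno"]),
    ("codon context", ["start ATG", "stop TAA", "stop TAG"]),
]


def build_features(n_unlabeled: int = 9):
    """Return one dict per feature: labeled clusters first, then one diffuse unlabeled cluster."""
    rows = []
    for cluster_id, (_theme, labels) in enumerate(LABELED_CLUSTERS):
        for label in labels:
            rows.append({"feature_id": len(rows), "label": label, "cluster_id": cluster_id})
    unlabeled_cluster = len(LABELED_CLUSTERS)  # the 4th cluster
    for _ in range(n_unlabeled):
        rows.append({"feature_id": len(rows), "label": None, "cluster_id": unlabeled_cluster})
    return rows


def display_name(row):
    """Card title: the label, or `Feature N` for unlabeled features."""
    return row["label"] if row["label"] is not None else f"Feature {row['feature_id']}"
```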
This PR ships a demo-only visualization shell at `bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/evo2_dashboard_mockup/`. There is no real SAE inference involved: everything you see in the dashboard is generated by `scripts/make_mockup_features.py` from a fixed seed. A yellow `MOCKUP — synthetic data, not from a real SAE run` banner is rendered at the top of every page so nobody mistakes it for real model output.

The point of this v1 is to lock in the data contract that the future real eval pipeline will need to target. Schemas and column names are below.
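Because the contract is the deliverable, the real pipeline could check its output before the dashboard ever queries it; the `cluster` vs `cluster_id` mismatch fixed in an earlier commit surfaced only as Binder errors in the browser. A minimal sketch: `missing_columns` and `example_problems` are hypothetical helpers, the column names come from the `features_atlas.parquet` contract, and the example checks assume `activations` holds one value per base of an uppercase `ACGT` `sequence`, with `max_activation_position` as a 0-based index.

```python
import math

# Column names from the features_atlas.parquet contract.
REQUIRED_ATLAS_COLUMNS = (
    "feature_id", "x", "y", "label", "db_source",
    "activation_freq", "log_frequency", "max_activation", "cluster_id",
)


def missing_columns(columns, required=REQUIRED_ATLAS_COLUMNS):
    """Return required columns absent from `columns`, in contract order."""
    present = set(columns)
    return [c for c in required if c not in present]


def example_problems(row):
    """List contract violations found in one feature_examples row."""
    problems = []
    seq, acts = row["sequence"], row["activations"]
    if not set(seq) <= set("ACGT"):
        problems.append("sequence contains non-ACGT characters")
    if len(acts) != len(seq):
        problems.append("activations length differs from sequence length")
    elif acts:
        peak = max(acts)
        # Tolerance allows for float32/float64 round-trips through parquet.
        if not math.isclose(row["max_activation"], peak, rel_tol=1e-6):
            problems.append("max_activation is not the peak of activations")
        pos = row["max_activation_position"]
        if not (0 <= pos < len(acts)) or acts[pos] != peak:
            problems.append("max_activation_position does not point at the peak")
    return problems
```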
What's in the visualization

Three panels, in the order they appear:

1. UMAP scatter (left): points are colored by `cluster_id` by default. Other columns also work: `db_source`, `log_frequency`, `max_activation`.
2. Histograms (top): `log_frequency` and `max_activation`.
3. Feature list (right). Each card shows:
   - The feature label (`Feature N` for unlabeled features).
   - Freq: fraction of tokens where this feature fires (synthetic, log-uniform between 0.001 and 0.1).
   - Max: peak activation observed (synthetic, 5–30).
   - Annotation: top annotation-database match (RefSeq / Rfam / JASPAR). Coming in v2.
   - Sensitivity: recall against the annotation database. Coming in v2.
   - Recon Δ: reconstruction loss change from ablating this feature. Coming in v2.
   - Top-activating example sequences containing the feature's motif (e.g. `TATAAA` for TATA box, `AATAAA` for polyA signal).

Clicking "Full analysis" opens a detail page with the same sequences (up to 30 examples), plus two more v2 placeholders:

- Annotations: annotation overlay (RefSeq, Rfam, JASPAR). Coming in v2.
- Conservation: phyloP-style conservation track. Coming in v2.

Data contract (what the real eval pipeline will need to write)
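Three parquet fixtures live in `evo2_dashboard_mockup/public/`. Their columns are the spec the future real eval pipeline must match.

`features_atlas.parquet` (one row per feature)

| Column | Description |
|---|---|
| `feature_id` | SAE feature index |
| `x`, `y` | UMAP coordinates |
| `label` | Feature label; NULL for unlabeled features |
| `db_source` | Annotation source (e.g. RefSeq, JASPAR-ENCODE); NULL for unlabeled features |
| `activation_freq` | Fraction of tokens where the feature fires |
| `log_frequency` | Log of `activation_freq` |
| `max_activation` | Peak activation observed |
| `cluster_id` | UMAP cluster |

`feature_metadata.parquet`

Same schema as `features_atlas.parquet` in the mockup. Kept separate so the real eval pipeline can add metadata columns (e.g. sensitivity / faithfulness numbers) without touching the atlas table.

`feature_examples.parquet` (long-form, one row per top-activating example)

| Column | Description |
|---|---|
| `feature_id` | Feature the example belongs to |
| `example_rank` | Rank among the feature's top-activating examples |
| `sequence_id` | Source sequence (e.g. `NC_000913.3`, `chr1`) |
| `start`, `end` | Coordinates of the example within `sequence_id` |
| `sequence` | Bases of the example (`ACGT`) |
| `activations` | Per-base activations over `sequence` |
| `max_activation` | Peak activation in the example |
| `max_activation_position` | Position of the peak within `sequence` |
| `best_annotation` | Top annotation match; taken from `db_source` in the mockup |

How to run locally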
The three parquet fixtures are committed in `public/`, so `npm run dev` works without first running the Python generator. To regenerate them (e.g. with a different `--n-unlabeled` count or a different seed):

Out of scope (deferred to follow-up PRs)
- `sae.launch_dashboard()` Python wiring / FastAPI / Lepton deployment

Test plan
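- `npm install && npm run dev` from `evo2_dashboard_mockup/` opens a working dashboard at http://localhost:5173.
- Labeled features show example sequences containing their motif (e.g. `TATAAA` for TATA box).
- Unlabeled features render as `Feature N` and have no spliced motif in their sequences.
- UMAP coloring works for the selectable columns (`cluster_id`, etc.).

🤖 Generated with Claude Code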