
Add Evo2 1B SAE recipe + fix Savanna weights_only loading#1579

Open
polinabinder1 wants to merge 4 commits into NVIDIA-BioNeMo:main from polinabinder1:evo2-sae-recipe

@polinabinder1
Collaborator

Evo2 1B SAE: working on Lepton

TL;DR: Adds a new SAE recipe at bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/: ~200 lines of original code plus a one-line bug fix in the evo2_megatron converter.

Pipeline: chunk → convert (Savanna→MBridge) → predict_evo2 → pt_to_parquet shim → train. This is longer than esm2's extract → train because Evo2 is a Hyena model built on Megatron-Core (no HF AutoModel path), ships as Savanna checkpoints that must be converted, and accepts genomes of unbounded length that have to be chunked first. The shim and the chunker are the only model-specific code; train.py is copied verbatim from codonfm.
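For illustration, chunking can be as simple as splitting each FASTA record into non-overlapping windows of the model's context length. This is a hypothetical stand-in for chunk_fasta.py, not its actual code: `read_fasta`, `chunk_sequence`, `write_chunked_fasta`, the `<id>_<start>_<end>` chunk naming, the absence of overlap, and keeping the short final chunk are all assumptions.

```python
from __future__ import annotations

from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (record_id, sequence) pairs; the ID is the header text up to the first space."""
    record_id, parts = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(parts)
            record_id, parts = line[1:].partition(" ")[0], []
        else:
            parts.append(line)
    if record_id is not None:
        yield record_id, "".join(parts)


def chunk_sequence(seq_id: str, seq: str, chunk_len: int = 8192) -> Iterator[tuple[str, str]]:
    """Split one sequence into non-overlapping windows of at most chunk_len bases."""
    for start in range(0, len(seq), chunk_len):
        end = min(start + chunk_len, len(seq))
        yield f"{seq_id}_{start}_{end}", seq[start:end]


def write_chunked_fasta(src: TextIO, dst: TextIO, chunk_len: int = 8192) -> int:
    """Write every chunk of every record in src to dst as FASTA; return the chunk count."""
    n_chunks = 0
    for seq_id, seq in read_fasta(src):
        for chunk_id, chunk in chunk_sequence(seq_id, seq, chunk_len):
            dst.write(f">{chunk_id}\n{chunk}\n")
            n_chunks += 1
    return n_chunks
```

With this scheme a 10,000 bp record becomes two chunks (8192 bp and 1808 bp); whether the real chunker keeps, pads, or drops that short tail is not stated in the PR.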

Results on 100 organelle sequences (558 chunks, 8192 bp chunk size, Evo2 1B layer 12): the SAE trained in 2m 14s on one H100. Loss fell from 0.67 to 0.045 (~15× reduction). FVU fell from 0.90 to 0.10 and variance explained rose from 0.10 to 0.90, both monotonically. Dead latents were 5.4% at the end (a normal range, which indicates auxk revival is working). Encoder/decoder shapes are [1920 ↔ 15360] (1920-dim activations, 15360 latents).
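FVU and variance explained are complements (variance explained = 1 − FVU), which is why the two numbers mirror each other; for the loss, 0.67 / 0.045 ≈ 14.9, hence ~15×. A minimal numpy sketch of how these metrics are commonly computed, assuming FVU is the reconstruction sum of squared errors divided by the total sum of squares around the per-feature mean. The recipe's train.py may normalize differently (for example per batch), and `fvu` / `variance_explained` are illustrative names.

```python
import numpy as np


def fvu(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Fraction of variance unexplained for (n_tokens, d_model) activations and reconstructions."""
    residual = np.sum((x - x_hat) ** 2)
    # Total variation around the per-feature mean (what a constant predictor leaves unexplained).
    total = np.sum((x - x.mean(axis=0, keepdims=True)) ** 2)
    return float(residual / total)


def variance_explained(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Complement of FVU: 1.0 is a perfect reconstruction, 0.0 is no better than the mean."""
    return 1.0 - fvu(x, x_hat)
```

Under this definition an FVU of 0.10 means the SAE's reconstruction error is a tenth of what predicting the mean activation would leave behind.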

Three gotchas worth flagging:

  1. weights_only=True (the torch.load default since torch 2.6) silently breaks Savanna HF checkpoint loads: the converter exits 0 and leaves an empty output directory. One-line patch at savanna_to_mbridge.py:138.
  2. Hyena fftconv runs out of memory on long sequences, even at micro-batch=1. Chunk inputs to the model's trained context length (8192 bp for 1B); as a bonus, this gives a ~17× per-batch speedup.
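  3. A "0% dead latents" reading is uninformative until training has processed dead_tokens_threshold tokens (10M): a latent is only counted dead after it goes that many tokens without firing, so smoke tests always show 0%. Trust the metric only after that window fills.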
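To see why the metric sits at 0% early on, here is a hypothetical tracker (not the sae package's implementation) that counts a latent as dead once it has gone `dead_tokens_threshold` tokens without firing. Resetting a latent's counter whenever it fires anywhere in a batch is a simplifying assumption; the real code may track this differently.

```python
from __future__ import annotations


class DeadLatentTracker:
    """Hypothetical dead-latent tracker: a latent is dead after `threshold` tokens without firing."""

    def __init__(self, n_latents: int, dead_tokens_threshold: int = 10_000_000) -> None:
        self.threshold = dead_tokens_threshold
        # Tokens seen since each latent last fired; every latent starts at 0.
        self.tokens_since_fired = [0] * n_latents

    def update(self, fired: set[int], n_tokens: int) -> None:
        """Record one batch of n_tokens, where `fired` holds latents active on any token."""
        for i in range(len(self.tokens_since_fired)):
            if i in fired:
                self.tokens_since_fired[i] = 0
            else:
                self.tokens_since_fired[i] += n_tokens

    def dead_fraction(self) -> float:
        """Fraction of latents whose silent streak has reached the threshold."""
        dead = sum(1 for t in self.tokens_since_fired if t >= self.threshold)
        return dead / len(self.tokens_since_fired)
```

Because every counter starts at 0, any run that has processed fewer than `dead_tokens_threshold` tokens in total (10M in this recipe) can only report 0%, regardless of how many latents have actually stopped firing.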

Reusable for the next model: the three-stage pattern (extractor → ActivationStore parquet → universal train.py) holds for esm2, codonfm, and evo2, and only the extractor changes per model.
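The per-model extractor's last job is to hand the shared train.py a flat table of fixed-width activation rows. Assuming the extractor produces padded (batch, seq_len, hidden) activations plus per-sequence lengths (the actual predict_evo2 .pt layout and the ActivationStore parquet schema are not described in this PR), the flattening step can be sketched as follows. `flatten_token_activations` is a hypothetical helper, and writing the rows to parquet is left out.

```python
from typing import Sequence

import numpy as np


def flatten_token_activations(acts: np.ndarray, lengths: Sequence[int]) -> np.ndarray:
    """Turn padded (batch, seq_len, hidden) activations into (n_real_tokens, hidden) rows."""
    batch, seq_len, _ = acts.shape
    if len(lengths) != batch:
        raise ValueError(f"expected {batch} lengths, got {len(lengths)}")
    # mask[b, t] is True for real (non-padding) tokens of sequence b.
    mask = np.arange(seq_len)[None, :] < np.asarray(lengths)[:, None]
    # Boolean indexing over the first two axes keeps tokens in sequence order.
    return acts[mask]
```

For Evo2 1B layer 12 each row would be 1920 wide, matching the SAE encoder's input dimension reported above.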

Test plan

  • Smoke test: FASTA=<small.fasta> bash scripts/1b.sh should run end-to-end and produce checkpoint_final.pt with the expected encoder/decoder shapes
  • Loss curve sanity: the training log should show loss decreasing and var_exp increasing monotonically (up to the auxk revival event)
  • Confirm the weights_only=False patch resolves the converter failure on torch ≥ 2.6
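Because the original failure mode was exit code 0 with an empty output directory, the exit code alone can't confirm the fix. A hypothetical post-conversion check (the function name and messages are illustrative, not part of the recipe) could fail loudly instead:

```python
from __future__ import annotations

from pathlib import Path


def check_conversion_output(out_dir: str | Path) -> list[Path]:
    """Return the files a converter wrote, or raise if it silently wrote nothing."""
    out = Path(out_dir)
    if not out.is_dir():
        raise FileNotFoundError(f"converter output directory is missing: {out}")
    files = sorted(p for p in out.rglob("*") if p.is_file())
    if not files:
        # Exit code 0 plus an empty directory was the symptom of the weights_only failure.
        raise RuntimeError(f"converter wrote no files to {out}; check its stderr")
    return files
```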

🤖 Generated with Claude Code

polinabinder1 and others added 2 commits May 21, 2026 00:42
torch 2.6 changed the default of `torch.load`'s `weights_only` argument to
True. The Savanna checkpoint pickle includes numpy globals
(`numpy.core.multiarray._reconstruct`), which the safer loader rejects.
The converter then exits 0 without writing any output, and the error is
buried in stderr: a silent failure.

The Savanna repos under arcinstitute/* are trusted sources, so load with
weights_only=False.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
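A minimal sketch of loading a trusted checkpoint this way. The exact call patched at savanna_to_mbridge.py:138 is not shown on this page; `load_trusted_checkpoint` is a hypothetical wrapper and `map_location="cpu"` is an added assumption. `weights_only=False` unpickles arbitrary objects and can execute code, so it is only appropriate for trusted sources such as the arcinstitute/* repos.

```python
from typing import Any

import torch


def load_trusted_checkpoint(path: str) -> Any:
    """Load a checkpoint whose pickle may contain numpy globals.

    torch >= 2.6 defaults torch.load to weights_only=True, which rejects globals such as
    numpy.core.multiarray._reconstruct. weights_only=False restores full unpickling,
    which can run arbitrary code, so use it only on files from trusted sources.
    """
    # map_location="cpu" (an assumption) loads tensors without needing the original devices.
    return torch.load(path, map_location="cpu", weights_only=False)
```

A stricter alternative is to keep `weights_only=True` and allowlist the required numpy globals with `torch.serialization.add_safe_globals`; the commit instead relies on the checkpoints coming from a trusted source.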
Mirrors the existing esm2 / codonfm SAE recipes. Pipeline:

  chunk -> convert (Savanna->MBridge) -> predict_evo2 -> pt_to_parquet -> train

Differences from esm2/codonfm are forced by Evo2 specifics:
  - Hyena/Megatron-Core model, no HF AutoModel path => reuses the
    existing `predict_evo2` CLI for inference instead of writing
    a custom extract.py
  - `pt_to_parquet.py` shim bridges predict_evo2's .pt output to
    the universal `sae.activation_store` parquet contract
  - `chunk_fasta.py` preprocessor keeps inputs within the model's
    trained context length (8192 bp for 1B); Hyena fftconv OOMs
    on long sequences even at micro-batch=1
  - `train.py` is the same as codonfm's, copied verbatim per
    bionemo-recipes' KISS-over-DRY convention

Validated end-to-end on 100 organelle sequences (Evo2 1B layer 12):
loss 0.67 -> 0.045, FVU 0.90 -> 0.10, var_exp 0.10 -> 0.90, 2m14s wall.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@copy-pr-bot

copy-pr-bot Bot commented May 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented May 21, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ac737204-c72a-44e8-8dd1-e42014785fc9

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

polinabinder1 marked this pull request as draft May 21, 2026 20:07
polinabinder1 and others added 2 commits May 26, 2026 21:14
The recipe currently has no model-specific Python module: the extractor
is upstream (`predict_evo2`) and the two scripts are simple CLIs in
scripts/. Drop the empty package and adjust pyproject.toml so setuptools
doesn't try to auto-discover packages. Will reintroduce it when there's
actual library code to put there (eval, dashboard, dataloaders).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>