Skip to content

evo2 SAE steering analysis: dose-response / selectivity harness + tested metrics#1635

Open
polinabinder1 wants to merge 5 commits into
pbinder/evo2-sae-servefrom
pbinder/evo2-steering-analysis
Open

evo2 SAE steering analysis: dose-response / selectivity harness + tested metrics#1635
polinabinder1 wants to merge 5 commits into
pbinder/evo2-sae-servefrom
pbinder/evo2-steering-analysis

Conversation

@polinabinder1

@polinabinder1 polinabinder1 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Summary

Evo2 SAE steering analysis — a dose-response + selectivity harness over the decode-only steering hook, with pure, CPU-tested metrics. Now lives in evo2_sae.eval.steering (alongside evo2_sae.eval.probing).

Stacked on #1622 (uses the Evo2SAE engine + its sae.steering clamp hook).

Contents — evo2_sae.eval.steering

  • pure metrics: common_prefix_len, edit_distance, divergence, dose_response, selectivity — string math, no model.
  • engine-driven: pick_target (top labeled feature on a sequence) and run_steering (sweep clamp strengths on a target + control features → dose-response + selectivity).
  • scripts/steer.py — thin CLI wrapper (builds an Evo2SAE, runs the sweep); imports the harness from evo2_sae.eval.steering.

Design decisions (the non-trivial ones)

  • Edit distance, not positional Hamming. Steering decodes greedily (temperature=0), so a single early insertion/shift makes every downstream position differ — Hamming would saturate to ~1.0 from one change and report a fixed sweep as maximally divergent. divergence uses normalized edit distance (doesn't saturate on a shift) plus first_divergence (shared-prefix length), so the metric tracks the real magnitude of the effect.
  • The engine is injected, not imported. pick_target/run_steering take the engine as an argument, so the whole harness is torch-free and runs against a fake in tests — no GPU, no checkpoints, no evo2_sae.core import. (The fake is intentionally local to the test file, not a conftest fixture, so it doesn't collide with evo2 SAE serve: FastAPI server + CLI (on the engine) #1637's GPU fixtures.)
  • Drives the production Evo2SAE.generate path (decode-only clamp) — the analysis can't drift from how steering actually behaves in the server. It also forces greedy/deterministic decode (temperature=0, top_k=1), so strength-0 reproduces the baseline exactly and dose-response isn't polluted by sampling noise.
  • max_clamp mirrors core._sanitize_steering — requested strengths beyond MAX_CLAMP_STRENGTH are capped, and the result surfaces the capped values, so the sweep matches what the engine would actually apply.
  • pick_target prefers the top labeled feature (the interpretable target) while still surfacing the top-active feature in the returned table — so steering a meaningful concept is the default, not whatever fired hardest.

How to run

cd interpretability/sparse_autoencoders/recipes/evo2
bash .ci_build.sh && source .ci_test_env.sh        # or: PYTHONPATH=src:../../sae/src
pytest tests/test_steer_analysis.py
# CLI: python scripts/steer.py --evo2-ckpt-dir <mbridge> --sae-checkpoint <sae.pt> --layer 26 --sequence ATGC...

No dedicated CI lane (deferred — see #1622; CI should fold into the repo-wide recipe lane later).

Tests

9 passed (CPU, no model): the string metrics (divergence / dose_response / selectivity / edit_distance) against hand-computed references, plus pick_target / run_steering driven against an injected torch-free fake engine. No GPU tests (the real-engine steering path is covered by test_steering.py in #1622).

@copy-pr-bot

copy-pr-bot Bot commented Jun 12, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9fb4b67b-73c8-44fc-ade3-79c57fe73dcd

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces a complete SAE steering harness for Evo2: a pure metrics module for quantifying feature-driven text changes, a CLI script that orchestrates encoding, generation sweeps, and causal analysis, and corresponding unit tests. The workflow measures dose-response and selectivity of target feature clamping against baseline and control features.

Changes

SAE Steering Harness Implementation

Layer / File(s) Summary
Steering metrics module
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/steer_analysis.py
Pure functions divergence, dose_response, and selectivity compute character-level divergence between text strings, quantify per-strength dose-response curves, and measure selectivity of target features against control features at matching clamp strengths.
Steering harness script and orchestration
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/steer.py
CLI-driven harness that loads Evo2SAE, encodes DNA sequences, auto-selects or accepts target features, generates baseline and clamped continuations, invokes metrics to measure dose-response and selectivity, and optionally writes results to JSON.
Metrics test coverage
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/tests/test_steer_analysis.py
Three unit tests verify divergence correctness, dose-response sorting and monotonicity, and selectivity ratio behavior under target vs control activation patterns.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A steering harness takes the stage,
With clamps and metrics on each page,
We measure what each feature sways,
In DNA through many ways. 🧬✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: adding tested steering analysis metrics (dose-response/selectivity) and a harness for Evo2 SAE steering.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description is detailed and covers summary, usage, design choices, and tests, though it does not fully mirror the repository template.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pbinder/evo2-steering-analysis

Comment @coderabbitai help to get the list of available commands.

@polinabinder1

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (2)
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/tests/test_steer_analysis.py (1)

27-50: ⚡ Quick win

Consider adding edge case tests.

The current tests cover happy paths well. Consider adding tests for edge cases such as:

  • Empty strings in divergence (e.g., divergence("", "") or divergence("A", ""))
  • Empty control_steered dict in selectivity (verifies the edge case flagged in steer_analysis.py)
  • dose_response with empty steered_by_strength dict
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/tests/test_steer_analysis.py`
around lines 27 - 50, Add unit tests covering edge cases: call divergence with
empty strings and with one empty vs non-empty (e.g., divergence("", "") and
divergence("A", "")) and assert expected behavior; add a test for selectivity
that passes an empty control_steered dict to ensure the function handles/raises
as intended; and add a test for dose_response with an empty steered_by_strength
dict to verify it returns an empty/appropriate response. Place these new tests
in the existing test_steer_analysis.py alongside the other tests and reference
the same functions divergence, selectivity, and dose_response so edge behaviors
are validated.
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/steer_analysis.py (1)

33-68: ⚡ Quick win

Expand docstrings to Google-style format.

The coding guidelines require Google-style docstrings with Args, Returns, and (if applicable) Raises sections. The current brief docstrings are functional but could be more explicit about parameter types, return structure, and edge cases.

📚 Example expansion for `divergence`
def divergence(a: str, b: str) -> tuple[int, float]:
    """Return divergence metrics between two strings over their shared prefix.

    Args:
        a: First string to compare.
        b: Second string to compare.

    Returns:
        A tuple of (first_differing_index, fraction_differing) where:
            - first_differing_index is the index of the first character that differs,
              or the shared prefix length if strings are identical up to that point.
            - fraction_differing is the fraction of characters that differ within
              the shared prefix length (min of the two string lengths).
    """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/steer_analysis.py`
around lines 33 - 68, Update the short docstrings for divergence, dose_response,
and selectivity to full Google-style docstrings: for divergence, add an Args
section documenting a: str and b: str, a Returns section describing the tuple
(first_differing_index: int, fraction_differing: float) and explain edge cases
(identical prefix => index == shared prefix length, fraction computed over
min(len(a), len(b)) and uses max(1, n) to avoid division by zero); for
dose_response, document Args (baseline: str, steered_by_strength: dict[float,
str]) and Returns (list[dict] with keys "strength": float, "first_divergence":
int, "frac_changed": float) and note rows are sorted by ascending strength; for
selectivity, document Args (baseline: str, target_steered: str, control_steered:
dict[int, str]) and Returns (dict with keys "target_frac_changed",
"control_frac_changed", "mean_control_frac_changed", "selectivity_ratio") and
mention handling of empty controls (mean_control_frac_changed==0 and
selectivity_ratio uses denominator max(mean_c, 1e-9)). Ensure each docstring
uses Google-style sections (Args, Returns, and mention Raises if applicable) and
references the existing function names divergence, dose_response, and
selectivity so reviewers can locate the changes.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/steer_analysis.py`:
- Around line 54-68: The selectivity() function currently divides by a tiny
epsilon when control_steered is empty, producing a misleading huge
selectivity_ratio; modify selectivity() to detect when controls is empty (e.g.,
if not control_steered or if len(controls)==0) and set "selectivity_ratio" to
None (or another sentinel) instead of computing target / max(mean_c, 1e-9); keep
the other fields the same (you may also set "mean_control_frac_changed" to None
if you prefer), and update any type hints or callers of selectivity to accept a
nullable selectivity_ratio.
- Line 32: The top-level imports in steer_analysis.py currently have only one
blank line following them; update the module by inserting a second blank line
after the import block so there are exactly two blank lines before the first
subsequent statement (e.g., before any module-level constants, function or class
definitions) to satisfy the isort configuration.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/steer.py`:
- Line 38: The import block in steer.py currently has only one blank line after
the imports which violates the isort rule requiring two blank lines; update the
top-of-file imports section by adding one additional newline so there are two
blank lines separating the imports from the rest of the code (i.e., ensure the
imports block in steer.py is followed by two blank lines).
- Around line 77-78: The parsing of comma-separated inputs for controls and
strengths (variables controls and strengths derived from a.controls and
a.strengths) lacks error handling and will raise ValueError on malformed tokens;
wrap the list comprehensions in a try/except that catches ValueError, validate
each token with .strip() before converting (skip empty tokens), convert controls
with int() and strengths with float() inside the try block, verify the two lists
have the expected matching lengths and raise or exit with a clear error message
including the offending token and the original input (use the argparse object a
for context) so the script fails gracefully instead of crashing.
- Around line 71-76: The code can leave target as None when no labeled features
are in the top activations; after the loop that inspects vals, ids and sets
target from a.feature or eng.labels, add a validation: if target is still None,
either set a sensible fallback (e.g., target = int(ids[0]) to pick the
highest-activation feature) or raise a clear error asking the user to provide
--feature; ensure the check references target, a.feature, vals, ids and
eng.labels so downstream code that uses target as a dict key/for formatting
won't get a TypeError.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/tests/test_steer_analysis.py`:
- Line 21: The file test_steer_analysis.py has only one blank line after the
import block which violates the isort config requiring two blank lines; open
test_steer_analysis.py, locate the import section at the top of the module (the
import statements in the test_steer_analysis module), and add one additional
blank line so there are two consecutive blank lines immediately following the
imports before the first test or code block.

---

Nitpick comments:
In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/steer_analysis.py`:
- Around line 33-68: Update the short docstrings for divergence, dose_response,
and selectivity to full Google-style docstrings: for divergence, add an Args
section documenting a: str and b: str, a Returns section describing the tuple
(first_differing_index: int, fraction_differing: float) and explain edge cases
(identical prefix => index == shared prefix length, fraction computed over
min(len(a), len(b)) and uses max(1, n) to avoid division by zero); for
dose_response, document Args (baseline: str, steered_by_strength: dict[float,
str]) and Returns (list[dict] with keys "strength": float, "first_divergence":
int, "frac_changed": float) and note rows are sorted by ascending strength; for
selectivity, document Args (baseline: str, target_steered: str, control_steered:
dict[int, str]) and Returns (dict with keys "target_frac_changed",
"control_frac_changed", "mean_control_frac_changed", "selectivity_ratio") and
mention handling of empty controls (mean_control_frac_changed==0 and
selectivity_ratio uses denominator max(mean_c, 1e-9)). Ensure each docstring
uses Google-style sections (Args, Returns, and mention Raises if applicable) and
references the existing function names divergence, dose_response, and
selectivity so reviewers can locate the changes.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/tests/test_steer_analysis.py`:
- Around line 27-50: Add unit tests covering edge cases: call divergence with
empty strings and with one empty vs non-empty (e.g., divergence("", "") and
divergence("A", "")) and assert expected behavior; add a test for selectivity
that passes an empty control_steered dict to ensure the function handles/raises
as intended; and add a test for dose_response with an empty steered_by_strength
dict to verify it returns an empty/appropriate response. Place these new tests
in the existing test_steer_analysis.py alongside the other tests and reference
the same functions divergence, selectivity, and dose_response so edge behaviors
are validated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ea342482-73ae-48e5-b854-6dc5e460dff3

📥 Commits

Reviewing files that changed from the base of the PR and between f310289 and 46f5426.

📒 Files selected for processing (3)
  • bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/steer.py
  • bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/steer_analysis.py
  • bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/tests/test_steer_analysis.py

Comment thread interpretability/sparse_autoencoders/recipes/evo2/scripts/steer.py Outdated
Comment thread interpretability/sparse_autoencoders/recipes/evo2/scripts/steer.py Outdated
@polinabinder1 polinabinder1 marked this pull request as ready for review June 12, 2026 05:32
@polinabinder1 polinabinder1 requested review from jstjohn and pstjohn and removed request for jstjohn, pstjohn and trvachov June 12, 2026 05:32
@polinabinder1 polinabinder1 force-pushed the pbinder/evo2-sae-serve branch from e567efc to 9bedf2b Compare June 23, 2026 03:57
polinabinder1 added a commit that referenced this pull request Jun 23, 2026
…d onto migrated #1622

Re-lands #1635 on the post-#1633 layout, on top of migrated #1622: the steering-eval harness
(scripts/{steer,steer_analysis}.py) over #1622's generate(), with model-agnostic metrics.

Validated: tests/test_steer_analysis.py -> 3 passed (CPU); harness scripts compile.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
@polinabinder1 polinabinder1 force-pushed the pbinder/evo2-steering-analysis branch from 74ed67c to 0189d98 Compare June 23, 2026 06:29
polinabinder1 added a commit that referenced this pull request Jun 23, 2026
…d onto migrated #1622

Re-lands #1635 on the post-#1633 layout, on top of migrated #1622: the steering-eval harness
(scripts/{steer,steer_analysis}.py) over #1622's generate(), with model-agnostic metrics.

Validated: tests/test_steer_analysis.py -> 3 passed (CPU); harness scripts compile.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
@polinabinder1 polinabinder1 force-pushed the pbinder/evo2-steering-analysis branch from 0189d98 to 7461e2b Compare June 23, 2026 21:59
@copy-pr-bot

copy-pr-bot Bot commented Jun 23, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@polinabinder1 polinabinder1 force-pushed the pbinder/evo2-steering-analysis branch 2 times, most recently from 474650d to 2dcc5c9 Compare June 23, 2026 22:15
polinabinder1 and others added 2 commits June 24, 2026 04:44
…d onto migrated #1622

Clamp an SAE feature via the production Evo2SAE.generate path and quantify the causal effect:
dose-response (effect vs strength) + selectivity (target vs control features), persisted to a
structured steering_results.json.

Metric / robustness:
  * normalized edit (Levenshtein) distance, not positional Hamming. Greedy decode is
    autoregressive, so one early flipped token shifts every downstream base and pins Hamming at
    ~1.0 — erasing the dose curve. Edit distance is shift-robust; first_divergence (shared-prefix
    length) is the complementary monotone signal. Tested with the shift case.
  * surface the clamp cap: generate() silently caps |strength| to MAX_CLAMP_STRENGTH, so two
    requests above it produce an identical clamp (a fake plateau). run_steering warns, steers at
    the effective value, and records max_clamp_strength + capped_strengths.

Consolidation:
  * harness + metrics live in the package (src/evo2_sae/steer_analysis.py), engine injected, so
    they import as a normal torch-free module like evo2_sae.fasta — dropped all four sys.path
    inserts. scripts/steer.py is now a thin CLI (matches train.py/extract.py).
  * pick_target reuses Evo2SAE.top_features (the CLI/server ranking) instead of re-deriving topk.
  * one CPU test file (metrics + fake-engine harness) instead of two; fake stays local, not in
    conftest, to avoid colliding with the sibling server PR's engine fixtures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
Relocate the steering dose-response / selectivity metrics from evo2_sae.steer_analysis into the
evo2_sae.eval package (src/evo2_sae/steer_analysis.py -> src/evo2_sae/eval/steering.py), alongside
the eval/probing harness. Update the importers (scripts/steer.py CLI + tests/test_steer_analysis.py)
to evo2_sae.eval.steering. The CI lane is dropped via the rebase onto the updated #1622. Pure-CPU
tests, no GPU/model.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
@polinabinder1 polinabinder1 force-pushed the pbinder/evo2-steering-analysis branch from 2dcc5c9 to d6b6813 Compare June 24, 2026 04:50
polinabinder1 pushed a commit that referenced this pull request Jun 29, 2026
Re-add .github/workflows/unit-tests-interpretability-recipes.yaml (removed in
c31d4e9 "defer CI for now"), recovered verbatim from 9bedf2b. Path-gated GPU
lane (L4 + megatron squashed image) that builds the evo2 SAE recipe via
.ci_build.sh and runs pytest tests/ — including the GPU steering/encode tests.
Rooted on this PR (#1622), so the stacked SAE PRs (#1623/#1629/#1635) inherit it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
polinabinder1 and others added 3 commits June 30, 2026 16:46
Signed-off-by: polinabinder1 <pbinder@nvidia.com>
Signed-off-by: polinabinder1 <pbinder@nvidia.com>
… the recipe

The steering clamp hook moved sae.steering -> evo2_sae.steering (#1622). This
merges that base in and fixes the one remaining reference (a docstring in
eval/steering.py). No import change was needed — the harness is engine-injected
and never imported sae.steering directly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: polinabinder1 <pbinder@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant