evo2 SAE recipe: feature-explorer dashboard (viz) by polinabinder1 · Pull Request #1623 · NVIDIA-BioNeMo/bionemo-recipes

polinabinder1 · 2026-06-10T21:27:21Z

Summary

React feature-explorer dashboard (4 tabs) over the Evo2SAE backend (#1622) — and the packaging
to ship it: the front-end can be built to static files baked into the recipe image, so one
container serves the dashboard and the API on a single port. Runnable end-to-end.

Stack update (2026-06-29): #1637 (server + CLI) merged into the base branch, so the server this
dashboard talks to now lives in #1622. This PR was rebased onto that base — references below that
say "#1637" mean the server and /api mount that is now part of #1622.

Tabs: Feature atlas · Sequence inspector (/api/annotate) · Generative steering
(/api/generate) · Sequence UMAP (/api/gene_embed).

How to run

Full instructions (three modes) live in feature_explorer/README.md. In short:

# Deploy / share — one container, UI + API on :8001, no Node at runtime (dashboard is opt-in):
docker build --build-arg WITH_DASHBOARD=1 \
  -f interpretability/sparse_autoencoders/recipes/evo2/Dockerfile -t evo2-sae .
docker run --gpus all -p 8001:8001 \
  -e EVO2_CKPT_DIR=/ckpt/evo2 -e SAE_CKPT_PATH=/ckpt/sae.pt -e EMBEDDING_LAYER=26 \
  -v /ckpt:/ckpt evo2-sae scripts/launch_inference.sh serve     # -> http://localhost:8001

# Local dev — Vite hot reload (Node >=18) + backend (run from recipes/evo2); tunnel :5176:
scripts/launch_inference.sh serve
scripts/launch_dashboard.py                                     # tsh -L 5176:localhost:5176 <box>

What this PR adds

Dashboard front-end (feature_explorer/, React + Vite): 4 tabs, dark/light, search, CSV export,
KaTeX steering equation, crash-safe localStorage, shared widgets in components.jsx.
Opt-in single-container serving — --build-arg WITH_DASHBOARD=1 adds a multi-stage Node build
that compiles the front-end to static dist/ (Node only in the build stage) and bakes it in;
server.py mounts it at / via DASHBOARD_DIST. One docker run, no Node at runtime, no second
process. The default build is engine + server only — it never pulls Node or builds the
front-end, so an SAE-only image isn't coupled to the dashboard toolchain.
/gene_embed endpoint + Evo2SAE.embed_bundle — pool sequences into per-feature vectors for the
Sequence-UMAP tab; shared with the offline precompute so the static bundle matches the live response.
Per-pane descriptions + Limitations — each tab documents its caveats in-app (static atlas + 2-D
UMAP; ±300 steering clamp, continuation-only; single-layer encode; stochastic 2-D UMAP, pooled,
min_firing-filtered, 1000-seq cap).
UMAP no longer silently truncates — /gene_embed skips (not truncates) over-length/too-short
sequences and anything past the 1000 cap, reports the counts
(n_received/n_skipped_short/n_skipped_too_long/n_dropped_over_cap/max_seq_len/max_genes),
and the tab shows a warning banner; all-invalid → 400 with an accounting message. Matches
/annotate's reject-don't-truncate.
Offline / static mode — probes /api/health; with no backend, the Feature atlas (static
parquets) and Sequence UMAP (precomputed sequmap_embeddings.json) work, steering/inspector hide.
Precompute via scripts/dashboard.py atlas|examples|embeddings; npm run build → a fully static site.
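
The reject-don't-truncate accounting described above for /gene_embed can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the names `EmbedAccounting` and `partition_sequences`, the `min_len` parameter, and the order of the checks are hypothetical. Only the counter names are taken from the description.

```python
from dataclasses import dataclass, field


@dataclass
class EmbedAccounting:
    """Counters named after the fields listed in the PR description."""
    n_received: int = 0
    n_skipped_short: int = 0
    n_skipped_too_long: int = 0
    n_dropped_over_cap: int = 0
    kept: list = field(default_factory=list)


def partition_sequences(seqs, min_len, max_seq_len, max_genes=1000):
    """Skip (never truncate) unusable sequences and count every skip.

    min_len and the check order are assumptions made for this sketch.
    """
    acc = EmbedAccounting(n_received=len(seqs))
    for seq in seqs:
        if len(seq) < min_len:
            acc.n_skipped_short += 1
        elif len(seq) > max_seq_len:
            acc.n_skipped_too_long += 1
        elif len(acc.kept) >= max_genes:
            acc.n_dropped_over_cap += 1
        else:
            acc.kept.append(seq)
    if not acc.kept:
        # All-invalid input: a server would turn this into a 400 response
        # whose body carries the same accounting.
        raise ValueError(
            f"no embeddable sequences: n_received={acc.n_received}, "
            f"n_skipped_short={acc.n_skipped_short}, "
            f"n_skipped_too_long={acc.n_skipped_too_long}, "
            f"n_dropped_over_cap={acc.n_dropped_over_cap}"
        )
    return acc
```

Returning the counters alongside the kept sequences is what lets a client show a warning banner instead of silently plotting a partial layout.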

Architectural decisions

Thin viz over the HTTP API: no model code in JS; every tab calls the evo2_sae backend (#1622).
Keeps one validated inference path; the UI can't drift from the engine.
API under /api, frontend on / — so a single origin serves both. The front-end always calls
/api/*; in dev Vite proxies it through without a path rewrite, so dev and the single-container
build hit identical paths (no dev/prod drift). (/api prefix + static mount are in the base, #1622.)
Dashboard build is opt-in; default image is engine-only — the front-end is a Node build-time
dependency, so gating it behind WITH_DASHBOARD keeps the SAE engine image buildable without npm
network access or a working front-end, and an SAE-only user pays nothing for the viz. Node never
reaches the runtime image regardless.
embed_bundle is a method on Evo2SAE (in core.py) — it needs encode_batch + the SAE, and
living in one place keeps the live /gene_embed response and the offline precompute identical. It's
dashboard-only (nothing else in #1622 uses it), so it ships with this PR.
Report, don't silently drop — UMAP surfaces what it couldn't embed rather than quietly returning a
partial layout, consistent with the backend's reject-don't-truncate stance.
/gene_embed wire format — one encode per sequence yields both mean- and max-pooled vectors
(client toggles pooling without re-running the model), and the response ships only features firing
in ≥ min_firing sequences (65,536 → n_firing, with a remap to real feature ids) as a base64
float32 [n_genes, n_firing] matrix. Shipping only the firing set is sound because TopK/ReLU codes
are ≥ 0, so the firing set is pooling-invariant (mean and max agree on what's nonzero).
Atlas needs only the SAE, not the 7B — dashboard.py atlas builds firing-rates from the cached
activation store and the 2-D layout from the SAE decoder geometry, so most dashboard data is
generatable with no big model loaded; examples / embeddings load the 7B but only over a small
FASTA. (Splits cheap, model-free precompute from the expensive model-touching kind.)
--layout auto (UMAP → PCA fallback) — UMAP gives the best clusters but needs numba (NumPy ≤ 2.3);
auto falls back to PCA/t-SNE (numba-free) so the atlas runs single-env without forcing a fragile
numba/NumPy pin on the whole recipe.
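
As a rough illustration of the --layout auto decision above, the sketch below resolves the requested layout to a backend without importing anything heavy. The function names and the `available` hook are hypothetical, and the actual logic in dashboard.py may differ. The only external fact assumed is that the umap-learn package imports as `umap` and depends on numba.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if `name` can be found in this environment, without importing it."""
    return importlib.util.find_spec(name) is not None


def resolve_layout(requested: str, available=module_available) -> str:
    """Map a --layout value to the backend that will actually run.

    `available` is injectable so the decision can be tested without
    installing or uninstalling packages.
    """
    if requested != "auto":
        # An explicit choice is honored as-is and allowed to fail loudly.
        return requested
    if available("umap") and available("numba"):
        return "umap"
    # numba-free fallback, so the atlas still builds in a single environment.
    return "pca"
```

Checking for the module instead of importing it keeps the decision cheap and avoids triggering numba compilation just to pick a layout.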

Running the tests

No dedicated CI lane (deferred — see #1622). Run them via the recipe's build script:

cd interpretability/sparse_autoencoders/recipes/evo2
bash .ci_build.sh && source .ci_test_env.sh
pytest tests/

CPU (no model): test_server.py (incl. /api/gene_embed decodable matrix + the skip/over-cap accounting + unknown-organism 400) and test_cli.py via FakeEngine; plus #1622's contract tests. The dashboard front-end is build-validated (npm run build); there is no JS unit suite.
GPU: inherited test_steering.py, gated by @pytest.mark.skipif(not torch.cuda.is_available()) — runs on a GPU box, skips otherwise.

Stacked on #1622 (which now includes the server + CLI, formerly the separate #1637).

coderabbitai · 2026-06-10T21:27:31Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

📝 Walkthrough

This PR adds a new Evo2 SAE Feature Explorer: a Vite/React dashboard (four tabs) with DuckDB-WASM/Mosaic front-end, sequence embedding/UMAP tooling, generative steering and inspector UIs, a backend /gene_embed endpoint, scripts to build atlas/examples parquets, a dashboard launcher, tests, and docs/sample data.

Changes

Evo2 SAE Feature Explorer Frontend & Backend

Layer / File(s)	Summary
Project setup & dependencies `bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/.gitignore`, `package.json`, `vite.config.js`	Vite React project manifest, dev server `/api` proxy to `localhost:8001`, and Node artifact ignores including generated parquet files.
HTML entry point and React bootstrap `feature_explorer/index.html`, `src/index.jsx`	HTML shell with NVIDIA Sans font, light/dark CSS variables, and React StrictMode bootstrap into `#root`.
Core app data orchestration and state `src/App.jsx`	Initializes DuckDB-WASM, loads parquet assets, derives categorical/sequential encodings, creates Mosaic crossfilter, manages search/sort/pagination, selection syncing, CSV export, and UI modals.
Dashboard navigation and tab routing `src/Dashboard.jsx`	Four-tab shell (Feature atlas, Generative steering, Sequence inspector, Sequence UMAP) with dark/light theme toggle.
Embedding visualization and histograms `src/EmbeddingView.jsx`, `src/Histogram.jsx`	Mosaic-based embedding view with category palettes, tooltips, viewport sync and histogram VGPlot component with brush-filtered foreground and injected axis.
Feature card list and detail page `src/FeatureCard.jsx`, `src/FeatureDetailPage.jsx`, `src/FeatureList.jsx`	Feature cards with editable titles, lazy-loaded examples, alignment modes, export; list pagination and detail modal.
Sequence and region visualization `src/SequenceView.jsx`, `src/RegionDetailModal.jsx`	Per-base activation rendering with alignment, synchronized scrolling, tooltips, and portal-based region modal.
Interactive exploration tabs `src/GenerativeSteering.jsx`, `src/SequenceInspector.jsx`, `src/SequenceUMAPView.jsx`	Generative steering (feature clamps → /generate), SequenceInspector (annotate → /annotate, heatmaps), SequenceUMAPView (embed/pool/reorganize sequences, label edits).
Shared utilities and backend integration `src/backend.js`, `src/utils.js`, `src/styles.js`, `src/InfoButton.jsx`	Backend proxy and health polling, Viridis colormap and activation color mapping, DNA sanitizer and HTTP helpers, region/base utilities, styles and small UI helper.
Backend gene embedding endpoint `src/evo2_sae/server.py`	`GeneEmbedRequest` model and `POST /gene_embed` endpoint: batch encoding, mean/max pooling across gene DNAs, firing-count filtering, and base64-encoded matrices in response.
Dashboard data-generation scripts `scripts/dashboard.py`	CLI to build `features_atlas.parquet` and `feature_examples.parquet` (sampling, layout via UMAP/PCA/TSNE fallback, per-feature examples extraction).
Dashboard launcher and data staging `scripts/launch_dashboard.py`	Staging required parquet files into frontend `public/`, optional npm install, starts Vite dev server, and optional browser open.
Tests for launcher and backend `tests/test_launch_dashboard.py`, `tests/test_server.py`	Tests for parquet staging success/failure and schema validation; contract test for `/gene_embed` decodable matrix and FakeEngine.batch encode.
Documentation and sample data `feature_explorer/README.md`, `feature_explorer/public/sequence_library.json`	Usage/launch instructions, tab/data dependencies, and preset sequence library for UMAP exploration.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

jstjohn
pstjohn
jwilber
trvachov

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.84% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: adding a React/Vite feature-explorer dashboard with visualization capabilities for the Evo2 SAE recipe.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description is detailed and covers summary, usage, architecture, and testing, even though it doesn't fully follow every template section.

copy-pr-bot · 2026-06-10T21:28:06Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Recipe-level entry point tying together the offline extract/train (#1621), the live inference engine + server + CLI (this PR), and the dashboard (#1623): venv setup, 7B/L26 config, CLI encode/batch, serve, dashboard launch, and the CPU/GPU test commands. codonfm/esm2 have one; evo2 didn't.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>

The recipe-level README documented the dashboard (#1623) and the atlas generator (deferred), neither of which is in this PR, so it was premature here. The inference run instructions live in launch_inference.sh's header. A complete recipe README lands once the dashboard + generator exist.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>

Embeds many sequences -> per-feature vectors for the #1623 Sequence-UMAP panel: each sequence is encode_batch'd, pooled (mean + max) over the DNA region, and returned as base64 float32 [n x n_features] (both pools from one forward, so the client toggles pooling without re-running) plus per-sequence metadata + feature stats. Thin wrapper over the engine's encode_batch. test_server.py: asserts the response decodes to an [ng x nf] matrix. 7 contract tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
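
To make the response shape in this commit concrete, here is a small stdlib-only sketch of pooling, firing-set filtering, and the base64 float32 round trip. It is an assumption-laden illustration, not the code in server.py: the helper names, the little-endian byte order, and the `feature_ids` remap key are hypothetical, while `G_b64`, `Gmax_b64`, `n_features`, and `n_genes` follow the field names the front-end reads in this thread.

```python
import base64
import struct


def pool(acts):
    """acts: per-position rows of per-feature codes -> (mean, max) per feature."""
    cols = list(zip(*acts))
    return [sum(c) / len(acts) for c in cols], [max(c) for c in cols]


def pack_f32(rows):
    """Row-major float32 (little-endian assumed here), base64-encoded."""
    flat = [v for row in rows for v in row]
    return base64.b64encode(struct.pack(f"<{len(flat)}f", *flat)).decode("ascii")


def unpack_f32(b64, n_rows, n_cols):
    """Inverse of pack_f32: decode back into n_rows lists of n_cols floats."""
    flat = struct.unpack(f"<{n_rows * n_cols}f", base64.b64decode(b64))
    return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


def build_bundle(per_seq_acts, min_firing=1):
    """One pooling pass per sequence yields both matrices."""
    pooled = [pool(a) for a in per_seq_acts]
    means = [m for m, _ in pooled]
    maxes = [x for _, x in pooled]
    n_feat = len(means[0])
    # Codes are >= 0, so mean > 0 exactly when max > 0: a single firing set
    # is valid for both pooled matrices.
    firing = [
        f for f in range(n_feat)
        if sum(1 for m in means if m[f] > 0) >= min_firing
    ]
    return {
        "n_genes": len(means),
        "n_features": len(firing),
        "feature_ids": firing,  # hypothetical key: column -> real feature id
        "G_b64": pack_f32([[m[f] for f in firing] for m in means]),
        "Gmax_b64": pack_f32([[x[f] for f in firing] for x in maxes]),
    }
```

A client that holds both matrices can switch between mean and max pooling by decoding the other field; no second request to the model is needed.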

polinabinder1 · 2026-06-11T18:52:45Z

@coderabbitai review

coderabbitai · 2026-06-11T18:52:51Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 14

🧹 Nitpick comments (5)

bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/SequenceView.jsx (1)

228-267: ⚡ Quick win

Extract duplicated anchor computation logic.

The anchor computation logic is duplicated three times: in the SequenceView component (lines 96-102), and twice in this function (lines 240-246 and 254-260). This violates the DRY principle and creates a maintenance burden.

♻️ Proposed refactor to extract shared logic

Add a helper function before SequenceView:

+function computeAnchor(acts, alignMode) {
+  let anchor = 0
+  if (alignMode === 'first_activation') {
+    anchor = acts.findIndex(a => a > 0)
+    if (anchor < 0) anchor = 0
+  } else if (alignMode === 'max_activation') {
+    let maxVal = -1
+    acts.forEach((a, i) => { if (a > maxVal) { maxVal = a; anchor = i } })
+  }
+  return anchor
+}
+
 export default function SequenceView({

Then refactor the three call sites to use computeAnchor(acts, alignMode) instead of the duplicated logic.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/SequenceView.jsx`
around lines 228 - 267, Extract the duplicated anchor calculation into a helper
(e.g., computeAnchor(acts, alignMode)) and use it in computeAlignInfo and
SequenceView: implement computeAnchor to accept an activation array and
alignMode and return the anchor index (handling 'start', 'first_activation',
'max_activation' and defaulting to 0), then replace the repeated blocks inside
computeAlignInfo (both the loop that computes maxAnchor and the loop that
computes totalLength) and the anchor logic in SequenceView with calls to
computeAnchor(acts, alignMode) so all three sites use the single helper.

bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/RegionDetailModal.jsx (2)

114-154: ⚡ Quick win

Add accessibility attributes for the modal dialog.

The modal is missing important accessibility attributes:

role="dialog" and aria-modal="true" on the modal container
aria-labelledby pointing to the region label
Focus trap to prevent tabbing out of the modal
Focus return to the trigger element when closed

These attributes are required for screen reader users to understand and navigate the modal correctly.

♻️ Proposed accessibility improvements

       <div style={styles.backdrop} onClick={onClose}>
-        <div style={styles.modal} onClick={e => e.stopPropagation()}>
+        <div style={styles.modal} onClick={e => e.stopPropagation()}
+             role="dialog" aria-modal="true" aria-labelledby="region-label">
           <div style={styles.closeBtn} onClick={onClose}>x</div>

           <div style={styles.body}>
             <div style={styles.header}>
-              <span style={styles.regionLabel}>{label}</span>
+              <span id="region-label" style={styles.regionLabel}>{label}</span>
             </div>

For a complete solution, consider adding a focus trap library or implementing focus management to capture Tab key presses and cycle focus within the modal.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/RegionDetailModal.jsx`
around lines 114 - 154, The modal JSX rendered by the modal variable needs ARIA
attributes and focus management: add role="dialog" and aria-modal="true" to the
element using styles.modal, give the region label span (styles.regionLabel) a
unique id and set aria-labelledby on the modal to that id, and implement focus
trapping and focus return around modal open/close (use a ref to store the
element that opened the modal, set initial focus to a focusable element inside
the modal on mount, intercept Tab/Shift+Tab to cycle focus within the modal, and
onClose restore focus to the saved trigger element). Ensure onClose still closes
the modal and that focus cleanup happens on unmount.

6-102: ⚡ Quick win

Use CSS custom properties for theme consistency.

Several styles use hardcoded color values instead of CSS custom properties, which will break dark mode support:

Line 17: background: '#fff' should use var(--bg-card) or similar
Line 33: background: 'rgba(255,255,255,0.9)' in closeBtn
Lines 34, 63, 72-76, 86, 96: Various hardcoded grays like #ddd, #222, #f9fafb, #eee, #888, #333, #fafafa

Other components in the codebase (e.g., SequenceInspector.jsx, GenerativeSteering.jsx) consistently use CSS custom properties like var(--bg-card), var(--text), var(--border) for theme support.

♻️ Proposed fix for theme compatibility

   modal: {
-    background: '#fff',
+    background: 'var(--bg-card)',
     borderRadius: '12px',
     // ... rest
   },
   closeBtn: {
     // ...
-    background: 'rgba(255,255,255,0.9)',
-    border: '1px solid #ddd',
+    background: 'var(--bg-card)',
+    border: '1px solid var(--border)',
     // ...
-    color: '#555',
+    color: 'var(--text-secondary)',
   },
   // Apply similar changes to other style properties

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/RegionDetailModal.jsx`
around lines 6 - 102, The styles object in RegionDetailModal.jsx uses hardcoded
color literals that break theme/dark-mode; update the color values in the styles
constant (backdrop, modal, closeBtn, regionLabel, statBox, statLabel, statValue,
sequenceBox and any other color/border/background properties) to use CSS custom
properties (e.g. var(--bg-card), var(--bg-overlay) or var(--bg), var(--text),
var(--muted), var(--border), var(--mono) as appropriate) so theming and dark
mode match other components like SequenceInspector.jsx and
GenerativeSteering.jsx; keep the same property keys (backdrop.background,
modal.background, closeBtn.background/border, regionLabel.color,
statBox.background/border, statLabel.color, statValue.color,
sequenceBox.background, etc.) and provide sensible fallbacks if needed.

bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/SequenceUMAPView.jsx (1)

113-131: ⚖️ Poor tradeoff

Consider making UMAP computation non-blocking.

Line 128 calls buildItems(Gmean, nf, ng, r.genes) which synchronously computes UMAP (line 415). For large datasets (many sequences or features), this CPU-intensive computation can freeze the UI.

Consider wrapping the UMAP computation in a setTimeout(..., 0) or using a Web Worker to keep the UI responsive during computation.

♻️ Example: Yield to the browser before the heavy work

       const Gmean = dec(r.G_b64)
       const Gmax = r.Gmax_b64 ? dec(r.Gmax_b64) : Gmean
       const nf = r.n_features, ng = r.n_genes
+      await new Promise(resolve => setTimeout(resolve, 16)) // yield to browser
       const items = buildItems(Gmean, nf, ng, r.genes)
       setBundle({ G: Gmean, Gmean, Gmax, nf, ng, meta: r.genes, items, stats: r.feature_stats, saeId: r.sae_id })

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/SequenceUMAPView.jsx`
around lines 113 - 131, The embed() handler currently calls buildItems(Gmean,
nf, ng, r.genes) synchronously, which triggers UMAP computation and can freeze
the UI; change it so the heavy work no longer blocks rendering: either defer
buildItems to a later task (e.g., wrap the buildItems + setBundle(...) call in
setTimeout(..., 0); queueMicrotask is not enough, because microtasks run before
the browser can repaint) or move the UMAP logic into a Web Worker and post the
result to the main thread; ensure you still setBusy(true) before and
setBusy(false) only after setBundle is applied, and reference the same symbols:
embed(), buildItems, setBundle, and the decoded arrays Gmean/Gmax, nf, ng,
r.genes when transferring data to the worker or delayed callback.

bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/GenerativeSteering.jsx (1)

183-210: ⚡ Quick win

Use theme-aware colors for DNA sequence readout.

Lines 206 and 209 use hardcoded light-theme colors (#ffffff, #e0e0e0, #111) that will not adapt to dark mode. While the comment on line 162 suggests a "light panel" is intentional for readability, the panel should still respond to the active theme.

Consider using var(--bg-code) or similar custom properties, or checking the current theme to provide appropriate contrast.

♻️ Proposed theme compatibility fix

   seqReadout: {
-    background: '#ffffff',
-    border: '1px solid #e0e0e0',
+    background: 'var(--bg-card-expanded)',
+    border: '1px solid var(--border)',
     borderRadius: '6px',
     padding: '8px 10px',
     fontFamily: 'ui-monospace, Menlo, monospace',
     fontSize: '13px',
     lineHeight: 1.7,
   },
   seqLine: { display: 'flex', gap: '8px', alignItems: 'baseline' },
   seqIdx: { color: '#999', fontSize: '11px', minWidth: '40px', textAlign: 'right', whiteSpace: 'pre' },
-  seqText: { color: '#111', letterSpacing: '1px', wordBreak: 'break-all' },
+  seqText: { color: 'var(--text)', letterSpacing: '1px', wordBreak: 'break-all' },

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/GenerativeSteering.jsx`
around lines 183 - 210, S.seqReadout, S.seqText and S.seqIdx use hardcoded
light-theme colors; update them to theme-aware CSS variables so the DNA readout
adapts to dark mode. Replace seqReadout background and border colors with
variables (e.g., var(--bg-code) or var(--bg-panel) and var(--border-muted) or
var(--border)), set seqText color to var(--text) (or var(--text-heading)), and
set seqIdx to a muted variable like var(--text-muted) instead of fixed hex
values; make the changes in the S object entries named seqReadout, seqText and
seqIdx.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/index.html`:
- Line 12: Replace invalid CSS token "light" used for font-weight inside the
`@font-face` declarations with a numeric weight (e.g., 300); locate the two
`@font-face` blocks that currently set font-weight: light and change them to
font-weight: 300 (or another valid numeric weight consistent with the font
files) so the font faces are recognized by browsers.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/App.jsx`:
- Around line 943-950: Replace the clickable <span> that opens the metrics modal
with an actual keyboard-accessible control: change the interactive element that
calls setShowMetricsModal(true) to a <button type="button"> with an explicit
accessible label (aria-label or visually hidden text) and preserve the inline
styles and click handler logic; do the same for the other instances referenced
(the similar spans at lines where handlers call setShow* or toggle functions
around the components) so keyboard users can tab to and activate these controls
and screen readers receive the label.
- Around line 14-19: App is mutating the document root class via the useEffect
that toggles document.documentElement.classList based on its local darkMode
state, while Dashboard also controls the root theme causing cross-component
drift; remove the root DOM mutation from App and instead accept darkMode and
setDarkMode as props (or lift darkMode state up) so Dashboard remains the single
owner of root theme control; specifically delete or disable the useEffect in App
that references document.documentElement.classList.toggle and ensure App uses
the passed-in darkMode and setDarkMode (or propagate them up) so only Dashboard
applies the root 'dark' class.
- Around line 474-475: The current conditional sets selected feature ids to null
when ids.size is 0, which treats an empty selection as "no filter" and shows all
features; change the logic in the setSelectedFeatureIds call (referencing
setSelectedFeatureIds, ids and totalFeatures) so that only a full selection
(ids.size === totalFeatures) becomes null while an empty Set is preserved (i.e.,
set to ids), e.g. replace the condition with one that returns null only when
ids.size === totalFeatures, otherwise setSelectedFeatureIds(ids).
- Around line 165-174: The parquetUrl coming from query params (via
dataPath/parquetUrl) is directly interpolated into the SQL sent to
vg.coordinator().exec() for read_parquet, which allows a single quote in the URL
to break or inject SQL; fix by escaping the URL before embedding (e.g., replace
any single quote with two single quotes or use a SQL-quoting helper) or use a
parameterized API if available, and pass the escaped/quoted value into the SQL
string used in vg.coordinator().exec(`... read_parquet('${...}')`) so
read_parquet receives a safe literal.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/EmbeddingView.jsx`:
- Around line 266-307: The effect starting with useEffect(...) updates category
colors but omits darkMode and features from its dependency array, so changes to
theme or the features array won't recompute hidden-category color mapping;
update the effect dependencies to include darkMode and features (in addition to
categoryColumn, categoryColumns, hiddenCategories) and ensure the logic inside
(references to categoryColumn, categoryColumns, hiddenCategories, features,
SEQUENTIAL_COLORS, CATEGORY_COLORS, HIDDEN_COLOR and the call
viewRef.current.update({...})) will re-run when those values change so colors
remain consistent.
- Around line 53-59: The current EmbeddingView.jsx sets this.inner.innerHTML
with interpolated label and colorField (and numeric fields), which enables DOM
XSS; instead, refactor the render to build DOM nodes programmatically (e.g.,
createElement for container divs and text nodes) and assign user-provided
strings to element.textContent (not innerHTML), then append numeric values (use
logFreq.toFixed and maxAct.toFixed) into textContent or separate text nodes;
replace the template usage of this.inner.innerHTML with
this.inner.appendChild(...) logic so label and colorField are never inserted as
raw HTML.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/FeatureCard.jsx`:
- Around line 315-316: The card expand and title-edit handlers are bound to
non-semantic elements (e.g. the div with styles.header using handleClick and
similar spans around lines 368-381), which prevents keyboard users from
activating them; update those elements in the FeatureCard component to be
keyboard-accessible by replacing purely clickable divs/spans with semantic
interactive elements (preferably <button>) or by adding role="button",
tabIndex="0", and onKeyDown handlers that call the same handler on Enter/Space;
also add appropriate ARIA attributes (e.g. aria-expanded on the card toggle and
aria-label or aria-labelledby on the title-edit trigger) and ensure focus styles
are preserved so keyboard users can see focus.
- Around line 275-311: The CSV export is vulnerable to formula injection and
broken quoting; add a sanitizeCell helper used by exportToCSV to (1) stringify
null/undefined, (2) escape quotes by doubling them, (3) wrap fields containing
commas/newlines/quotes in double quotes, and (4) if the resulting field starts
with =, +, -, or @, prefix it with a single quote to neutralize spreadsheets.
Update all places that push data into lines (metadata rows using
feature.feature_id, displayTitle, userTitle, freq, maxAct and example rows using
getRegionLabel(ex), ex.max_activation, ex.sequence) to pass values through
sanitizeCell before concatenation; also sanitize the filename components
(displayTitle) to avoid surprises.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/GenerativeSteering.jsx`:
- Around line 38-58: In the generate function, validate and coerce numeric
inputs before building the request body: ensure Number(nTokens) and
Number(temperature) are parsed and checked (e.g., parseInt/parseFloat and verify
not NaN and n_tokens >= 1) and show setError or bail if invalid; similarly map
through clamps in the same function to parse each c.strength, filter out or
reject empty/NaN strengths (or enforce a minimum) before constructing features
(feature_id and numeric strength) so you never send 0 from empty strings —
update generate to perform these checks and either set a client-side error or
replace invalid values with safe defaults prior to calling postJSON.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/InfoButton.jsx`:
- Around line 33-53: Replace the non-semantic clickable <span> in InfoButton.jsx
with a real <button> element so it is keyboard-focusable and operable; update
the element using the same ref (buttonRef), keep the onClick={() => setOpen(o =>
!o)} handler, add type="button" and an appropriate aria-label (e.g.,
aria-label="More information" or aria-expanded based on open state), and
preserve the existing inline style properties on the button so visual appearance
remains unchanged; ensure any pointer/user-select/CSS properties are applied to
the button and remove any unnecessary role attributes since a native button is
used.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/utils.js`:
- Around line 9-11: The label construction uses sid with
`${sid}:${example.start}-${example.end}`, which yields a leading colon when sid
is missing; update the code that builds the label (the expression using sid,
example.start and example.end in utils.js) to conditionally prepend the colon
only when sid is present — i.e., build a prefix that is either "sid:" or an
empty string, then append "start-end" using example.start and example.end;
ensure the existing null checks for example.start/example.end remain in place.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/launch_dashboard.py`:
- Around line 93-97: The finally block sends SIGTERM to the Vite subprocess via
proc.terminate() but does not reap it; update the shutdown sequence for proc
(the Popen object) to wait for it and force-kill if it ignores SIGTERM: call
proc.terminate(), then attempt proc.wait(timeout=...) and if it times out call
proc.kill() and then proc.wait() again to ensure the process is reaped; also
handle and ignore exceptions from wait/kill so the script exits cleanly.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/src/evo2_sae/server.py`:
- Around line 251-258: The response object currently returns keys "nf", "ng",
"meta", and "stats" alongside "G_b64" and "Gmax_b64"; rename these keys to match
the frontend contract used by SequenceUMAPView.jsx: replace "nf" ->
"n_features", "ng" -> "n_genes", "meta" -> "genes", and "stats" ->
"feature_stats" in the returned dict (keep the same values/types, e.g.,
int(gmean.shape[1]) etc.), so callers consuming "G_b64"/"Gmax_b64" and the UMAP
consumer get the expected fields.
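
The shutdown sequence requested for launch_dashboard.py above (terminate, wait with a timeout, kill, wait again) could look roughly like the sketch below. The helper name `stop_process` and the 5-second default timeout are assumptions for illustration, not names taken from the script.

```python
import subprocess
import sys
from typing import Optional


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> Optional[int]:
    """Terminate proc, escalate to kill on timeout, and reap it.

    Returns the exit code, or None if the process could not be reaped.
    (Hypothetical helper; the name and timeout are illustrative.)
    """
    if proc.poll() is not None:
        # Already exited; poll() has reaped it and recorded the return code.
        return proc.returncode
    try:
        proc.terminate()  # SIGTERM on POSIX
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child ignored SIGTERM: force-kill it, then reap.
        try:
            proc.kill()
            return proc.wait(timeout=timeout)
        except (OSError, subprocess.TimeoutExpired):
            return None
    except OSError:
        # The process disappeared between poll() and terminate().
        return None
```

In a launcher, a call such as `stop_process(proc)` would sit in the `finally` block in place of a bare `proc.terminate()`, so the child is reaped even when it is slow to exit.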

---

Nitpick comments:
In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/GenerativeSteering.jsx`:
- Around line 183-210: S.seqReadout, S.seqText and S.seqIdx use hardcoded
light-theme colors; update them to theme-aware CSS variables so the DNA readout
adapts to dark mode. Replace seqReadout background and border colors with
variables (e.g., var(--bg-code) or var(--bg-panel) and var(--border-muted) or
var(--border)), set seqText color to var(--text) (or var(--text-heading)), and
set seqIdx to a muted variable like var(--text-muted) instead of fixed hex
values; make the changes in the S object entries named seqReadout, seqText and
seqIdx.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/RegionDetailModal.jsx`:
- Around line 114-154: The modal JSX rendered by the modal variable needs ARIA
attributes and focus management: add role="dialog" and aria-modal="true" to the
element using styles.modal, give the region label span (styles.regionLabel) a
unique id and set aria-labelledby on the modal to that id, and implement focus
trapping and focus return around modal open/close (use a ref to store the
element that opened the modal, set initial focus to a focusable element inside
the modal on mount, intercept Tab/Shift+Tab to cycle focus within the modal, and
onClose restore focus to the saved trigger element). Ensure onClose still closes
the modal and that focus cleanup happens on unmount.
- Around line 6-102: The styles object in RegionDetailModal.jsx uses hardcoded
color literals that break theme/dark-mode; update the color values in the styles
constant (backdrop, modal, closeBtn, regionLabel, statBox, statLabel, statValue,
sequenceBox and any other color/border/background properties) to use CSS custom
properties (e.g. var(--bg-card), var(--bg-overlay) or var(--bg), var(--text),
var(--muted), var(--border), var(--mono) as appropriate) so theming and dark
mode match other components like SequenceInspector.jsx and
GenerativeSteering.jsx; keep the same property keys (backdrop.background,
modal.background, closeBtn.background/border, regionLabel.color,
statBox.background/border, statLabel.color, statValue.color,
sequenceBox.background, etc.) and provide sensible fallbacks if needed.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/SequenceUMAPView.jsx`:
- Around line 113-131: The embed() handler currently calls buildItems(Gmean, nf,
ng, r.genes) synchronously which triggers UMAP computation and can freeze the
UI; change it so the heavy work runs asynchronously — either enqueue buildItems
on the next tick (e.g., wrap the buildItems + setBundle(...) call in
setTimeout(..., 0) or queueMicrotask) or move the UMAP logic into a Web Worker
and post the response to the main thread; ensure you still setBusy(true) before
and setBusy(false) only after setBundle is applied, and reference the same
symbols: embed(), buildItems, setBundle, and the decoded arrays Gmean/Gmax, nf,
ng, r.genes when transferring data to the worker or delayed callback.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/SequenceView.jsx`:
- Around line 228-267: Extract the duplicated anchor calculation into a helper
(e.g., computeAnchor(acts, alignMode)) and use it in computeAlignInfo and
SequenceView: implement computeAnchor to accept an activation array and
alignMode and return the anchor index (handling 'start', 'first_activation',
'max_activation' and defaulting to 0), then replace the repeated blocks inside
computeAlignInfo (both the loop that computes maxAnchor and the loop that
computes totalLength) and the anchor logic in SequenceView with calls to
computeAnchor(acts, alignMode) so all three sites use the single helper.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fd177528-4a24-4837-a844-cbfb935a4481

📥 Commits

Reviewing files that changed from the base of the PR and between bb38064 and 3284d12.

📒 Files selected for processing (27)

bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/.gitignore
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/README.md
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/index.html
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/package.json
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/public/sequence_library.json
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/App.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/Dashboard.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/EmbeddingView.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/FeatureCard.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/FeatureDetailPage.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/FeatureList.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/GenerativeSteering.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/Histogram.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/InfoButton.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/RegionDetailModal.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/SequenceInspector.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/SequenceUMAPView.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/SequenceView.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/backend.js
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/index.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/styles.js
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/utils.js
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/vite.config.js
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/launch_dashboard.py
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/src/evo2_sae/server.py
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/tests/test_launch_dashboard.py
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/tests/test_server.py

coderabbitai · 2026-06-11T19:09:01Z

+  const [darkMode, setDarkMode] = useState(true)
+
+  // Toggle dark class on document root
+  useEffect(() => {
+    document.documentElement.classList.toggle('dark', darkMode)
+  }, [darkMode])


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unify theme ownership to a single source of truth.

App mutates the root dark class locally, while Dashboard also controls it. This creates cross-tab/remount theme drift (e.g., atlas remount can reset theme unexpectedly). Keep root theme control in one component only and pass state/handlers via props.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/App.jsx` around lines 14 - 19, App is mutating the document root class via the useEffect that toggles document.documentElement.classList based on its local darkMode state, while Dashboard also controls the root theme causing cross-component drift; remove the root DOM mutation from App and instead accept darkMode and setDarkMode as props (or lift darkMode state up) so Dashboard remains the single owner of root theme control; specifically delete or disable the useEffect in App that references document.documentElement.classList.toggle and ensure App uses the passed-in darkMode and setDarkMode (or propagate them up) so only Dashboard applies the root 'dark' class.

coderabbitai · 2026-06-11T19:09:01Z

+          setSelectedFeatureIds(ids.size > 0 && ids.size < totalFeatures ? ids : null)
+        } catch (err) {


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve empty selections instead of collapsing them to “all.”

At Line 474, ids.size === 0 currently falls back to null, which is treated as “no filter.” That incorrectly shows all features when a filter yields zero rows.

Suggested fix

- setSelectedFeatureIds(ids.size > 0 && ids.size < totalFeatures ? ids : null)
+ setSelectedFeatureIds(ids.size === totalFeatures ? null : ids)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/App.jsx` around lines 474 - 475, The current conditional sets selected feature ids to null when ids.size is 0, which treats an empty selection as "no filter" and shows all features; change the logic in the setSelectedFeatureIds call (referencing setSelectedFeatureIds, ids and totalFeatures) so that only a full selection (ids.size === totalFeatures) becomes null while an empty Set is preserved (i.e., set to ids), e.g. replace the condition with one that returns null only when ids.size === totalFeatures, otherwise setSelectedFeatureIds(ids).

coderabbitai · 2026-06-11T19:09:02Z

+      <span
+        ref={buttonRef}
+        onClick={() => setOpen(o => !o)}
+        style={{
+          display: 'inline-flex',
+          alignItems: 'center',
+          justifyContent: 'center',
+          width: '15px',
+          height: '15px',
+          borderRadius: '50%',
+          border: '1px solid var(--border-input)',
+          fontSize: '10px',
+          fontWeight: '600',
+          color: 'var(--text-tertiary)',
+          cursor: 'pointer',
+          userSelect: 'none',
+          lineHeight: 1,
+        }}
+      >
+        i
+      </span>


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/InfoButton.jsx"
# Show file header + relevant section with line numbers
nl -ba "$FILE" | sed -n '1,120p'
# Quick search for open state / tooltip usage
rg -n "open|setOpen|tooltip|aria-|buttonRef|onClick|span|button" "$FILE" || true

Repository: NVIDIA-BioNeMo/bionemo-framework

Length of output: 119

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/InfoButton.jsx"
# Show the whole file if it's small; otherwise show the referenced region.
lines=$(wc -l < "$FILE")
echo "FILE_LINES=$lines"
if [ "$lines" -le 160 ]; then
  cat -n "$FILE"
else
  # Lines 25-90 around the reported snippet
  sed -n '25,90p' <(cat -n "$FILE")
fi
echo "---- rg hits ----"
rg -n "open|setOpen|tooltip|aria-|buttonRef|onClick|role=|type=" "$FILE" || true

Repository: NVIDIA-BioNeMo/bionemo-framework

Length of output: 3222

Replace the tooltip-triggering “i” span with a semantic button for keyboard/a11y support.

InfoButton.jsx uses a clickable <span> (onClick only), so it’s not reliably focusable/activatable via keyboard. Switch it to a <button> and add appropriate ARIA.

Proposed fix

-      <span
+      <button
+        type="button"
         ref={buttonRef}
-        onClick={() => setOpen(o => !o)}
+        onClick={() => setOpen((o) => !o)}
+        aria-label="More information"
+        aria-expanded={open}
         style={{
           display: 'inline-flex',
           alignItems: 'center',
           justifyContent: 'center',
           width: '15px',
           height: '15px',
+          background: 'transparent',
+          padding: 0,
           borderRadius: '50%',
           border: '1px solid var(--border-input)',
           fontSize: '10px',
           fontWeight: '600',
           color: 'var(--text-tertiary)',
           cursor: 'pointer',
           userSelect: 'none',
           lineHeight: 1,
         }}
       >
         i
-      </span>
+      </button>

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/InfoButton.jsx` around lines 33 - 53, Replace the non-semantic clickable <span> in InfoButton.jsx with a real <button> element so it is keyboard-focusable and operable; update the element using the same ref (buttonRef), keep the onClick={() => setOpen(o => !o)} handler, add type="button", an appropriate aria-label (e.g., aria-label="More information"), and aria-expanded reflecting the open state, and preserve the existing inline style properties on the button so visual appearance remains unchanged; ensure any pointer/user-select/CSS properties are applied to the button and remove any unnecessary role attributes since a native button is used.

polinabinder1 · 2026-06-12T04:52:15Z

@coderabbitai review

coderabbitai · 2026-06-12T04:52:20Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/GenerativeSteering.jsx (1)
39-59: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The nTokens validation concern from the previous review is still present.

Line 92 stores the input value as a string (setNTokens(e.target.value)), and line 48 converts it without validation. If the user clears the field, Number('') evaluates to 0, violating the min={1} constraint. HTML5 validation doesn't prevent the onClick handler from executing.

However, the previous comment overstated the scope:

Temperature is safe: Lines 84-86 use a range input with controlled parseFloat, which guarantees a valid numeric value.

Strength validation is unnecessary: 0 is a valid strength value (the UI help text notes "0 = suppress"), and the FeaturePicker UI already clamps values.

Only nTokens requires client-side validation before submission.
🛡️ Proposed validation fix (nTokens only)
  const generate = async () => {
    setBusy(true)
    setError(null)
    try {
+     const tokens = Number(nTokens)
+     if (!Number.isInteger(tokens) || tokens < 1) {
+       throw new Error('Token count must be at least 1')
+     }
      const body = {
        prompt: cleanDNA(prompt),
        organism,
        tag: tag ?? (organismTags?.[organism] ?? ''),
        features: clamps.map((c) => ({ feature_id: c.id, strength: c.strength })),
-       n_tokens: Number(nTokens),
+       n_tokens: tokens,
        temperature: Number(temperature),
        compare_baseline: compareBaseline,
      }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/GenerativeSteering.jsx`
around lines 39 - 59, The generate handler must validate nTokens before sending:
parse nTokens into a numeric integer (e.g., const tokens = Number(nTokens) or
parseInt(nTokens, 10)), verify Number.isInteger(tokens) && tokens >= 1, and if
invalid call setError(...) and early-return (and setBusy(false) if needed) to
prevent submission; then use that validated tokens value for the request body as
n_tokens. Update the generate function (references: generate, nTokens,
setNTokens, setError, setBusy, setResult, postJSON) to perform this check and
avoid sending Number('') -> 0.

🧹 Nitpick comments (2)

bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/dashboard.py (2)

60-402: ⚖️ Poor tradeoff

Expand docstrings to full Google-style format.

The coding guidelines require Google-style docstrings (pydocstyle convention) with Args, Returns, and Raises sections where applicable. Current docstrings provide brief descriptions but lack structured parameter and return value documentation.

Example:

📝 Google-style docstring example

 def _load_sae_only(args):
-    """Load just the SAE (+ labels) by reusing the engine's loaders — no 7B / megatron.
+    """Load just the SAE (+ labels) by reusing the engine's loaders — no 7B / megatron.
 
-    ``Evo2SAE.__init__`` only records config; ``_load_sae``/``_load_feature_meta`` touch the
-    SAE checkpoint and annotation parquet but never ``bionemo.evo2``, so this stays light.
-    """
+    ``Evo2SAE.__init__`` only records config; ``_load_sae``/``_load_feature_meta`` touch the
+    SAE checkpoint and annotation parquet but never ``bionemo.evo2``, so this stays light.
+
+    Args:
+        args: Parsed command-line arguments containing sae_ckpt_path, layer, device, and
+            feature_annotations paths.
+
+    Returns:
+        A tuple of (sae_module, n_features, label_dict) where sae_module is the loaded SAE
+        model, n_features is the SAE's feature count, and label_dict maps feature IDs to labels.
+    """

As per coding guidelines: "Use Google-style docstrings (pydocstyle convention) in Python code" (applies to **/*.py).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/dashboard.py`
around lines 60 - 402, Several top-level functions (e.g., _add_sae_args,
parse_args, _load_sae_only, _write_label_columns, _iter_sampled_activations,
_compute_layout, run_atlas, _pass1_max_acts, _pass2_examples, run_examples,
main) have short or non-Google-style docstrings; update each to a full
Google-style docstring including a one-line summary plus Args: (describe
parameters and types), Returns: (describe return values and types) and Raises:
(list raised exceptions) where applicable, preserving existing descriptions and
mentioning side effects (I/O, file writes, device movement) so they comply with
pydocstyle conventions.

Source: Coding guidelines

60-402: 🏗️ Heavy lift

Add type hints to comply with Pyright guideline.

The coding guidelines require "Use Pyright for type checking in Python files following pyproject.toml configuration," but this file contains no type annotations on function parameters or return values. Adding type hints improves IDE support, catches type errors early, and documents expected types.

Example:

📝 Type hint examples for key functions

-def _add_sae_args(p):
+def _add_sae_args(p: argparse.ArgumentParser) -> None:
     """Args common to both modes: the SAE, labels, layer, device, output, UMAP knobs."""

-def parse_args():
+def parse_args() -> argparse.Namespace:
     """Parse the `atlas` / `examples` subcommand and its options."""

-def _load_sae_only(args):
+def _load_sae_only(args: argparse.Namespace) -> tuple[torch.nn.Module, int, dict[int, str]]:
     """Load just the SAE (+ labels) by reusing the engine's loaders — no 7B / megatron.

As per coding guidelines: "Use Pyright for type checking in Python files following pyproject.toml configuration" (applies to **/*.py).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/dashboard.py`
around lines 60 - 402, Add explicit Pyright-friendly type hints to the functions
in this file: annotate _add_sae_args(p: argparse.ArgumentParser) -> None;
parse_args() -> argparse.Namespace; _load_sae_only(args) ->
Tuple[torch.nn.Module, int, Dict[int, str]]; _write_label_columns(n_features:
int, labels: Mapping[int, str]) -> Tuple[List[int], List[str]];
_iter_sampled_activations(shards: Sequence[Path], sample_tokens: int,
batch_size: int) -> Iterator[torch.Tensor]; _compute_layout(sae, args) ->
Tuple[np.ndarray, np.ndarray, str]; run_atlas(args) -> None;
_pass1_max_acts(eng, seqs: Sequence[str], tag: str, tag_len: int, batch_size:
int) -> torch.Tensor; _pass2_examples(eng, seqs: Sequence[str], ids:
Sequence[str], tag: str, tag_len: int, top_idx: torch.Tensor, peak:
torch.Tensor, labels: Mapping[int, str], max_example_bp: int, batch_size: int)
-> List[Dict[str, Any]]; run_examples(args) -> None; main() -> None. Import
required typing names (Iterator, Tuple, List, Dict, Mapping, Sequence, Any) and
consider adding "from __future__ import annotations" to keep runtime cost low;
update signatures only (no logic changes).

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/dashboard.py`:
- Around line 162-167: The try/except around importing umap currently catches
all exceptions and masks real errors; change the broad except Exception to
except ImportError so only a missing dependency switches method to "pca" while
other import errors still surface; update the try block that imports umap and
sets method = "umap" to only fall back to method = "pca" on ImportError.

---

Duplicate comments:
In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/GenerativeSteering.jsx`:
- Around line 39-59: The generate handler must validate nTokens before sending:
parse nTokens into a numeric integer (e.g., const tokens = Number(nTokens) or
parseInt(nTokens, 10)), verify Number.isInteger(tokens) && tokens >= 1, and if
invalid call setError(...) and early-return (and setBusy(false) if needed) to
prevent submission; then use that validated tokens value for the request body as
n_tokens. Update the generate function (references: generate, nTokens,
setNTokens, setError, setBusy, setResult, postJSON) to perform this check and
avoid sending Number('') -> 0.

---

Nitpick comments:
In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/dashboard.py`:
- Around line 60-402: Several top-level functions (e.g., _add_sae_args,
parse_args, _load_sae_only, _write_label_columns, _iter_sampled_activations,
_compute_layout, run_atlas, _pass1_max_acts, _pass2_examples, run_examples,
main) have short or non-Google-style docstrings; update each to a full
Google-style docstring including a one-line summary plus Args: (describe
parameters and types), Returns: (describe return values and types) and Raises:
(list raised exceptions) where applicable, preserving existing descriptions and
mentioning side effects (I/O, file writes, device movement) so they comply with
pydocstyle conventions.
- Around line 60-402: Add explicit Pyright-friendly type hints to the functions
in this file: annotate _add_sae_args(p: argparse.ArgumentParser) -> None;
parse_args() -> argparse.Namespace; _load_sae_only(args) ->
Tuple[torch.nn.Module, int, Dict[int, str]]; _write_label_columns(n_features:
int, labels: Mapping[int, str]) -> Tuple[List[int], List[str]];
_iter_sampled_activations(shards: Sequence[Path], sample_tokens: int,
batch_size: int) -> Iterator[torch.Tensor]; _compute_layout(sae, args) ->
Tuple[np.ndarray, np.ndarray, str]; run_atlas(args) -> None;
_pass1_max_acts(eng, seqs: Sequence[str], tag: str, tag_len: int, batch_size:
int) -> torch.Tensor; _pass2_examples(eng, seqs: Sequence[str], ids:
Sequence[str], tag: str, tag_len: int, top_idx: torch.Tensor, peak:
torch.Tensor, labels: Mapping[int, str], max_example_bp: int, batch_size: int)
-> List[Dict[str, Any]]; run_examples(args) -> None; main() -> None. Import
required typing names (Iterator, Tuple, List, Dict, Mapping, Sequence, Any) and
consider adding "from __future__ import annotations" to keep runtime cost low;
update signatures only (no logic changes).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c9062082-12bd-4acb-872a-6b64eb650205

📥 Commits

Reviewing files that changed from the base of the PR and between 3284d12 and f68f730.

📒 Files selected for processing (12)

bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/index.html
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/package.json
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/App.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/EmbeddingView.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/GenerativeSteering.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/SequenceInspector.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/utils.js
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/pyproject.toml
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/dashboard.py
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/launch_dashboard.py
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/src/evo2_sae/server.py
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/tests/test_server.py

🚧 Files skipped from review as they are similar to previous changes (8)

bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/utils.js
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/package.json
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/src/evo2_sae/server.py
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/tests/test_server.py
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/launch_dashboard.py
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/SequenceInspector.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/EmbeddingView.jsx
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/feature_explorer/src/App.jsx

coderabbitai · 2026-06-12T05:02:18Z

+        try:
+            import umap  # noqa: F401
+
+            method = "umap"
+        except Exception:
+            method = "pca"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Narrow the exception handler to ImportError.

The broad except Exception: on line 166 will catch all exceptions during the umap import attempt, potentially masking real errors (e.g., syntax errors, attribute errors, or compatibility issues in the umap module itself) rather than just a missing dependency. This can hide failures that should surface during development or deployment.

🔒 Proposed fix

 if method == "auto":
     try:
         import umap  # noqa: F401

         method = "umap"
-    except Exception:
+    except ImportError:
         method = "pca"
         print("[atlas] umap-learn unavailable (NumPy/numba) — falling back to --layout pca")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/dashboard.py` around lines 162 - 167, The try/except around importing umap currently catches all exceptions and masks real errors; change the broad except Exception to except ImportError so only a missing dependency switches method to "pca" while other import errors still surface; update the try block that imports umap and sets method = "umap" to only fall back to method = "pca" on ImportError.
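
To make the narrowed fallback concrete, here is a minimal sketch of resolving an "auto" layout so that only ImportError triggers the switch to PCA. The function name `resolve_layout` and its `module_name` parameter are hypothetical; they exist only so the behavior can be exercised without umap-learn installed and are not taken from dashboard.py.

```python
import importlib


def resolve_layout(method: str, module_name: str = "umap") -> str:
    """Resolve "auto" to "umap" if the module imports, otherwise to "pca".

    Only ImportError (including ModuleNotFoundError) triggers the fallback;
    any other exception raised during the import propagates to the caller.
    (Hypothetical helper for illustration.)
    """
    if method != "auto":
        return method  # an explicit choice is passed through untouched
    try:
        importlib.import_module(module_name)
    except ImportError:
        print(f"[atlas] {module_name} unavailable, falling back to --layout pca")
        return "pca"
    return "umap"
```

One trade-off to keep in mind: if the incompatibility hinted at in the fallback message ever surfaces as something other than ImportError, the narrowed handler lets it propagate instead of silently choosing PCA, which is the behavior the review asks for.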

jwilber · 2026-06-12T15:36:39Z

This used to be static, right? Why is it on a live backend now? Is that optional?

We mentioned rolling the Atlas out into its own separate library. Would this support that, or would it require rewriting the setup of the other Atlas?

(Also, can you add some screenshots? :) )

polinabinder1 · 2026-06-12T15:40:07Z

@jwilber There is a visualization demo here: https://5176-shzs9ka8n.brevlab.com/
The dashboard now has a live backend because the steering and generation demos require one.

#1637 was re-landed on the migrated #1622 (new top-level layout); #1623 is stacked on it, so it's layered onto the new #1637 by hand (the old dashboard branch predates the #1637 hardening, so a whole-file diff would revert it).

- Clean adds: feature_explorer/ (the React dashboard, incl. the gene_embed firing-column reduction, request timeouts, per-pane descriptions, shared components.jsx, crash-safe localStorage), scripts/{dashboard,launch_dashboard}.py, tests/test_launch_dashboard.py.
- server.py: add GeneEmbedRequest + /gene_embed (uses core.clean_dna + _require_ready).
- core.py: add Evo2SAE.embed_bundle (ships only firing columns + feature_ids, not the full 65536-wide matrix).
- tests: gene_embed contract test in test_server.py; bind embed_bundle on the conftest FakeEngine.
- pyproject: add scikit-learn (scripts/dashboard.py atlas PCA/TSNE).

Validated in the evo2_megatron venv: CPU 44 passed; frontend `vite build` clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>

Move the API routes under an /api prefix (one APIRouter + include_router) and, when a built frontend is configured (build_app(static_dir=...) or DASHBOARD_DIST env), mount it at / via StaticFiles(html=True). This lets a single container serve both the dashboard and the API on one origin: the frontend always calls /api/* (in dev via the Vite proxy, in prod from the same server).

The static mount is generic: it serves whatever dir it's pointed at and knows nothing about the dashboard; the dashboard recipe (#1623) supplies the dir + the Docker build. With no frontend configured the server is API-only and / 404s (never crashes). Startup already tolerates a load failure (stays not-ready -> 503), so a frontend+API smoke needs no GPU/checkpoints.

Tests: re-point existing contract tests to /api/*, add SPA-index/asset served, API-reachable-under-prefix, unknown-/api-is-404-not-SPA, and API-only-when-no-frontend.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>

no bionemo-recipes/ prefix) and squash-replayed, so #1637 is layered on by hand rather than rebased.
- Clean adds: src/evo2_sae/{server,cli}.py, scripts/launch_inference.sh, tests/test_{cli,server}.py.
- pyproject: add pandas/fastapi/uvicorn/anyio.
- tests/conftest.py: keep #1622's 1B GPU fixtures + bionemo.common loader; append the serve-layer FakeEngine + fake_engine fixture.
- core.py (semantic merge): keep #1622's _sanitize_steering (all CPU sanitize tests) and fold in the explicit non-finite-strength guard (no min/max arg-order reliance); add the shared annotate() + parse_clamp_spec() (CLI strings ⇄ API dicts) and feed parse_clamp_spec in front of _sanitize_steering; add _is_unrecoverable_cuda + flip the engine not-ready on an unrecoverable CUDA fault in generate(). (Kept _sanitize_steering rather than swapping in _normalize_clamps: it is non-redundant and preserves #1622's sampler hardening + tests.)
- test_steering.py: keep #1622's sanitize + GPU tests; add the _is_unrecoverable_cuda test.
Preserved from #1622: clamp_hook canonical encode/decode, TopKSAE-only _load_sae (no ReLU), bionemo.common(/core fallback) loader.
Validated in the evo2_megatron venv: CPU 38 passed, GPU test_steering 13 passed on the 1B (ran, not skipped).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
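The "no min/max arg-order reliance" remark refers to a general Python pitfall: with a NaN operand, `min`/`max` return a different result depending on argument order. The sketch below is not the PR's `_sanitize_steering`; `clamp_strength` and `MAX_STRENGTH` are hypothetical, and raising (rather than coercing) on a non-finite value is an assumption.

```python
import math

MAX_STRENGTH = 300.0  # assumed cap, taken from the ±300 mentioned for the steering clamp


def clamp_strength(value: float, limit: float = MAX_STRENGTH) -> float:
    """Clamp a steering strength to [-limit, limit], rejecting NaN/inf first."""
    # Explicit guard: do not rely on how min()/max() happen to treat NaN.
    if not math.isfinite(value):
        raise ValueError(f"non-finite strength: {value!r}")
    return max(-limit, min(limit, value))


nan = float("nan")
# The pitfall: the same call gives different answers when the arguments swap.
print(max(0.0, nan), max(nan, 0.0))
print(clamp_strength(1e9), clamp_strength(-5.0))
```

With the explicit `math.isfinite` check in front, the clamping expression only ever sees ordinary floats, so its argument order no longer matters.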

… int port, fake shape
1. /annotate pick mode now range-checks user-supplied feature_ids -> 400 (was: out-of-range IndexError -> 500, negative id silently indexed the wrong feature via torch negative-index). + test_annotate_pick_rejects_out_of_range_id.
2. core.generate rejects an over-context prompt ("too long" -> server 413), instead of letting tokenize() silently truncate it; this makes the /generate 413 branch live and matches /annotate. + test_generate_rejects_overlong_prompt.
3. cli.py: int() the env-var defaults (PORT/EMBEDDING_LAYER/MAX_SEQ_LEN): argparse type= only coerces command-line values, so `serve` was handing uvicorn a str port.
4. conftest FakeEngine.generate now returns features keyed {id, label, strength} (the real feat_meta shape the dashboard consumes), not {feature_id, strength}; test_cli updated so the contract test pins the real API shape.
5. Note body-size limit is advisory (Content-Length only; chunked/lying bypasses).
6. Note the CUDA-wedge guard depends on a readiness-based recycler (else 503 until manual restart).
Validated in the evo2_megatron venv: CPU 40 passed (was 38), GPU unaffected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
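Item 1 above can be sketched as a plain range check. Python sequences and torch tensors both accept negative indices, so an unchecked `-1` quietly selects the last feature instead of failing. `check_feature_ids` is a hypothetical helper, and mapping its `ValueError` to an HTTP 400 is assumed to happen in the server layer.

```python
from typing import Iterable, List


def check_feature_ids(feature_ids: Iterable[int], n_features: int) -> List[int]:
    """Return the ids unchanged, or raise if any falls outside [0, n_features)."""
    ids = list(feature_ids)
    bad = [i for i in ids if not 0 <= i < n_features]
    if bad:
        # The caller is assumed to translate this into a 400 response.
        raise ValueError(f"feature_ids out of range [0, {n_features}): {bad}")
    return ids


labels = ["feat0", "feat1", "feat2"]
print(labels[-1])                       # negative index wraps to the last item
print(check_feature_ids([0, 2], 3))     # valid ids pass through
```

Checking before indexing turns both failure modes from the commit message (an `IndexError` surfacing as a 500, and a silently wrong feature) into one explicit client error.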

…art loop)
A device-side assert poisons the process's CUDA context (unrecoverable in-process), so ready=False alone only recovers under a readiness-based recycler. Add restart-on-exit recovery, which almost every host provides:
- core.generate: on an unrecoverable CUDA fault, if EXIT_ON_CUDA_WEDGE=1, os._exit(1) the worker (after ready=False). Default unset -> just fail-closed at 503 (safe for library/CLI/test use).
- launch_inference.sh: for `serve`, export EXIT_ON_CUDA_WEDGE=1 and wrap in a restart loop (respawn on crash/wedge exit; stop on clean exit / Ctrl-C 130 / SIGTERM 143).
Recovery now works with no external orchestrator (and composes with docker --restart / systemd / k8s).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
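The fail-closed versus exit behaviour can be sketched as follows. This is a minimal illustration, not the code in `core.generate`: the `Engine` class and `handle_cuda_wedge` are hypothetical, and the exit function is injectable only so the sketch can run without killing the interpreter.

```python
import os
from typing import Callable


class Engine:
    """Stand-in for the real engine; only the readiness flag matters here."""

    def __init__(self) -> None:
        self.ready = True


def handle_cuda_wedge(engine: Engine,
                      exit_fn: Callable[[int], None] = os._exit) -> None:
    """React to an unrecoverable CUDA fault."""
    # Always fail closed first, so health checks report not-ready (503).
    engine.ready = False
    # Only exit when the launcher opted in; library/CLI/test use stays alive.
    if os.environ.get("EXIT_ON_CUDA_WEDGE") == "1":
        exit_fn(1)


exit_codes = []
engine = Engine()
handle_cuda_wedge(engine, exit_fn=exit_codes.append)
print(engine.ready, exit_codes)
```

`os._exit` skips interpreter cleanup, which fits a process whose CUDA context is already unusable; the surrounding supervisor is then responsible for starting a fresh worker.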

… 413 test
- launch_inference.sh: stop managing the venv; assume it's already active (Docker: on PATH; bare metal: source the evo2_megatron .venv first, like the tests). Drops the messy VENV= passing; adds a clear "bionemo.evo2 not importable" preflight.
- Restart loop signal fix (was a graceful-shutdown regression): run the worker in the background and `wait`, with a trap that forwards SIGTERM/SIGINT to it (uvicorn graceful shutdown) and stops the loop, so `docker stop`/k8s on PID 1 no longer orphans the worker. Adds a 10-restart cap + backoff so a persistent crash (e.g. port already bound) doesn't loop forever. Smoke-tested: SIGTERM stops in ~1s, not the worker's full lifetime.
- /generate 413 now pinned at the server layer: FakeEngine raises "too long" past max_seq_len and test_generate_rejects_too_long drives POST /generate -> 413 (was only covered via test_core).
- Reframe the CUDA-wedge comment: it's PURELY DEFENSIVE. _sanitize_steering neutralizes every client-reachable assert trigger, so a wedge implies a hardware/driver fault, not a crafted request (exit+restart is not a remote DoS). New triggers must extend _sanitize_steering.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
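The restart policy described above (stop on clean exit, 130, or 143; otherwise respawn up to a cap, with backoff) can be sketched as a pure decision function. The real loop lives in `launch_inference.sh`; `next_action` is hypothetical, and the exponential, capped backoff shape is an assumption, since the commit only says "cap + backoff".

```python
from typing import Tuple

STOP_CODES = {0, 130, 143}  # clean exit, Ctrl-C (SIGINT), SIGTERM
MAX_RESTARTS = 10           # restart cap stated in the commit message


def next_action(exit_code: int, restarts: int,
                base_delay: float = 1.0,
                max_delay: float = 30.0) -> Tuple[str, float]:
    """Decide what the supervisor does after the worker exits."""
    if exit_code in STOP_CODES:
        return ("stop", 0.0)
    if restarts >= MAX_RESTARTS:
        # A persistent crash (e.g. port already bound) must not loop forever.
        return ("give_up", 0.0)
    # Assumed backoff: double the delay per restart, capped at max_delay.
    return ("restart", min(max_delay, base_delay * 2 ** restarts))


for code, n in [(0, 0), (143, 3), (1, 0), (1, 3), (1, 10)]:
    print(code, n, next_action(code, n))
```

Keeping the decision separate from process spawning makes the stop conditions easy to test without starting a server.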

…o image)
Make the dashboard + API deployable as ONE container so a public-repo user (or coworker) can launch it with a single 'docker run', no Node and no second process:
* Dockerfile: add a Node build stage that compiles feature_explorer -> static dist/, COPY it into the runtime image, and set DASHBOARD_DIST so server.py serves it at / (Node lives only in the build stage, never in the runtime GPU image).
* vite.config.js: drop the /api rewrite; proxy /api straight through to :8001/api. Dev now hits the same paths as production (the server serves the API under /api), so there's no dev/prod drift.
* move the /gene_embed endpoint onto the /api router (rebased onto the new server.py); frontend already calls ${BACKEND}/gene_embed (= /api/gene_embed), so no frontend code change.
* re-point the gene_embed contract test to /api/gene_embed.
Dev mode (vite + launch_dashboard.py) is unchanged for UI iteration; the baked-in static build is the deploy path. Frontend compile is exercised by the image build.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>

…ting
Two related transparency fixes:
* Each tab now shows a collapsible 'Limitations' disclosure under its description, with caveats grounded in real behavior (atlas is a static snapshot + 2-D UMAP; steering clamp capped ±300 and only affects the continuation; inspector reads one layer and rejects over-length input; sequence UMAP is stochastic 2-D, mean/max pooled, min_firing-filtered, 1000-seq cap).
* /gene_embed no longer silently truncates/drops: sequences past the 1000 cap, shorter than 3 nt, or longer than the context limit are skipped (not truncated into a misleading vector) and the counts are returned (n_received/n_skipped_short/n_skipped_too_long/n_dropped_over_cap/max_seq_len/max_genes). The Sequence-UMAP tab surfaces a warning banner; an all-invalid request 400s with an accounting message. Matches /annotate's reject-don't-truncate behavior.
Tests: assert the new accounting fields, that too-short/over-length sequences are reported (not embedded), and that an all-invalid request 400s.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
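The skip-and-count accounting for /gene_embed can be sketched like this. The field names come from the commit message; `triage_sequences` itself is hypothetical, and the order of the checks (length filters first, then the cap applied to valid sequences) is an assumption the message does not spell out.

```python
from typing import Dict, List, Tuple

MIN_LEN = 3       # sequences shorter than 3 nt are skipped
MAX_GENES = 1000  # per-request sequence cap


def triage_sequences(seqs: List[str], max_seq_len: int,
                     max_genes: int = MAX_GENES) -> Tuple[List[str], Dict[str, int]]:
    """Split input into sequences to embed plus an accounting of what was skipped."""
    kept: List[str] = []
    counts = {
        "n_received": len(seqs),
        "n_skipped_short": 0,
        "n_skipped_too_long": 0,
        "n_dropped_over_cap": 0,
        "max_seq_len": max_seq_len,
        "max_genes": max_genes,
    }
    for seq in seqs:
        if len(seq) < MIN_LEN:
            counts["n_skipped_short"] += 1
        elif len(seq) > max_seq_len:
            # Skipped, not truncated: a truncated vector would be misleading.
            counts["n_skipped_too_long"] += 1
        elif len(kept) >= max_genes:
            counts["n_dropped_over_cap"] += 1
        else:
            kept.append(seq)
    if not kept:
        # The server is assumed to turn this into a 400 with the accounting.
        raise ValueError(f"no valid sequences: {counts}")
    return kept, counts


kept, counts = triage_sequences(["AT", "ATGC", "A" * 20, "ATGCA"],
                                max_seq_len=10, max_genes=1)
print(kept, counts)
```

Returning the counts alongside the kept sequences is what lets the UI banner explain exactly why fewer points appear than were submitted.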

… modes
Document the actual ways to run now that the API is under /api and the frontend can be baked into the image:
(1) single container (docker run, UI + /api on one port, no Node at runtime) for sharing;
(2) local dev (Vite + backend, Node >=18, tsh tunnel of :5176, /api proxied straight through with no rewrite);
(3) offline/static (precompute artifacts, atlas always + sequence-UMAP from a bundle).
Adds a tab summary and notes the in-app per-tab Limitations. Touches only this PR's README.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>

…red helper App.jsx read the featureTitle_<id> localStorage key directly in 5 places; route them through components.userLabel (single source of truth for the in-UI rename convention, and crash-safe via its try/catch). No behavior change. Build verified (npm run build green). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>

…P banner reads The Sequence-UMAP warning banner interpolates max_genes and max_seq_len from the response, but the contract test only asserted the four counts. Assert those two fields too so the UI message can't silently break on a field rename. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>

Add tests for the error branches that lacked coverage: /api/annotate and /api/gene_embed reject an unknown organism with 400 (not a 500 from a None tag), and build_app with a bogus static_dir degrades to API-only (/ 404s, /api still serves) instead of crashing at mount time. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>
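The "bogus static_dir degrades to API-only" branch can be sketched as a resolution step that runs before any mount is attempted. `resolve_static_dir` is a hypothetical helper, not the PR's `build_app`; letting the explicit argument take precedence over `DASHBOARD_DIST` is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_static_dir(static_dir: Optional[str],
                       env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return a directory to mount at /, or None to run API-only."""
    # Assumed precedence: explicit argument first, then the environment.
    candidate = static_dir or env.get("DASHBOARD_DIST")
    if not candidate:
        return None
    path = Path(candidate)
    # A missing or non-directory path degrades to API-only instead of raising.
    return path if path.is_dir() else None


print(resolve_static_dir("/definitely/not/a/real/dir", {}))  # None -> API-only
print(resolve_static_dir(None, {"DASHBOARD_DIST": os.getcwd()}))
```

Deciding this up front means the mount call only ever sees a directory that exists, so a typo in the configuration costs the UI but not the API.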

…ngine-only The multi-stage Dockerfile previously always ran the Node front-end build, coupling the SAE engine image to the dashboard toolchain (an SAE-only user paid the npm build and a broken/networkless front-end blocked the image build). Restructure into engine / engine-with-dashboard stages selected by --build-arg WITH_DASHBOARD (default 0): the default build is engine + server only and never pulls Node or builds the front-end; WITH_DASHBOARD=1 also bakes the static dashboard in and sets DASHBOARD_DIST so one container serves UI + API. No runtime change either way. README updated. Note: not build-validated here (no docker in this env); the FROM final-${WITH_DASHBOARD} stage selection is a standard BuildKit idiom. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>

The atlas tab loads its three parquets from the served dir; they aren't baked in (per-SAE generated data). Document that they just need to land in $DASHBOARD_DIST — generate them in-container via dashboard.py --output-dir $DASHBOARD_DIST, or cp a pre-made set in before serve. No code/staging mechanism needed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>

## Summary

FastAPI **server + CLI** over the Evo2SAE engine (#1622). Thin wrappers (all model work lives in `core.py`) plus the input validation, resource governance, and recovery needed for a shared backend (runs behind NVIDIA SSO on Brev, reachable by many users). API routes live under **`/api`**, and the server can mount a prebuilt front-end at `/`, so the dashboard (#1623) and the API can be served from **one origin / one container**.

**Rebased onto the single-engine #1622** (one inference engine serves both encode and generate; new top-level layout `interpretability/sparse_autoencoders/…`).

## Contents (new layout)

- `…/src/evo2_sae/server.py`: `/api/health`, `/api/features`, `/api/annotate`, `/api/generate` (+ optional static-frontend mount at `/`)
- `…/src/evo2_sae/cli.py`: `serve` / `encode` / `batch` / `generate`
- `…/scripts/launch_inference.sh`; CPU contract tests `tests/test_cli.py`, `tests/test_server.py` + the shared `FakeEngine` appended to #1622's `tests/conftest.py`

## Shared logic (CLI ⇄ server live in `core`)

- **`core.annotate(engine, …)`**: clean → resolve-tag → encode → tag-len, behind both CLI `encode` and server `/api/annotate`.
- **`core.parse_clamp_spec(spec)`**: one parser for clamps as CLI `"ID[:STRENGTH]"` strings or server `FeatureClamp` JSON; fed in front of #1622's `_sanitize_steering` so both surfaces validate identically.

## Single-origin serving (`/api` + optional static mount)

- API routes are grouped under **`/api`** (one `APIRouter` + `include_router`).
- `build_app(engine, static_dir=None)` mounts a prebuilt front-end at `/` via `StaticFiles(html=True)` when `static_dir` (or the `DASHBOARD_DIST` env) points at a real directory; otherwise the server is **API-only** and `/` 404s (never crashes). The mount is generic: it serves whatever dir it's pointed at and knows nothing about the dashboard; #1623 supplies the dir + the Docker build that produces it.
- This is what lets a single container serve UI + API on one port. Dev hits the same `/api/*` paths (the Vite proxy forwards `/api` without rewriting), so there's no dev/prod path drift.

## Reliability & governance

- **`/api/health` 503 until ready** so readiness probes don't route to a still-loading pod; a startup load failure is caught and leaves the engine not-ready (503) rather than crashing.
- **Length limits**: `/api/annotate` and `/api/generate` reject input longer than `max_seq_len` (**413**) instead of silently truncating (which would misalign the per-base `activations`/`bases` the viz plots). Generation length is otherwise auto-capped to the remaining context (no fixed token cap).
- **Pick-id validation**: `/api/annotate` `mode=pick` range-checks user-supplied `feature_ids` → **400** (an out-of-range id would otherwise 500 on `IndexError`, a negative one would silently return the wrong feature).
- **Steering sanitation**: out-of-range ids, extreme/non-finite strengths, `temperature<=0`, negative `top_k` are all rejected/coerced before the GPU (`_sanitize_steering`).
- **CUDA-wedge recovery**: a device-side assert poisons the process's CUDA context (unrecoverable in-process). Not client-inducible (sanitation covers the reachable triggers, so this is purely defensive), but if it happens `generate()` flips the engine not-ready (→ 503) and, when `EXIT_ON_CUDA_WEDGE=1` (set by `serve`), exits the worker so any restart-on-exit supervisor respawns it, giving host-independent recovery.
- **Signal-safe serve**: `launch_inference.sh serve` runs the worker in the background, forwards `SIGTERM`/`SIGINT` (uvicorn graceful shutdown) before respawning, with a retry cap + backoff, so `docker stop`/k8s shuts down cleanly instead of orphaning the worker.
- **Request body-size limit** (`MAX_BODY_BYTES`, default 16 MiB) → 413; advisory (trusts `Content-Length`).
- **Bounded concurrency**: Starlette's sync-endpoint threadpool capped (`MAX_CONCURRENCY`, default 8); the engine lock already serializes the single GPU.

## Architectural decisions

- **Two layers: engine vs. surface.** All model work stays in `core.Evo2SAE` (#1622); `server.py`/`cli.py` are thin and share `core.annotate` + `core.parse_clamp_spec`, so the HTTP API and the CLI can't drift and there's one validated path.
- **FastAPI, not raw/Flask.** We get pydantic structural validation + an async threadpool we can bound (`MAX_CONCURRENCY`) for almost no code; the domain validation that matters (`_sanitize_steering`, pick-id range) is manual either way. Raw Python would hand-roll routing/validation/concurrency; Flask would add the threadpool governance by hand.
- **No app-level auth.** Deployed behind NVIDIA SSO on Brev; auth is the proxy's job, not duplicated here (CORS removed too, since calls are same-origin).
- **Single GPU, serialized.** The engine lock + bounded threadpool match one GPU; data-parallel replicas behind a balancer are a deferred follow-up (touches no engine code).
- **`/api` prefix + generic static mount** (above) so one origin/container can serve both UI and API.

## How to run

Run **inside the evo2_megatron venv** (provides `bionemo.evo2` + megatron); in the Docker image it's already active. Full dashboard run modes are in #1623's `feature_explorer/README.md`.

```bash
export EVO2_CKPT_DIR=<mbridge> SAE_CKPT_PATH=<sae.pt>
export FEATURE_ANNOTATIONS=<feature_metadata.parquet> EMBEDDING_LAYER=26
scripts/launch_inference.sh serve                                         # API on :8001 (+ UI at / if DASHBOARD_DIST set)
scripts/launch_inference.sh encode --sequence ATGC...                     # one sequence -> top features (JSON)
scripts/launch_inference.sh batch --fasta in.fa --out out.parquet         # many -> parquet
scripts/launch_inference.sh generate --prompt ATGC... --clamp 29244:300   # steered generation
```

Tunables (env): `MAX_BODY_BYTES`, `MAX_CONCURRENCY`, `MAX_SEQ_LEN`, `PORT`, `EXIT_ON_CUDA_WEDGE`, `DASHBOARD_DIST`.

## Tests

No dedicated CI lane (deferred; see #1622). Run them via the recipe's build script:

```bash
cd interpretability/sparse_autoencoders/recipes/evo2
bash .ci_build.sh && source .ci_test_env.sh
pytest tests/
```

- **CPU (no model):** `test_cli.py` + `test_server.py` (FastAPI `TestClient` + `FakeEngine`: response shapes, 400/413/503, pick out-of-range → 400, `/api/generate` too-long → 413, body-size, k-bounds, clamp validation, **static-frontend mount: SPA at `/`, asset served, API reachable under `/api`, unknown `/api/*` → 404, API-only when no frontend**), plus #1622's `test_core.py` + `test_steering.py` sanitize guards.
- **GPU:** `test_steering.py`: encode, in-distribution generation, steering changes the continuation, batched/empty encode, max-clamp finite, **highlight↔steer interleaving** (single-engine state-bleed check). Gated by `@pytest.mark.skipif(not torch.cuda.is_available())`: runs on a GPU box, skips otherwise. Validated on the 1B; the single-engine backend also serves the **7B at layer 26** live.

## Deferred follow-up

Multi-GPU **data-parallel replicas** (one worker per GPU behind a `least_conn` balancer) for concurrent throughput; touches no engine code and is left until concurrency is an observed need.

**Stacked on #1622.** The dashboard (#1623) builds on this.

---------

Signed-off-by: Polina Binder <pbinder@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
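The shared clamp parser described under "Shared logic" accepts either a CLI `"ID[:STRENGTH]"` string or a JSON-style dict. The sketch below only illustrates that idea and is not `core.parse_clamp_spec`: the function name `parse_clamp`, the dict keys `feature_id` and `strength`, and the default strength of 1.0 are all assumptions.

```python
from typing import Dict, Tuple, Union

DEFAULT_STRENGTH = 1.0  # hypothetical default when ":STRENGTH" is omitted


def parse_clamp(spec: Union[str, Dict[str, float]]) -> Tuple[int, float]:
    """Normalise a clamp given as 'ID[:STRENGTH]' or as a dict to (id, strength)."""
    if isinstance(spec, str):
        head, sep, tail = spec.partition(":")
        strength = float(tail) if sep else DEFAULT_STRENGTH
        return int(head), strength
    # Dict form, e.g. a decoded JSON body (key names are assumed).
    return int(spec["feature_id"]), float(spec.get("strength", DEFAULT_STRENGTH))


print(parse_clamp("29244:300"))
print(parse_clamp({"feature_id": 29244, "strength": 300}))
```

Because both input shapes collapse to the same tuple, any later validation step only has to handle one representation, which is the point of sharing the parser between the CLI and the server.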

Re-add .github/workflows/unit-tests-interpretability-recipes.yaml (removed in c31d4e9 "defer CI for now"), recovered verbatim from 9bedf2b. Path-gated GPU lane (L4 + megatron squashed image) that builds the evo2 SAE recipe via .ci_build.sh and runs pytest tests/ — including the GPU steering/encode tests. Rooted on this PR (#1622), so the stacked SAE PRs (#1623/#1629/#1635) inherit it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vite.config.js binds 0.0.0.0 for remote access but omitted server.allowedHosts, so reaching the dev server via a public hostname (brev/ngrok/Codespaces) instead of an ssh -L localhost tunnel hit Vite's Host check ('host not allowed'). Pre-allow the common tunnel domains and document it in the README. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Signed-off-by: polinabinder1 <pbinder@nvidia.com>
# Conflicts:
# interpretability/sparse_autoencoders/recipes/evo2/pyproject.toml
# interpretability/sparse_autoencoders/recipes/evo2/src/evo2_sae/server.py
# interpretability/sparse_autoencoders/recipes/evo2/tests/conftest.py
# interpretability/sparse_autoencoders/recipes/evo2/tests/test_server.py

Signed-off-by: polinabinder1 <pbinder@nvidia.com>

polinabinder1 requested review from jstjohn, jwilber, pstjohn and trvachov as code owners June 10, 2026 21:27

polinabinder1 mentioned this pull request Jun 10, 2026

evo2 SAE feature-explorer dashboard #1604

Closed

polinabinder1 marked this pull request as draft June 10, 2026 21:28

polinabinder1 mentioned this pull request Jun 10, 2026

evo2 SAE: inference engine + steering + server/CLI, tests, Dockerfile #1622

Open

polinabinder1 force-pushed the pbinder/evo2-sae-dashboard branch from cee51a3 to 2c4e69e Compare June 11, 2026 00:38

polinabinder1 changed the base branch from main to pbinder/evo2-sae-serve June 11, 2026 00:38

coderabbitai Bot reviewed Jun 11, 2026

polinabinder1 changed the base branch from pbinder/evo2-sae-serve to pbinder/evo2-steering June 11, 2026 22:06

polinabinder1 changed the base branch from pbinder/evo2-steering to pbinder/evo2-sae-serve June 11, 2026 22:30

This was referenced Jun 12, 2026

[DRAFT] Evo 2 SAE feature explorer — visualization mockup #1582

Closed

evo2 SAE serve: FastAPI server + CLI (on the engine) #1637

Merged

polinabinder1 changed the base branch from pbinder/evo2-sae-serve to pbinder/evo2-sae-server June 12, 2026 04:07

coderabbitai Bot reviewed Jun 12, 2026

polinabinder1 marked this pull request as ready for review June 12, 2026 05:32

polinabinder1 requested a review from savitha-eng as a code owner June 12, 2026 05:32

polinabinder1 force-pushed the pbinder/evo2-sae-dashboard branch from a47088c to 8325741 Compare June 23, 2026 06:35

polinabinder1 force-pushed the pbinder/evo2-sae-server branch from a819d98 to dc46ad5 Compare June 23, 2026 18:50

polinabinder1 force-pushed the pbinder/evo2-sae-dashboard branch from 8325741 to b7843e3 Compare June 23, 2026 18:53

polinabinder1 force-pushed the pbinder/evo2-sae-dashboard branch from b7843e3 to dbd1792 Compare June 23, 2026 19:28

polinabinder1 and others added 5 commits June 24, 2026 03:57

polinabinder1 force-pushed the pbinder/evo2-sae-server branch from cdda2f7 to 28e49be Compare June 24, 2026 03:58

polinabinder1 and others added 9 commits June 24, 2026 03:59

polinabinder1 force-pushed the pbinder/evo2-sae-dashboard branch from ba7434b to 6919590 Compare June 24, 2026 04:00

Base automatically changed from pbinder/evo2-sae-server to pbinder/evo2-sae-serve June 29, 2026 20:27

polinabinder1 mentioned this pull request Jun 30, 2026

ci: GPU matrix lane for interpretability SAE recipes (merge-first, presence-guarded) #1667

Merged

polinabinder1 and others added 3 commits June 30, 2026 03:36

Merge remote-tracking branch 'origin/pbinder/evo2-sae-serve' into HEAD

d558bae

Signed-off-by: polinabinder1 <pbinder@nvidia.com>

		setSelectedFeatureIds(ids.size > 0 && ids.size < totalFeatures ? ids : null)
		} catch (err) {

Conversation

polinabinder1 commented Jun 10, 2026 • edited

coderabbitai Bot commented Jun 10, 2026 • edited

Review skipped

❌ Failed checks (1 warning)

copy-pr-bot Bot commented Jun 10, 2026

polinabinder1 commented Jun 11, 2026

coderabbitai Bot commented Jun 11, 2026 • edited

coderabbitai Bot left a comment

coderabbitai Bot Jun 11, 2026

coderabbitai Bot Jun 11, 2026

coderabbitai Bot Jun 11, 2026

polinabinder1 commented Jun 12, 2026

coderabbitai Bot commented Jun 12, 2026 • edited

coderabbitai Bot left a comment

coderabbitai Bot Jun 12, 2026

jwilber commented Jun 12, 2026

polinabinder1 commented Jun 12, 2026 • edited

3 participants
