fix(unofficial-run): auto-switch model when URL loads run for unselected model (#243)

Oseltamivir · claude · github-actions[bot] · web-flow · commit e7bb1d0cefe9 · 2026-05-06T17:20:33.000-05:00
* feat: support multiple comma-separated run IDs for unofficial runs

Accept `?unofficialrun=123,456,789` on the dashboard URL to merge
benchmark and evaluation data from multiple GitHub Actions runs into
a single view. Each run's benchmarks are tagged with their originating
run_url for per-point traceability, and eval config ids are offset
per-run to avoid collisions in the merged set. A NON-OFFICIAL banner
is rendered per run.
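
A minimal sketch of the URL handling described above, not the dashboard's actual code. The helper names `parseUnofficialRunIds` and `offsetEvalConfigId` and the stride `EVAL_ID_STRIDE` are hypothetical; only the parameter name and the comma-separated format come from this change.

```typescript
// Hypothetical stride: assumes no single run has this many eval configs.
const EVAL_ID_STRIDE = 1000000;

/** Extract run IDs from a query string such as "?unofficialrun=123,456,789". */
function parseUnofficialRunIds(search: string): string[] {
  const params = new URLSearchParams(search);
  const ids: string[] = [];
  params.forEach((value, name) => {
    // Accept "unofficialrun" or "unofficialruns", case-insensitively.
    if (!/^unofficialruns?$/i.test(name)) return;
    for (const part of value.split(',')) {
      const id = part.trim();
      // Keep numeric IDs only and drop duplicates, preserving first-seen order.
      if (/^\d+$/.test(id) && ids.indexOf(id) === -1) ids.push(id);
    }
  });
  return ids;
}

/** Shift a run-local eval config id into a range reserved for that run. */
function offsetEvalConfigId(runIndex: number, configId: number): number {
  return runIndex * EVAL_ID_STRIDE + configId;
}
```

With this scheme, config id 7 from the third run (index 2) becomes 2000007, so it cannot collide with config id 7 from the first run.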

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(unofficial-run): hue-shift overlay strokes per run index

When multiple unofficial runs are loaded, overlay points/rooflines for
the same GPU were rendered in identical colors, making it impossible to
tell runs apart. Derive a per-run hue rotation from the run's position
in the loaded set and apply it via CSS filter — run 0 unchanged, each
subsequent run shifted by 55°. Roofline grouping now includes runIndex
so each run gets its own Pareto front.
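
As a rough illustration of the per-run rotation and grouping described above: the function names and the `::` key separator are assumptions made for this sketch, while the 55° step and the "run 0 unchanged" rule come from this commit.

```typescript
// Hue step per run index, as stated in this commit.
const HUE_STEP_DEG = 55;

/** CSS filter for an overlay stroke; run 0 keeps its base color. */
function overlayRunFilter(runIndex: number): string {
  if (runIndex <= 0) return 'none';
  return `hue-rotate(${(runIndex * HUE_STEP_DEG) % 360}deg)`;
}

/** Grouping key so each (hardware, run) pair gets its own Pareto front. */
function rooflineGroupKey(hwKey: string, runIndex: number): string {
  return `${hwKey}::${runIndex}`;
}
```

Grouping on the combined key keeps points from different runs on the same GPU from being merged into one front.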

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(unofficial-run): extend per-run hue shift to evaluation overlays

BarChartD3's X-mark overlay points and their error-bar groups now use
the same per-run hue rotation as the inference scatter overlay, so runs
loaded via a comma-separated unofficialrun= list are visually separable
on the evaluation tab too. Extracts the shared filter and runIndex
helpers into lib/overlay-run-style.ts to avoid duplication.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: map dsv4pro benchmark prefix to dsv4 DB key

Benchmark artifacts for DeepSeek-V4-Pro runs (e.g. run 24884703163)
emit `infmax_model_prefix: "dsv4pro"` while the canonical DB key is
`dsv4`. Without an alias the prefix resolver fell through all three
strategies (direct match, alias table, precision-suffix strip) and
every row was dropped as `unmappedModel`, so unofficial-run queries
for these runs returned an empty benchmark set.
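
The three-strategy lookup can be sketched as follows. This is an illustration under assumptions, not the resolver itself: the `dsr1` key, the table names, and the precision-suffix pattern are hypothetical, and only the `dsv4pro` to `dsv4` pair comes from this commit.

```typescript
// Hypothetical canonical DB keys; "dsr1" is included only for illustration.
const DB_KEYS = new Set<string>(['dsv4', 'dsr1']);
// Alias table; the dsv4pro entry is the fix described in this commit.
const PREFIX_ALIASES: Record<string, string> = { dsv4pro: 'dsv4' };
// Assumed suffix convention, e.g. "dsr1-fp8" or "dsr1_fp4".
const PRECISION_SUFFIX_RE = /[-_](fp4|fp8|bf16)$/i;

/** Resolve a benchmark prefix to a DB key, or null if no strategy matches. */
function resolveModelPrefix(prefix: string): string | null {
  // 1. Direct match against the canonical keys.
  if (DB_KEYS.has(prefix)) return prefix;
  // 2. Alias table lookup.
  const alias = PREFIX_ALIASES[prefix];
  if (alias !== undefined && DB_KEYS.has(alias)) return alias;
  // 3. Strip a trailing precision suffix and retry the direct match.
  const stripped = prefix.replace(PRECISION_SUFFIX_RE, '');
  if (stripped !== prefix && DB_KEYS.has(stripped)) return stripped;
  // Nothing matched: the caller would count the row as unmappedModel.
  return null;
}
```

Without the alias entry, `dsv4pro` has no direct match and no precision suffix to strip, so it falls through to `null`.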

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(unofficial-run): make per-run hue shift actually visible

Three stacked fixes so multiple unofficial runs don't all look the same:

1. Include overlay hw keys in the vendor-color active set so overlay
   strokes get a real hue instead of the muted-foreground fallback —
   hue-rotate on gray is a no-op, which was the main reason runs
   appeared identical.
2. Strengthen the per-run CSS filter: saturate(2.2) hue-rotate brightness(1.1),
   and widen the hue step from 55° to 80° for more separation.
3. Use a different stroke-dasharray per run index on overlay rooflines so
   runs stay distinguishable even when the filter can't produce a shift.
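
To see why item 1 matters, the sketch below applies a hue-rotation color matrix to a gray and to a red. The coefficients are the ones I understand the Filter Effects specification to define for `hue-rotate`; treat them, and the lack of clamping and rounding, as assumptions of this sketch rather than a description of what browsers do internally.

```typescript
type Rgb = [number, number, number];

/**
 * Apply a hue-rotation color matrix to an RGB triple.
 * Values are neither clamped to [0, 255] nor rounded.
 */
function hueRotate([r, g, b]: Rgb, degrees: number): Rgb {
  const rad = (degrees * Math.PI) / 180;
  const c = Math.cos(rad);
  const s = Math.sin(rad);
  return [
    r * (0.213 + c * 0.787 - s * 0.213) +
      g * (0.715 - c * 0.715 - s * 0.715) +
      b * (0.072 - c * 0.072 + s * 0.928),
    r * (0.213 - c * 0.213 + s * 0.143) +
      g * (0.715 + c * 0.285 + s * 0.14) +
      b * (0.072 - c * 0.072 - s * 0.283),
    r * (0.213 - c * 0.213 - s * 0.787) +
      g * (0.715 - c * 0.715 + s * 0.715) +
      b * (0.072 + c * 0.928 + s * 0.072),
  ];
}

// Each matrix row sums to 1 for any angle, so r = g = b is a fixed point.
const rotatedGray = hueRotate([128, 128, 128], 80);
// A saturated color, by contrast, moves substantially.
const rotatedRed = hueRotate([255, 0, 0], 80);
```

Because the cosine and sine coefficients in each row cancel, a gray input comes back as the same gray, which is why the muted-foreground fallback made every run look identical.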

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(unofficial-run): use explicit per-run palette instead of CSS filter

The CSS-filter approach made the legend and chart diverge: the legend
rendered each overlay hwKey's vendor color (red for MI355X), while the
chart stroke got the same base color *plus* a hue-rotate filter that
shifted it to an unrelated hue. Since the legend's colored dot is a
direct backgroundColor style, there was no clean way to apply the same
filter to it.

Switch to an explicit OKLch palette indexed by run order — both the
overlay stroke and the legend swatch read from the same palette, so
they match exactly. Restructure the overlay legend section to show one
entry per loaded run (branch name) rather than per-hardware, since N
runs × M hardware keys can't collapse to a single color per hw.

Hardware identity for overlay points is still visible in the point
label and tooltip; the X-mark shape and legend branch labels carry the
run identity. Roofline dash-pattern per run is kept as a secondary
(colorblind-friendly) encoding.
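
One way to picture the shared-palette idea is sketched below. The palette values, dash patterns, and function names are hypothetical; the commit message does not list the actual colors.

```typescript
// Hypothetical OKLch palette indexed by run order.
const RUN_PALETTE: readonly string[] = [
  'oklch(0.70 0.18 30)',
  'oklch(0.70 0.18 150)',
  'oklch(0.70 0.18 270)',
  'oklch(0.70 0.18 90)',
];
// Hypothetical dash patterns used as a secondary encoding; run 0 is solid.
const RUN_DASHES: readonly string[] = ['none', '6 3', '2 3', '8 3 2 3'];

interface RunStyle {
  color: string;
  dashArray: string;
}

/** Single source of truth for a run's color and dash pattern. */
function overlayRunStyle(runIndex: number): RunStyle {
  const i = Math.max(0, Math.floor(runIndex));
  return {
    color: RUN_PALETTE[i % RUN_PALETTE.length],
    dashArray: RUN_DASHES[i % RUN_DASHES.length],
  };
}

/** Legend swatch style: reads the same palette entry as the chart stroke. */
function legendSwatchStyle(runIndex: number): { backgroundColor: string } {
  return { backgroundColor: overlayRunStyle(runIndex).color };
}

/** SVG attributes for an overlay roofline path. */
function rooflineStrokeAttrs(runIndex: number): Record<string, string> {
  const style = overlayRunStyle(runIndex);
  return { stroke: style.color, 'stroke-dasharray': style.dashArray };
}
```

Since the legend and the stroke both go through `overlayRunStyle`, they cannot diverge the way a base color plus a filter could.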

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(unofficial-run): auto-switch model when URL loads a run for an unselected model

Navigating to ?unofficialrun=<id> when `g_model` isn't set in the URL
used to silently leave the dashboard on the default DeepSeek-R1 model.
If the run only contained data for a different model (e.g. the
DeepSeek-V4-Pro run 24889121634 on MI355X), the user saw no overlay
and had to know to manually switch the model dropdown.

Now, when an unofficial run is loaded and `g_model` wasn't provided,
auto-switch to the first model the run contributes data for — once,
so subsequent manual selections stick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(unofficial-run): re-trigger auto-switch when run set changes

The previous auto-switch used a one-shot ref, so navigating from one
unofficial run to another in the same session (e.g. swapping the runId
in the URL) wouldn't re-evaluate which model to land on. Moving from
run A to a run B that covers the same model was fine, but if run B only
had data for a model other than the one currently selected, the user
saw an empty chart with no overlay.

Switch the guard to a stringified key of the (model, sequence) set
from the current unofficial run, so each new run set re-evaluates the
switch. Manual model changes while the same run is loaded still stick
because the key doesn't change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(normalizers): strip -cw hw suffix so gb300-cw maps to gb300

Run 24936260529 reports hw: "gb300-cw", which the hardware normalizer
did not recognize.
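
A minimal sketch of the suffix strip, assuming the normalizer works on a single label string. The function name is hypothetical, and the trimming and lowercasing are assumptions added for illustration.

```typescript
/** Normalize a raw hw label, dropping a trailing "-cw" suffix. */
function normalizeHwKey(raw: string): string {
  // Only a suffix at the very end is removed; "-cw" elsewhere is left alone.
  return raw.trim().toLowerCase().replace(/-cw$/, '');
}
```

Anchoring the pattern with `$` keeps labels that merely contain `-cw` in the middle from being altered.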

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(unofficial-run): extract auto-switch decision into pure helper + cover with tests

Address review suggestions on the model auto-switch effect:

- Extract decision logic into `computeAutoSwitchDecision` (pure, testable).
- Drop `sequence` from the dedupe key — the decision only branches on
  model, so encoding sequence just causes spurious re-evaluations on
  sequence-only deltas.
- Sort the unique model list before picking the auto-target so the
  choice is deterministic regardless of `Object.keys` ordering in
  `parseAvailableModelsAndSequences`.
- Clarify the comment so future readers know the URL-param check is the
  primary guard once URL-sync has fired; the dedupe ref only matters in
  the narrow window before sync runs and across run-set transitions.
- Add unit tests covering: empty overlay, switch on uncovered model,
  explicit `g_model` respected, current model already covered, manual
  model change after auto-switch sticks, ref re-arms on clear, sequence
  deltas don't re-fire, deterministic pick across insertion orders.

Co-authored-by: Alec Ibarra <adibarra@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Bryan Shan <Oseltamivir@users.noreply.github.com>
Co-authored-by: Alec Ibarra <adibarra@users.noreply.github.com>
diff --git a/packages/app/src/components/GlobalFilterContext.tsx b/packages/app/src/components/GlobalFilterContext.tsx
@@ -25,6 +25,7 @@ import {
   Sequence,
   SEQUENCE_OPTIONS,
 } from '@/lib/data-mappings';
+import { computeAutoSwitchDecision } from '@/lib/unofficial-run-auto-switch';
 import type { AvailabilityRow, WorkflowInfoResponse } from '@/lib/api';
 
 interface RunInfo {
@@ -172,6 +173,34 @@ export function GlobalFilterProvider({ children }: { children: ReactNode }) {
     });
   }, [availabilityRows, unofficialAvailable]);
 
+  // Auto-switch the selected model when an unofficial run is loaded that
+  // doesn't include the currently selected model. Without this, navigating
+  // to `?unofficialrun=<id>` while the default `g_model=DeepSeek-R1` sticks
+  // leaves the user staring at a chart with no overlay points — they'd have
+  // to know to open the dropdown and pick the run's model themselves.
+  //
+  // Precedence on first load: the `if (urlModel)` early-bail in
+  // `computeAutoSwitchDecision` is the primary guard for explicit `g_model`
+  // intent. The dedupe ref is a secondary guard for the narrow window after
+  // an auto-switch fires but before the URL-sync effect (below) writes
+  // `g_model` back to the URL — once that runs, `urlModel` is set on every
+  // subsequent render and the ref check is effectively redundant. The ref
+  // still matters across navigations between unofficial runs because it is
+  // reset whenever the overlay set goes empty.
+  const lastAutoSwitchKeyRef = useRef<string>('');
+  useEffect(() => {
+    const decision = computeAutoSwitchDecision(
+      unofficialAvailable,
+      getUrlParam('g_model'),
+      selectedModel,
+      lastAutoSwitchKeyRef.current,
+    );
+    lastAutoSwitchKeyRef.current = decision.nextKey;
+    if (decision.modelToSet !== null) {
+      setSelectedModel(decision.modelToSet);
+    }
+  }, [unofficialAvailable, selectedModel]);
+
   // Sequences available for the selected model (DB ∪ unofficial run for this model)
   const availableSequences = useMemo(() => {
     const unofficialSeqs = unofficialAvailable
diff --git a/packages/app/src/components/unofficial-run-provider.tsx b/packages/app/src/components/unofficial-run-provider.tsx
@@ -43,7 +43,7 @@ type UnofficialChartData = Record<
 
 const UNOFFICIAL_RUN_PARAM_RE = /^unofficialruns?$/i;
 
-interface AvailableModelSequence {
+export interface AvailableModelSequence {
   model: Model;
   sequence: Sequence;
   precisions: string[];
diff --git a/packages/app/src/lib/unofficial-run-auto-switch.test.ts b/packages/app/src/lib/unofficial-run-auto-switch.test.ts
@@ -0,0 +1,114 @@
+import { describe, expect, it } from 'vitest';
+
+import type { AvailableModelSequence } from '@/components/unofficial-run-provider';
+import { Model, Sequence } from '@/lib/data-mappings';
+
+import { computeAutoSwitchDecision } from './unofficial-run-auto-switch';
+
+function entry(model: Model, sequence: Sequence): AvailableModelSequence {
+  return { model, sequence, precisions: [] };
+}
+
+describe('computeAutoSwitchDecision', () => {
+  it('returns no-op and resets the key when no unofficial run is loaded', () => {
+    expect(computeAutoSwitchDecision([], undefined, Model.DeepSeek_R1, 'stale-key')).toEqual({
+      nextKey: '',
+      modelToSet: null,
+    });
+  });
+
+  it('switches to the run model when g_model is not pinned and current model is not covered', () => {
+    const run = [entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK)];
+    const decision = computeAutoSwitchDecision(run, undefined, Model.DeepSeek_R1, '');
+    expect(decision.modelToSet).toBe(Model.DeepSeek_V4_Pro);
+    expect(decision.nextKey).toBe(Model.DeepSeek_V4_Pro);
+  });
+
+  it('respects an explicit g_model URL param even when the run lacks that model', () => {
+    const run = [entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK)];
+    const decision = computeAutoSwitchDecision(run, Model.DeepSeek_R1, Model.DeepSeek_R1, '');
+    expect(decision.modelToSet).toBeNull();
+    // Ref must not be advanced — if the URL is later cleared we still want
+    // a fresh load of the same run to be able to fire the switch.
+    expect(decision.nextKey).toBe('');
+  });
+
+  it('does not switch when the current model is already covered by the overlay', () => {
+    const run = [
+      entry(Model.DeepSeek_R1, Sequence.OneK_OneK),
+      entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK),
+    ];
+    const decision = computeAutoSwitchDecision(run, undefined, Model.DeepSeek_R1, '');
+    expect(decision.modelToSet).toBeNull();
+    // Key still advances so we don't keep re-evaluating on every render.
+    expect(decision.nextKey).toBe([Model.DeepSeek_R1, Model.DeepSeek_V4_Pro].toSorted().join(','));
+  });
+
+  it('does not re-fire after a manual model change against the same run set', () => {
+    // Simulate the post-auto-switch state: ref already holds the run's key,
+    // user manually switched back to a model the run does not cover.
+    const run = [entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK)];
+    const lastKey = Model.DeepSeek_V4_Pro;
+    const decision = computeAutoSwitchDecision(run, undefined, Model.DeepSeek_R1, lastKey);
+    expect(decision.modelToSet).toBeNull();
+    expect(decision.nextKey).toBe(lastKey);
+  });
+
+  it('re-arms after the overlay set is cleared so a subsequent load can switch again', () => {
+    // Step 1: a run is loaded, switch fires.
+    const run = [entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK)];
+    const first = computeAutoSwitchDecision(run, undefined, Model.DeepSeek_R1, '');
+    expect(first.modelToSet).toBe(Model.DeepSeek_V4_Pro);
+
+    // Step 2: user dismisses the run, overlay set goes empty — ref resets.
+    const cleared = computeAutoSwitchDecision([], undefined, Model.DeepSeek_V4_Pro, first.nextKey);
+    expect(cleared).toEqual({ nextKey: '', modelToSet: null });
+
+    // Step 3: a *different* run is loaded with a different model. The cleared
+    // ref allows the switch to fire again.
+    const run2 = [entry(Model.Kimi_K2_5, Sequence.OneK_OneK)];
+    const second = computeAutoSwitchDecision(
+      run2,
+      undefined,
+      Model.DeepSeek_V4_Pro,
+      cleared.nextKey,
+    );
+    expect(second.modelToSet).toBe(Model.Kimi_K2_5);
+  });
+
+  it('ignores sequence-only changes in the dedupe key', () => {
+    // Same model, two sequences appearing across renders. The decision logic
+    // only branches on model, so the key should not change when a new
+    // sequence arrives for an already-covered model — otherwise the effect
+    // would re-evaluate (and bail) on every sequence delta.
+    const oneK = [entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK)];
+    const both = [
+      entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK),
+      entry(Model.DeepSeek_V4_Pro, Sequence.EightK_OneK),
+    ];
+    const first = computeAutoSwitchDecision(oneK, undefined, Model.DeepSeek_R1, '');
+    const second = computeAutoSwitchDecision(both, undefined, Model.DeepSeek_V4_Pro, first.nextKey);
+    expect(first.nextKey).toBe(second.nextKey);
+    expect(second.modelToSet).toBeNull();
+  });
+
+  it('picks the first model deterministically across insertion orders', () => {
+    // Same set of models in two different orders should produce the same
+    // auto-picked target — protecting against `Object.keys`-driven nondeterminism
+    // in `parseAvailableModelsAndSequences`.
+    const orderA = [
+      entry(Model.MiniMax_M2_5, Sequence.OneK_OneK),
+      entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK),
+      entry(Model.Kimi_K2_5, Sequence.OneK_OneK),
+    ];
+    const orderB = [
+      entry(Model.Kimi_K2_5, Sequence.OneK_OneK),
+      entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK),
+      entry(Model.MiniMax_M2_5, Sequence.OneK_OneK),
+    ];
+    const a = computeAutoSwitchDecision(orderA, undefined, Model.DeepSeek_R1, '');
+    const b = computeAutoSwitchDecision(orderB, undefined, Model.DeepSeek_R1, '');
+    expect(a.modelToSet).toBe(b.modelToSet);
+    expect(a.nextKey).toBe(b.nextKey);
+  });
+});
diff --git a/packages/app/src/lib/unofficial-run-auto-switch.ts b/packages/app/src/lib/unofficial-run-auto-switch.ts
@@ -0,0 +1,48 @@
+import type { AvailableModelSequence } from '@/components/unofficial-run-provider';
+import type { Model } from '@/lib/data-mappings';
+
+export interface AutoSwitchDecision {
+  /** New value the caller should write into the dedupe ref. */
+  nextKey: string;
+  /** Model to switch to, or null when no switch is needed. */
+  modelToSet: Model | null;
+}
+
+/**
+ * Pure decision helper for the unofficial-run auto-switch effect in
+ * `GlobalFilterContext`. Given the unofficial run's available models, the URL
+ * `g_model` param, the currently selected model, and the previous dedupe key,
+ * returns whether to swap `selectedModel` and what the new dedupe key should be.
+ *
+ * - When the overlay set is empty, the dedupe key is reset so the next load
+ *   re-arms the effect.
+ * - When the URL pinned `g_model` explicitly, no switch fires (respect intent).
+ * - Otherwise the dedupe key is the sorted unique list of overlay models — the
+ *   sequence dimension is intentionally excluded so a sequence-only delta does
+ *   not invalidate a manual model pick the user made earlier.
+ * - The first model is taken from a sorted unique list to keep the choice
+ *   deterministic across renders (insertion order from `Object.keys` is not
+ *   guaranteed for multi-model runs).
+ */
+export function computeAutoSwitchDecision(
+  unofficialAvailable: AvailableModelSequence[],
+  urlModel: string | undefined,
+  selectedModel: Model,
+  lastKey: string,
+): AutoSwitchDecision {
+  if (unofficialAvailable.length === 0) {
+    return { nextKey: '', modelToSet: null };
+  }
+  if (urlModel) {
+    return { nextKey: lastKey, modelToSet: null };
+  }
+  const sortedModels = [...new Set(unofficialAvailable.map((a) => a.model))].toSorted();
+  const key = sortedModels.join(',');
+  if (lastKey === key) {
+    return { nextKey: lastKey, modelToSet: null };
+  }
+  if (sortedModels.includes(selectedModel)) {
+    return { nextKey: key, modelToSet: null };
+  }
+  return { nextKey: key, modelToSet: sortedModels[0] };
+}