Commit e7bb1d0

Oseltamivir, claude, github-actions[bot], and adibarra authored
fix(unofficial-run): auto-switch model when URL loads run for unselected model (#243)
* feat: support multiple comma-separated run IDs for unofficial runs

  Accept `?unofficialrun=123,456,789` on the dashboard URL to merge benchmark and evaluation data from multiple GitHub Actions runs into a single view. Each run's benchmarks are tagged with their originating run_url for per-point traceability, and eval config ids are offset per-run to avoid collisions in the merged set. A NON-OFFICIAL banner is rendered per run.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(unofficial-run): hue-shift overlay strokes per run index

  When multiple unofficial runs are loaded, overlay points/rooflines for the same GPU were rendered in identical colors, making it impossible to tell runs apart. Derive a per-run hue rotation from the run's position in the loaded set and apply it via CSS filter: run 0 is unchanged, and each subsequent run is shifted by 55°. Roofline grouping now includes runIndex so each run gets its own Pareto front.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(unofficial-run): extend per-run hue shift to evaluation overlays

  BarChartD3's X-mark overlay points and their error-bar groups now use the same per-run hue rotation as the inference scatter overlay, so runs loaded via a comma-separated unofficialrun= list are visually separable on the evaluation tab too. Extracts the shared filter and runIndex helpers into lib/overlay-run-style.ts to avoid duplication.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: map dsv4pro benchmark prefix to dsv4 DB key

  Benchmark artifacts for DeepSeek-V4-Pro runs (e.g. run 24884703163) emit `infmax_model_prefix: "dsv4pro"` while the canonical DB key is `dsv4`. Without an alias the prefix resolver fell through all three strategies (direct match, alias table, precision-suffix strip) and every row was dropped as `unmappedModel`, so unofficial-run queries for these runs returned an empty benchmark set.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(unofficial-run): make per-run hue shift actually visible

  Three stacked fixes so multiple unofficial runs don't all look the same:

  1. Include overlay hw keys in the vendor-color active set so overlay strokes get a real hue instead of the muted-foreground fallback. Applying hue-rotate to gray is a no-op, which was the main reason runs appeared identical.
  2. Strengthen the per-run CSS filter: saturate(2.2) hue-rotate brightness(1.1), and widen the hue step from 55° to 80° for more separation.
  3. Use a different stroke-dasharray per run index on overlay rooflines so runs stay distinguishable even when the filter can't produce a shift.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(unofficial-run): use explicit per-run palette instead of CSS filter

  The CSS-filter approach made the legend and chart diverge: the legend rendered each overlay hwKey's vendor color (red for MI355X), while the chart stroke got the same base color *plus* a hue-rotate filter that shifted it to an unrelated hue. Since the legend's colored dot is a direct backgroundColor style, there was no clean way to apply the same filter to it.

  Switch to an explicit OKLch palette indexed by run order; both the overlay stroke and the legend swatch read from the same palette, so they match exactly.

  Restructure the overlay legend section to show one entry per loaded run (branch name) rather than per-hardware, since N runs × M hardware keys can't collapse to a single color per hw. Hardware identity for overlay points is still visible in the point label and tooltip; the X-mark shape and legend branch labels carry the run identity.

  Roofline dash-pattern per run is kept as a secondary (colorblind-friendly) encoding.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(unofficial-run): auto-switch model when URL loads a run for an unselected model

  Navigating to ?unofficialrun=<id> when `g_model` isn't set in the URL used to silently leave the dashboard on the default DeepSeek-R1 model. If the run only contained data for a different model (e.g. the DeepSeek-V4-Pro run 24889121634 on MI355X), the user saw no overlay and had to know to manually switch the model dropdown.

  Now, when an unofficial run is loaded and `g_model` wasn't provided, auto-switch to the first model the run contributes data for, but only once, so subsequent manual selections stick.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(unofficial-run): re-trigger auto-switch when run set changes

  The previous auto-switch used a one-shot ref, so navigating from one unofficial run to another in the same session (e.g. swapping the runId in the URL) wouldn't re-evaluate which model to land on. If a user had been viewing run A on DeepSeek-V4-Pro and then navigated to run B that also has DeepSeek-V4-Pro data, that's fine. But if run B has data for a different model and the user happens to currently sit on a model that B doesn't cover, they'd see an empty chart with no overlay.

  Switch the guard to a stringified key of the (model, sequence) set from the current unofficial run, so each new run set re-evaluates the switch. Manual model changes while the same run is loaded still stick because the key doesn't change.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(normalizers): strip -cw hw suffix so gb300-cw maps to gb300

  Run 24936260529 uses hw: "gb300-cw" which wasn't recognized.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(unofficial-run): extract auto-switch decision into pure helper + cover with tests

  Address review suggestions on the model auto-switch effect:

  - Extract decision logic into `computeAutoSwitchDecision` (pure, testable).
  - Drop `sequence` from the dedupe key: the decision only branches on model, so encoding sequence just causes spurious re-evaluations on sequence-only deltas.
  - Sort the unique model list before picking the auto-target so the choice is deterministic regardless of `Object.keys` ordering in `parseAvailableModelsAndSequences`.
  - Clarify the comment so future readers know the URL-param check is the primary guard once URL-sync has fired; the dedupe ref only matters in the narrow window before sync runs and across run-set transitions.
  - Add unit tests covering: empty overlay, switch on uncovered model, explicit `g_model` respected, current model already covered, manual model change after auto-switch sticks, ref re-arms on clear, sequence deltas don't re-fire, deterministic pick across insertion orders.

  Co-authored-by: Alec Ibarra <adibarra@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Bryan Shan <Oseltamivir@users.noreply.github.com>
Co-authored-by: Alec Ibarra <adibarra@users.noreply.github.com>
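
The `dsv4pro` fix in the commit message above mentions a prefix resolver that tries three strategies in order: direct match, alias table, then precision-suffix strip. The resolver itself is not part of this diff, so the following is only a minimal sketch of that fall-through. The function name `resolveModelKey`, the key set, the alias table contents other than `dsv4pro` to `dsv4`, and the suffix pattern are all assumptions made for illustration.

```typescript
// Hypothetical data: the real DB key set and alias table are not shown in this commit.
const DB_KEYS = new Set<string>(['dsv4', 'dsr1']);
const PREFIX_ALIASES = new Map<string, string>([['dsv4pro', 'dsv4']]);
// Assumed suffix format; the project's actual precision-suffix convention may differ.
const PRECISION_SUFFIX_RE = /[-_](fp4|fp8|bf16)$/i;

// Returns the canonical DB key for a benchmark prefix, or null when unmapped.
function resolveModelKey(prefix: string): string | null {
  const p = prefix.toLowerCase();

  // Strategy 1: the prefix already is a DB key.
  if (DB_KEYS.has(p)) return p;

  // Strategy 2: the prefix is a known alias of a DB key.
  const alias = PREFIX_ALIASES.get(p);
  if (alias !== undefined && DB_KEYS.has(alias)) return alias;

  // Strategy 3: strip a trailing precision suffix, then retry strategies 1 and 2.
  const stripped = p.replace(PRECISION_SUFFIX_RE, '');
  if (stripped !== p) {
    if (DB_KEYS.has(stripped)) return stripped;
    const strippedAlias = PREFIX_ALIASES.get(stripped);
    if (strippedAlias !== undefined && DB_KEYS.has(strippedAlias)) return strippedAlias;
  }

  // Nothing matched: a caller would count this row as unmapped.
  return null;
}
```

Under these assumptions, removing the `dsv4pro` alias entry makes `resolveModelKey('dsv4pro')` return `null`, which corresponds to the "every row was dropped" symptom the commit describes.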
1 parent b00742b commit e7bb1d0

4 files changed

Lines changed: 192 additions & 1 deletion

packages/app/src/components/GlobalFilterContext.tsx

Lines changed: 29 additions & 0 deletions
@@ -25,6 +25,7 @@ import {
   Sequence,
   SEQUENCE_OPTIONS,
 } from '@/lib/data-mappings';
+import { computeAutoSwitchDecision } from '@/lib/unofficial-run-auto-switch';
 import type { AvailabilityRow, WorkflowInfoResponse } from '@/lib/api';
 
 interface RunInfo {
@@ -172,6 +173,34 @@ export function GlobalFilterProvider({ children }: { children: ReactNode }) {
     });
   }, [availabilityRows, unofficialAvailable]);
 
+  // Auto-switch the selected model when an unofficial run is loaded that
+  // doesn't include the currently selected model. Without this, navigating
+  // to `?unofficialrun=<id>` while the default `g_model=DeepSeek-R1` sticks
+  // leaves the user staring at a chart with no overlay points; they'd have
+  // to know to open the dropdown and pick the run's model themselves.
+  //
+  // Precedence on first load: the `if (urlModel)` early-bail in
+  // `computeAutoSwitchDecision` is the primary guard for explicit `g_model`
+  // intent. The dedupe ref is a secondary guard for the narrow window after
+  // an auto-switch fires but before the URL-sync effect (below) writes
+  // `g_model` back to the URL. Once that runs, `urlModel` is set on every
+  // subsequent render and the ref check is effectively redundant. The ref
+  // still matters across navigations between unofficial runs because it is
+  // reset whenever the overlay set goes empty.
+  const lastAutoSwitchKeyRef = useRef<string>('');
+  useEffect(() => {
+    const decision = computeAutoSwitchDecision(
+      unofficialAvailable,
+      getUrlParam('g_model'),
+      selectedModel,
+      lastAutoSwitchKeyRef.current,
+    );
+    lastAutoSwitchKeyRef.current = decision.nextKey;
+    if (decision.modelToSet !== null) {
+      setSelectedModel(decision.modelToSet);
+    }
+  }, [unofficialAvailable, selectedModel]);
+
   // Sequences available for the selected model (DB ∪ unofficial run for this model)
   const availableSequences = useMemo(() => {
     const unofficialSeqs = unofficialAvailable
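
To see how the dedupe ref and the decision helper interact across renders, it can help to step through the effect outside React. The sketch below is an approximation under stated assumptions: `decide` is a simplified copy of the helper's branching with models as plain strings, and `selectedModel`, `lastKeyRef`, and `runEffect` are hypothetical stand-ins for the component state, the ref, and the `useEffect` body.

```typescript
type Decision = { nextKey: string; modelToSet: string | null };

// Simplified copy of the helper's logic; models are plain strings here.
function decide(
  runModels: string[],
  urlModel: string | undefined,
  selected: string,
  lastKey: string,
): Decision {
  if (runModels.length === 0) return { nextKey: '', modelToSet: null };
  if (urlModel) return { nextKey: lastKey, modelToSet: null };
  const sorted = Array.from(new Set(runModels)).sort();
  const key = sorted.join(',');
  if (lastKey === key) return { nextKey: lastKey, modelToSet: null };
  if (sorted.includes(selected)) return { nextKey: key, modelToSet: null };
  return { nextKey: key, modelToSet: sorted[0] ?? null };
}

// Hypothetical stand-ins for React state and the ref.
let selectedModel = 'DeepSeek-R1';
let lastKeyRef = '';

// Mirrors the effect body: store the new key, then apply the switch if any.
function runEffect(runModels: string[], urlModel?: string): void {
  const d = decide(runModels, urlModel, selectedModel, lastKeyRef);
  lastKeyRef = d.nextKey;
  if (d.modelToSet !== null) selectedModel = d.modelToSet;
}

// 1. A run covering only DeepSeek-V4-Pro loads: the auto-switch fires.
runEffect(['DeepSeek-V4-Pro']);
const afterLoad = selectedModel;

// 2. The user manually switches back; the effect re-runs because
//    selectedModel changed, but the key is unchanged so nothing fires.
selectedModel = 'DeepSeek-R1';
runEffect(['DeepSeek-V4-Pro']);
const afterManualChange = selectedModel;
```

In this walk-through `afterLoad` holds the run's model and `afterManualChange` holds the user's pick, which is the "once, so manual selections stick" behavior the comment describes.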

packages/app/src/components/unofficial-run-provider.tsx

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ type UnofficialChartData = Record<
 
 const UNOFFICIAL_RUN_PARAM_RE = /^unofficialruns?$/i;
 
-interface AvailableModelSequence {
+export interface AvailableModelSequence {
   model: Model;
   sequence: Sequence;
   precisions: string[];
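
The `UNOFFICIAL_RUN_PARAM_RE` context line accepts both `unofficialrun` and `unofficialruns`, case-insensitively, and the commit message describes a comma-separated ID list. The provider's actual parsing code is not in this diff, so the following is only a sketch of how such a parameter might be read. The helper name `parseUnofficialRunIds` and its numeric-only, de-duplicating validation are assumptions.

```typescript
// Copied from the diff context: matches `unofficialrun` or `unofficialruns`.
const UNOFFICIAL_RUN_PARAM_RE = /^unofficialruns?$/i;

// Hypothetical helper: the name and validation rules are assumptions,
// not the provider's real implementation.
function parseUnofficialRunIds(search: string): string[] {
  const ids: string[] = [];
  new URLSearchParams(search).forEach((value, key) => {
    if (!UNOFFICIAL_RUN_PARAM_RE.test(key)) return;
    for (const part of value.split(',')) {
      const id = part.trim();
      // Keep numeric IDs only, and skip duplicates.
      if (/^\d+$/.test(id) && !ids.includes(id)) ids.push(id);
    }
  });
  return ids;
}

const exampleIds = parseUnofficialRunIds('?unofficialrun=123,456,789');
```

Keeping IDs as strings avoids any precision concerns with large run numbers and preserves the order in which runs were listed, which matters if the run index drives per-run styling.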
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+import { describe, expect, it } from 'vitest';
+
+import type { AvailableModelSequence } from '@/components/unofficial-run-provider';
+import { Model, Sequence } from '@/lib/data-mappings';
+
+import { computeAutoSwitchDecision } from './unofficial-run-auto-switch';
+
+function entry(model: Model, sequence: Sequence): AvailableModelSequence {
+  return { model, sequence, precisions: [] };
+}
+
+describe('computeAutoSwitchDecision', () => {
+  it('returns no-op and resets the key when no unofficial run is loaded', () => {
+    expect(computeAutoSwitchDecision([], undefined, Model.DeepSeek_R1, 'stale-key')).toEqual({
+      nextKey: '',
+      modelToSet: null,
+    });
+  });
+
+  it('switches to the run model when g_model is not pinned and current model is not covered', () => {
+    const run = [entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK)];
+    const decision = computeAutoSwitchDecision(run, undefined, Model.DeepSeek_R1, '');
+    expect(decision.modelToSet).toBe(Model.DeepSeek_V4_Pro);
+    expect(decision.nextKey).toBe(Model.DeepSeek_V4_Pro);
+  });
+
+  it('respects an explicit g_model URL param even when the run lacks that model', () => {
+    const run = [entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK)];
+    const decision = computeAutoSwitchDecision(run, Model.DeepSeek_R1, Model.DeepSeek_R1, '');
+    expect(decision.modelToSet).toBeNull();
+    // Ref must not be advanced: if the URL is later cleared we still want
+    // a fresh load of the same run to be able to fire the switch.
+    expect(decision.nextKey).toBe('');
+  });
+
+  it('does not switch when the current model is already covered by the overlay', () => {
+    const run = [
+      entry(Model.DeepSeek_R1, Sequence.OneK_OneK),
+      entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK),
+    ];
+    const decision = computeAutoSwitchDecision(run, undefined, Model.DeepSeek_R1, '');
+    expect(decision.modelToSet).toBeNull();
+    // Key still advances so we don't keep re-evaluating on every render.
+    expect(decision.nextKey).toBe([Model.DeepSeek_R1, Model.DeepSeek_V4_Pro].toSorted().join(','));
+  });
+
+  it('does not re-fire after a manual model change against the same run set', () => {
+    // Simulate the post-auto-switch state: ref already holds the run's key,
+    // user manually switched back to a model the run does not cover.
+    const run = [entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK)];
+    const lastKey = Model.DeepSeek_V4_Pro;
+    const decision = computeAutoSwitchDecision(run, undefined, Model.DeepSeek_R1, lastKey);
+    expect(decision.modelToSet).toBeNull();
+    expect(decision.nextKey).toBe(lastKey);
+  });
+
+  it('re-arms after the overlay set is cleared so a subsequent load can switch again', () => {
+    // Step 1: a run is loaded, switch fires.
+    const run = [entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK)];
+    const first = computeAutoSwitchDecision(run, undefined, Model.DeepSeek_R1, '');
+    expect(first.modelToSet).toBe(Model.DeepSeek_V4_Pro);
+
+    // Step 2: user dismisses the run, overlay set goes empty, ref resets.
+    const cleared = computeAutoSwitchDecision([], undefined, Model.DeepSeek_V4_Pro, first.nextKey);
+    expect(cleared).toEqual({ nextKey: '', modelToSet: null });
+
+    // Step 3: a *different* run is loaded with a different model. The cleared
+    // ref allows the switch to fire again.
+    const run2 = [entry(Model.Kimi_K2_5, Sequence.OneK_OneK)];
+    const second = computeAutoSwitchDecision(
+      run2,
+      undefined,
+      Model.DeepSeek_V4_Pro,
+      cleared.nextKey,
+    );
+    expect(second.modelToSet).toBe(Model.Kimi_K2_5);
+  });
+
+  it('ignores sequence-only changes in the dedupe key', () => {
+    // Same model, two sequences appearing across renders. The decision logic
+    // only branches on model, so the key should not change when a new
+    // sequence arrives for an already-covered model; otherwise the effect
+    // would re-evaluate (and bail) on every sequence delta.
+    const oneK = [entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK)];
+    const both = [
+      entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK),
+      entry(Model.DeepSeek_V4_Pro, Sequence.EightK_OneK),
+    ];
+    const first = computeAutoSwitchDecision(oneK, undefined, Model.DeepSeek_R1, '');
+    const second = computeAutoSwitchDecision(both, undefined, Model.DeepSeek_V4_Pro, first.nextKey);
+    expect(first.nextKey).toBe(second.nextKey);
+    expect(second.modelToSet).toBeNull();
+  });
+
+  it('picks the first model deterministically across insertion orders', () => {
+    // Same set of models in two different orders should produce the same
+    // auto-picked target, protecting against `Object.keys`-driven nondeterminism
+    // in `parseAvailableModelsAndSequences`.
+    const orderA = [
+      entry(Model.MiniMax_M2_5, Sequence.OneK_OneK),
+      entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK),
+      entry(Model.Kimi_K2_5, Sequence.OneK_OneK),
+    ];
+    const orderB = [
+      entry(Model.Kimi_K2_5, Sequence.OneK_OneK),
+      entry(Model.DeepSeek_V4_Pro, Sequence.OneK_OneK),
+      entry(Model.MiniMax_M2_5, Sequence.OneK_OneK),
+    ];
+    const a = computeAutoSwitchDecision(orderA, undefined, Model.DeepSeek_R1, '');
+    const b = computeAutoSwitchDecision(orderB, undefined, Model.DeepSeek_R1, '');
+    expect(a.modelToSet).toBe(b.modelToSet);
+    expect(a.nextKey).toBe(b.nextKey);
+  });
+});
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+import type { AvailableModelSequence } from '@/components/unofficial-run-provider';
+import type { Model } from '@/lib/data-mappings';
+
+export interface AutoSwitchDecision {
+  /** New value the caller should write into the dedupe ref. */
+  nextKey: string;
+  /** Model to switch to, or null when no switch is needed. */
+  modelToSet: Model | null;
+}
+
+/**
+ * Pure decision helper for the unofficial-run auto-switch effect in
+ * `GlobalFilterContext`. Given the unofficial run's available models, the URL
+ * `g_model` param, the currently selected model, and the previous dedupe key,
+ * returns whether to swap `selectedModel` and what the new dedupe key should be.
+ *
+ * - When the overlay set is empty, the dedupe key is reset so the next load
+ *   re-arms the effect.
+ * - When the URL pinned `g_model` explicitly, no switch fires (respect intent).
+ * - Otherwise the dedupe key is the sorted unique list of overlay models; the
+ *   sequence dimension is intentionally excluded so a sequence-only delta does
+ *   not invalidate a manual model pick the user made earlier.
+ * - The first model is taken from a sorted unique list to keep the choice
+ *   deterministic across renders (insertion order from `Object.keys` is not
+ *   guaranteed for multi-model runs).
+ */
+export function computeAutoSwitchDecision(
+  unofficialAvailable: AvailableModelSequence[],
+  urlModel: string | undefined,
+  selectedModel: Model,
+  lastKey: string,
+): AutoSwitchDecision {
+  if (unofficialAvailable.length === 0) {
+    return { nextKey: '', modelToSet: null };
+  }
+  if (urlModel) {
+    return { nextKey: lastKey, modelToSet: null };
+  }
+  const sortedModels = [...new Set(unofficialAvailable.map((a) => a.model))].toSorted();
+  const key = sortedModels.join(',');
+  if (lastKey === key) {
+    return { nextKey: lastKey, modelToSet: null };
+  }
+  if (sortedModels.includes(selectedModel)) {
+    return { nextKey: key, modelToSet: null };
+  }
+  return { nextKey: key, modelToSet: sortedModels[0] };
+}
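
The dedupe key in the helper is built with `Array.prototype.toSorted`, an ES2023 method. Where the runtime or the TypeScript `lib` setting predates ES2023, copying the array and calling `sort()` should produce the same key. A minimal sketch of that key construction, using the hypothetical name `modelDedupeKey` and placeholder strings standing in for `Model` values:

```typescript
// Hypothetical standalone version of the key construction; the model
// strings passed in below are placeholders, not real `Model` enum values.
function modelDedupeKey(models: string[]): string {
  // Array.from makes a fresh array, so sort() does not mutate the input.
  return Array.from(new Set(models)).sort().join(',');
}

// Two insertion orders (with a duplicate) for the same set of models.
const inputA = ['model-c', 'model-a', 'model-b', 'model-a'];
const inputB = ['model-b', 'model-c', 'model-a'];

const keyA = modelDedupeKey(inputA);
const keyB = modelDedupeKey(inputB);
const sameKey = keyA === keyB; // insertion order and duplicates do not affect the key
```

Because the key depends only on the set of models, re-renders that reorder entries or add another sequence for an already-listed model leave it unchanged, which is what keeps a manual model pick from being overridden.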
