Commit 980ebf6

Merge master into perf/decouple-toggle-from-chart-rebuild: reconcile useStableValue scale configs with replay fixed-axes (#434) and showPointLabels rename (#474)
2 parents 880cd41 + 1392807 commit 980ebf6

108 files changed

Lines changed: 57448 additions & 32943 deletions

.github/workflows/claude.yml

Lines changed: 2 additions & 2 deletions
@@ -278,7 +278,7 @@ jobs:
       - name: Run Claude Code
         id: claude
         if: ${{ always() }}
-        uses: anthropics/claude-code-action@fbda2eb1bdc90d319b8d853f5deb53bca199a7c1 # v1.0.140
+        uses: anthropics/claude-code-action@d5726de019ec4498aa667642bc3a80fca83aa102 # v1.0.148
         env:
           GH_TOKEN: ${{ secrets.PAT }}
           GITHUB_TOKEN: ${{ secrets.PAT }}
@@ -331,7 +331,7 @@ jobs:
           fetch-depth: 0

       - name: PR Review with Claude
-        uses: anthropics/claude-code-action@fbda2eb1bdc90d319b8d853f5deb53bca199a7c1 # v1.0.140
+        uses: anthropics/claude-code-action@d5726de019ec4498aa667642bc3a80fca83aa102 # v1.0.148
         with:
           anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
           trigger_phrase: '@claude review'

.oxlintrc.json

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@
     "unicorn/no-null": "off",
     "unicorn/no-useless-undefined": "off",
     "unicorn/numeric-separators-style": "off",
+    "unicorn/prefer-export-from": "off",
     "unicorn/prefer-global-this": "off",
     "unicorn/prefer-top-level-await": "off"
   }

AGENTS.md

Lines changed: 2 additions & 1 deletion
@@ -148,6 +148,7 @@ Authoritative total / active parameter counts for every model in the dashboard.
 | DeepSeek-V4-Pro | 1.6T | 49B | `deepseek-ai/DeepSeek-V4-Pro` | HF model card |
 | Kimi-K2.5 | 1T | 32B | `moonshotai/Kimi-K2.5` | HF model card |
 | Kimi-K2.6 | 1T | 32B | `moonshotai/Kimi-K2.6` | HF model card |
+| Kimi-K2.7-Code | 1T | 32B | `moonshotai/Kimi-K2.7-Code` | HF model card |
 | Qwen3.5-397B-A17B | 397B | 17B | `Qwen/Qwen3.5-397B-A17B` | HF model card |
 | GLM-5 | 744B | 40B | `zai-org/GLM-5` | HF model card |
 | GLM-5.1 | 744B | 40B | `zai-org/GLM-5.1-FP8` | HF model card (same base as GLM-5) |
@@ -161,7 +162,7 @@ Authoritative total / active parameter counts for every model in the dashboard.
 - **GLM-5 ≠ 355B.** 355B is GLM-4.5. GLM-5 jumped to 744B / 40B active (256-expert MoE with DSA).
 - **MiniMax-M2.5/M2.7 ≠ 456B.** 456B is the older MiniMax-Text-01 / M1 (32 large experts). The M2 series is a different architecture: 230B / 10B active, 256 small experts.
 - **DeepSeek-R1 is 671B, not 685B.** HF metadata shows 685B because the bundled MTP head adds ~14B; the core MoE is 671B / 37B active.
-- **Kimi K2.5 and K2.6 are post-training refinements**, not new pre-trained sizes. Same 1T / 32B / 384-expert backbone as the original K2.
+- **Kimi K2.5, K2.6, and K2.7-Code are post-training refinements**, not new pre-trained sizes. Same 1T / 32B / 384-expert backbone as the original K2. K2.7-Code is a coding-focused refinement of the same backbone.

 ## Common Development Tasks

docs/adding-entities.md

Lines changed: 99 additions & 0 deletions
@@ -71,6 +71,105 @@ Present what you inferred and get confirmation + category in a single step. Incl

 Everything else (`MODEL_OPTIONS`, `DEFAULT_MODELS`, `EXPERIMENTAL_MODELS`, `DEPRECATED_MODELS`, `MODEL_PREFIX_MAPPING`, `getModelLabel()`) is derived automatically.

+**`packages/app/src/lib/compare-slug.ts`** (easy to miss — the /compare and /compare-per-dollar pages do NOT derive from `MODEL_CONFIG`):
+
+- `COMPARE_MODEL_SLUGS` — add an entry with `{ slug, displayName, dbKeys, label }`. `displayName` must match the `Model` enum value; `dbKeys` lists the DB buckets to query. Place it per the ordering comment (Chinese-lab flagships first, newer family member leads). Without this entry the model is absent from /compare, /compare-per-dollar, the sitemap, and their OG images.
+- `COMPARE_MODEL_ALIASES` — only if a family-level or older-version slug should 308 to the new entry.
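Because the compare pages do not derive from `MODEL_CONFIG`, a missing `COMPARE_MODEL_SLUGS` entry is easy to overlook. The sketch below shows one way such a gap could be detected. Only the four field names (`slug`, `displayName`, `dbKeys`, `label`) come from the checklist; the `CompareModelSlug` interface, the `findMissingCompareEntries` helper, and every example value are hypothetical and not taken from the repository.

```typescript
// Hypothetical entry shape: only the four field names come from the checklist.
interface CompareModelSlug {
  slug: string; // URL segment for the compare pages (assumed meaning)
  displayName: string; // must match the Model enum value
  dbKeys: string[]; // DB buckets to query
  label: string; // human-readable label (assumed meaning)
}

// Placeholder values for illustration only, not real repository data.
const exampleEntries: CompareModelSlug[] = [
  {
    slug: 'example-model',
    displayName: 'Example-Model',
    dbKeys: ['example-model-key'],
    label: 'Example Model',
  },
];

// Returns every display name that has no matching compare entry.
function findMissingCompareEntries(
  displayNames: string[],
  entries: CompareModelSlug[],
): string[] {
  return displayNames.filter(
    (name) => !entries.some((entry) => entry.displayName === name),
  );
}
```

Calling `findMissingCompareEntries(['Example-Model', 'Other-Model'], exampleEntries)` would report `Other-Model` as the model still lacking a compare entry.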
+
+**`packages/app/src/lib/compare-ssr.ts`**:
+
+- `KNOWN_MODELS` — add the display name so `?g_model=` URL overrides validate on compare pages.
+
+**`packages/app/src/app/compare/page.tsx`** and **`packages/app/src/app/compare-per-dollar/page.tsx`**:
+
+- `DESCRIPTION` — these SEO meta strings hardcode a sample model list ("…, Qwen 3.5 397B-A17B, and more"). Add the new model if it should appear in the catalog blurb.
+
+**`packages/app/src/lib/model-architectures.ts`** (optional — powers the per-model architecture diagram on the inference tab):
+
+- `MODEL_ARCHITECTURES` — add a `[Model.X]` entry with verified config.json values. Omitted models simply render no diagram (`getModelArchitecture` returns `undefined`), so this is non-blocking but expected for parity with other models.
+
+`/about` needs no change — its model list derives from `DB_MODEL_TO_DISPLAY` and includes the new key automatically once `models.ts` is updated.
+
+---
+
+## Featuring a Day-0 Model
+
+When a new model launches and we want to give it the headline treatment, swap the **promotion surfaces** to it. This is separate from [Adding a New Model](#adding-a-new-model) above — the model must **already exist** (`Model.*` enum, `MODEL_CONFIG`, DB mapping) before it can be featured. The promotion surfaces are:
+
+- **Launch banner** — the dismissible bar at the top of the landing page
+- **Launch modal** — the "X is live" popup on the landing page
+- **Quick Comparisons preset** — the "X — First Look" card (first entry in `FAVORITE_PRESETS`)
+- **Default model** (optional) — the model the dashboard opens on (`g_model`)
+
+### The "retire old, new IDs" pattern
+
+Each launch **replaces** the previous day-0 model's surfaces rather than editing them in place. This is deliberate:
+
+- **New storage keys** (`inferencex-<slug>-{banner,modal}-dismissed`) so users who dismissed the _previous_ launch banner/modal still see the new one.
+- **Keep the old preset, hide it** (`hidden: true`) instead of deleting it — existing `?preset=<old-slug>-launch` links (old banners, modals, external shares, blog `DashboardCTA`s) must keep resolving.
+- **Generic testIds** (`launch-banner`, `launch-modal`) — launch-agnostic so Cypress selectors don't change every launch.
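Since retired presets are hidden rather than deleted, the current day-0 model is whichever `*-launch` preset is still visible. A minimal sketch of that detection follows, assuming presets expose only an `id` and an optional `hidden` flag. `PresetLike`, `findCurrentLaunchPreset`, and the sample ids (other than `dsv4-launch-nvidia`, which the checklist mentions) are hypothetical names for illustration, and the real `FAVORITE_PRESETS` entries carry more fields.

```typescript
// Hypothetical minimal preset shape; real FAVORITE_PRESETS entries carry more fields.
interface PresetLike {
  id: string;
  hidden?: boolean;
}

// Finds the single visible preset whose id ends in "-launch".
// Throws when zero or several match, since the convention expects exactly one.
function findCurrentLaunchPreset(presets: PresetLike[]): PresetLike {
  const visible = presets.filter((p) => /-launch$/.test(p.id) && !p.hidden);
  const [only] = visible;
  if (visible.length !== 1 || only === undefined) {
    throw new Error(
      `expected exactly one visible *-launch preset, found ${visible.length}`,
    );
  }
  return only;
}

// Illustrative data: ids other than 'dsv4-launch-nvidia' are made up for this sketch.
const samplePresets: PresetLike[] = [
  { id: 'minimax-m3-launch' },
  { id: 'previous-model-launch', hidden: true },
  { id: 'dsv4-launch-nvidia', hidden: true },
  { id: 'some-evergreen-preset' },
];
```

Throwing on anything other than exactly one match is a design choice of this sketch: it surfaces a forgotten `hidden: true` on the outgoing preset instead of silently picking one.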
+
+> The current day-0 model is **whatever the single visible (`hidden` unset) `*-launch` preset points to** — detect it, don't assume. As of MiniMax M3 it was DeepSeek V4 Pro.
+
+### Derive the identifiers
+
+From the model name, derive (MiniMax M3 shown as the worked example):
+
+| Token     | Example            | Used in                                        |
+| --------- | ------------------ | ---------------------------------------------- |
+| `SLUG`    | `minimax-m3`       | preset id, nudge ids, storage keys, `?preset=` |
+| `SLUG_`   | `minimax_m3`       | analytics event names                          |
+| `ENUM`    | `Model.MiniMax_M3` | preset `config.model`                          |
+| `DISPLAY` | `MiniMax M3`       | all user-facing copy                           |
+| `G_MODEL` | `MiniMax-M3`       | `g_model` default (the `Model.*` string value) |
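The table's derivations can be sketched as a small helper. This is only an illustration that reproduces the MiniMax M3 row values; `deriveLaunchTokens` and `LaunchTokens` are hypothetical names, and the simple word-joining rule is an assumption that will not hold for every model name (for example, names containing dots cannot become enum members as-is), so the real `Model.*` member and its string value should always be checked.

```typescript
// Hypothetical container for the five tokens in the table.
interface LaunchTokens {
  slug: string; // SLUG
  slugUnderscore: string; // SLUG_
  enumRef: string; // ENUM, as source text rather than a real enum value
  display: string; // DISPLAY
  gModel: string; // G_MODEL
}

// Derives the tokens from the display name by splitting on whitespace.
// Assumption: the name is made of simple space-separated words.
function deriveLaunchTokens(displayName: string): LaunchTokens {
  const words = displayName.trim().split(/\s+/);
  const slug = words.join('-').toLowerCase();
  return {
    slug,
    slugUnderscore: slug.replace(/-/g, '_'),
    enumRef: `Model.${words.join('_')}`,
    display: words.join(' '),
    gModel: words.join('-'),
  };
}

const miniMaxTokens = deriveLaunchTokens('MiniMax M3');
```

For `'MiniMax M3'` this yields `minimax-m3`, `minimax_m3`, `Model.MiniMax_M3`, `MiniMax M3`, and `MiniMax-M3`, matching the worked example.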
+
+### Then apply
+
+**`packages/app/src/components/favorites/favorite-presets.ts`**:
+
+1. On the outgoing visible `*-launch` preset, add `hidden: true` and update its comment (retired, kept for link compat — same pattern as the existing `dsv4-launch-nvidia` entry).
+2. Prepend a new visible preset as the **first** element of `FAVORITE_PRESETS`:
+   ```ts
+   {
+     id: 'SLUG-launch',
+     title: 'DISPLAY — First Look',
+     description:
+       'First benchmarks of DISPLAY across every available GPU. New configurations appear here as they come online.',
+     tags: ['<Vendor>', '<Version>', 'New'], // e.g. ['MiniMax', 'M3', 'New']
+     category: 'comparison',
+     wide: true,
+     config: {
+       model: ENUM,
+       sequence: Sequence.EightK_OneK,
+       precisions: ['fp4', 'fp4fp8', 'fp8'],
+       yAxisMetric: 'y_tpPerGpu',
+       hwFilter: ['h100', 'h200', 'b200', 'b300', 'gb200', 'gb300', 'mi300x', 'mi325x', 'mi355x'],
+     },
+   }
+   ```
+   Narrow `hwFilter` only for a restricted launch (e.g. NVIDIA-only). The broad filter + "as they come online" copy is the intended self-filling behavior even when data is still partial at launch.
+
+**`packages/app/src/lib/nudges/registry.tsx`** — rewrite the two launch nudges (only one banner + one modal exist at a time):
+
+- **Modal** (under "Landing modals"): `id: 'SLUG-launch-modal'`, `storageKey: 'inferencex-SLUG-modal-dismissed'`, `title: 'DISPLAY is live'`, day-zero `description`, `testId: 'launch-modal'`, `primaryAction.onClick` → `/inference?preset=SLUG-launch`, analytics `SLUG_modal_shown`/`_dismissed`/`_explored`.
+- **Banner** (under "Landing banner"): `id: 'SLUG-launch-banner'`, `storageKey: 'inferencex-SLUG-banner-dismissed'`, `title: 'DISPLAY benchmarks are live'`, `testId: 'launch-banner'`, `href`/`onLinkClick` → `/inference?preset=SLUG-launch`, keep the generic `launch_banner_*` analytics events but set `properties: { banner_id: 'SLUG-launch', preset_id: 'SLUG-launch' }`.
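The two nudge bullets substitute the same tokens into several strings, which the sketch below gathers in one place. The interfaces and `buildLaunchNudges` are hypothetical: the real registry types are not shown in this diff, the `href` field on the modal stands in for the `primaryAction.onClick` target, and reading `SLUG_modal_shown` as `<slug_>_modal_shown` (for example `minimax_m3_modal_shown`) is an assumption.

```typescript
// Hypothetical shapes: field names follow the checklist, real registry types may differ.
interface LaunchModalNudge {
  id: string;
  storageKey: string;
  title: string;
  testId: 'launch-modal';
  href: string; // target of primaryAction.onClick in the checklist
  analytics: { shown: string; dismissed: string; explored: string };
}

interface LaunchBannerNudge {
  id: string;
  storageKey: string;
  title: string;
  testId: 'launch-banner';
  href: string;
  properties: { banner_id: string; preset_id: string };
}

// Fills the checklist's string templates from SLUG and DISPLAY.
function buildLaunchNudges(
  slug: string,
  display: string,
): { modal: LaunchModalNudge; banner: LaunchBannerNudge } {
  const presetId = `${slug}-launch`;
  const href = `/inference?preset=${presetId}`;
  const eventPrefix = slug.replace(/-/g, '_'); // SLUG_ (assumed separator handling)
  return {
    modal: {
      id: `${presetId}-modal`,
      storageKey: `inferencex-${slug}-modal-dismissed`,
      title: `${display} is live`,
      testId: 'launch-modal',
      href,
      analytics: {
        shown: `${eventPrefix}_modal_shown`,
        dismissed: `${eventPrefix}_modal_dismissed`,
        explored: `${eventPrefix}_modal_explored`,
      },
    },
    banner: {
      id: `${presetId}-banner`,
      storageKey: `inferencex-${slug}-banner-dismissed`,
      title: `${display} benchmarks are live`,
      testId: 'launch-banner',
      href,
      properties: { banner_id: presetId, preset_id: presetId },
    },
  };
}

const miniMaxNudges = buildLaunchNudges('minimax-m3', 'MiniMax M3');
```

Keeping `testId` as a literal type mirrors the checklist's point that the selectors stay generic while ids and storage keys change per launch.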
+
+**`packages/app/src/lib/url-state.ts`** _(only if making it the site default)_:
+
+- Set `PARAM_DEFAULTS.g_model` to `'G_MODEL'`. Most launches **leave this unchanged** — only change it for a true flagship (DeepSeek V4 Pro got it; MiniMax M3 did not).
+
+### Sync tests
+
+- **`packages/app/src/lib/nudges/registry.test.ts`** — update the **sorted** expected-ids array ("contains the expected set of migrated nudges") to the new `SLUG-launch-banner`/`SLUG-launch-modal` ids.
+- **`packages/app/cypress/e2e/nudge-system.cy.ts`** and **`navigation.cy.ts`** — replace the old `inferencex-<old-slug>-{modal,banner}-dismissed` storage keys with the new ones. TestId selectors stay generic (`launch-modal`, `launch-banner`); update any `it(...)` titles that name the old model.
+- **`packages/app/src/lib/url-state.test.ts`** _(only if the default changed)_ — two specs hardcode the default `g_model`; update both.
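The expected-ids array in `registry.test.ts` has to stay sorted after the swap, which is easy to get wrong by hand. A minimal sketch of the update follows, assuming the array uses plain `Array.prototype.sort()` ordering. `swapLaunchNudgeIds` and all ids in the example other than the `minimax-m3` ones are hypothetical.

```typescript
// Replaces the outgoing launch nudge ids with the new ones and re-sorts.
// Assumption: the test's expected array uses default sort() ordering.
function swapLaunchNudgeIds(
  expectedIds: string[],
  oldSlug: string,
  newSlug: string,
): string[] {
  const outgoing = [`${oldSlug}-launch-banner`, `${oldSlug}-launch-modal`];
  const kept = expectedIds.filter((id) => outgoing.indexOf(id) === -1);
  return kept
    .concat([`${newSlug}-launch-banner`, `${newSlug}-launch-modal`])
    .sort();
}

// Illustrative input: 'old-model' and the two unrelated ids are made up.
const updatedIds = swapLaunchNudgeIds(
  ['another-nudge', 'old-model-launch-banner', 'old-model-launch-modal', 'zzz-other-nudge'],
  'old-model',
  'minimax-m3',
);
```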
+
+> **Don't touch:** blog MDX `?g_model=…` / `?preset=<old-slug>-launch` links (historical, correct), `packages/constants/src/models.ts` DB-key maps, or the outgoing model's data-mapping / architecture entries — it still exists, it's just no longer the headline.
+
+### Verify
+
+`pnpm typecheck && pnpm lint && pnpm fmt && pnpm test:unit`, then `rg` for the old slug to confirm only the intentional hidden preset + blog links remain. Final gate: `pnpm test:e2e` and a manual `pnpm dev` check that the banner/modal/preset read `DISPLAY` and `/inference?preset=SLUG-launch` renders data.
+
 ---

 ## Adding a New GPU

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ Design rationale and non-obvious conventions. See [CLAUDE.md](../CLAUDE.md) for
 - [Pitfalls](./pitfalls.md) — Failure modes: token type consistency, schema evolution, empty objects, zoom loss, stale closures, disaggregated metrics, negative splines, date stamping, ref stability, cost inheritance
 - [GPU Specs](./gpu-specs.md) — Unit conventions, topology invariants, SVG layout rationale, hardware gotchas
 - [TCO Calculator](./tco-calculator.md) — Why interpolation, composite keys, cost matrix, token type bugs, badge logic, state design
-- [Adding Entities](./adding-entities.md) — Step-by-step checklists for adding new models, GPUs, precisions, sequences, frameworks (ingest + constants + frontend)
+- [Adding Entities](./adding-entities.md) — Step-by-step checklists for adding new models, GPUs, precisions, sequences, frameworks (ingest + constants + frontend), plus featuring a day-0 model (launch banner, modal, Quick Comparisons preset)
 - [Testing](./testing.md) — Requirements, quality standards, pre-commit checklist
 - [Data Transforms](./data-transforms.md) — Full pipeline from BenchmarkRow to RenderableGraph: type hierarchy, hardware key construction, derived metrics, memoization strategy
 - [State Ownership](./state-ownership.md) — Which context owns which state, availability filtering cascade, comparison date mechanics, URL param sync

docs/state-ownership.md

Lines changed: 33 additions & 32 deletions
@@ -73,7 +73,7 @@ Depends on: `GlobalFilterProvider` (reads all filter state and availability, inc

 - `selectedYAxisMetric` (`i_metric`), `selectedXAxisMetric` (`i_xmetric`), `selectedE2eXAxisMetric` (`i_e2e_xmetric`)
 - `scaleType`: `auto | linear | log` (`i_scale`)
-- `hideNonOptimal` (`i_optimal`), `hidePointLabels` (`i_nolabel`), `logScale` (`i_log`)
+- `hideNonOptimal` (`i_optimal`), `showPointLabels` (`i_label`), `logScale` (`i_log`)
 - `highContrast` (`i_hc`), `isLegendExpanded` (`i_legend`)
 - `useAdvancedLabels` (`i_advlabel`), `showGradientLabels` (`i_gradlabel`)
 - `colorShuffleSeed` — no URL param; ephemeral
@@ -260,34 +260,35 @@ Historical Trends and TCO Calculator share the inference tab's URL path (`/infer

 ### Full parameter list

-| Param           | Owner               | Default                          |
-| --------------- | ------------------- | -------------------------------- |
-| `g_model`       | GlobalFilterContext | `DeepSeek-R1-0528`               |
-| `g_rundate`     | GlobalFilterContext | `''`                             |
-| `g_runid`       | GlobalFilterContext | `''`                             |
-| `i_seq`         | GlobalFilterContext | `8k/1k`                          |
-| `i_prec`        | GlobalFilterContext | `fp4`                            |
-| `i_metric`      | InferenceProvider   | `y_tpPerGpu`                     |
-| `i_xmetric`     | InferenceProvider   | `p99_ttft`                       |
-| `i_e2e_xmetric` | InferenceProvider   | `''`                             |
-| `i_scale`       | InferenceProvider   | `auto`                           |
-| `i_gpus`        | InferenceProvider   | `''`                             |
-| `i_dates`       | InferenceProvider   | `''`                             |
-| `i_dstart`      | InferenceProvider   | `''`                             |
-| `i_dend`        | InferenceProvider   | `''`                             |
-| `i_optimal`     | InferenceProvider   | `''` (truthy = hide non-optimal) |
-| `i_nolabel`     | InferenceProvider   | `''`                             |
-| `i_hc`          | InferenceProvider   | `''`                             |
-| `i_log`         | InferenceProvider   | `''`                             |
-| `i_legend`      | InferenceProvider   | `''`                             |
-| `i_advlabel`    | InferenceProvider   | `''`                             |
-| `i_gradlabel`   | InferenceProvider   | `''`                             |
-| `e_rundate`     | EvaluationProvider  | `''`                             |
-| `e_bench`       | EvaluationProvider  | `''`                             |
-| `e_hc`          | EvaluationProvider  | `''`                             |
-| `e_labels`      | EvaluationProvider  | `''`                             |
-| `e_legend`      | EvaluationProvider  | `''`                             |
-| `r_range`       | ReliabilityProvider | `last-3-months`                  |
-| `r_pct`         | ReliabilityProvider | `''`                             |
-| `r_hc`          | ReliabilityProvider | `''`                             |
-| `r_legend`      | ReliabilityProvider | `''`                             |
+| Param           | Owner               | Default                           |
+| --------------- | ------------------- | --------------------------------- |
+| `g_model`       | GlobalFilterContext | `DeepSeek-R1-0528`                |
+| `g_rundate`     | GlobalFilterContext | `''`                              |
+| `g_runid`       | GlobalFilterContext | `''`                              |
+| `i_seq`         | GlobalFilterContext | `8k/1k`                           |
+| `i_prec`        | GlobalFilterContext | `fp4`                             |
+| `i_metric`      | InferenceProvider   | `y_tpPerGpu`                      |
+| `i_xmetric`     | InferenceProvider   | `p99_ttft`                        |
+| `i_e2e_xmetric` | InferenceProvider   | `''`                              |
+| `i_scale`       | InferenceProvider   | `auto`                            |
+| `i_gpus`        | InferenceProvider   | `''`                              |
+| `i_dates`       | InferenceProvider   | `''`                              |
+| `i_dstart`      | InferenceProvider   | `''`                              |
+| `i_dend`        | InferenceProvider   | `''`                              |
+| `i_optimal`     | InferenceProvider   | `''` (truthy = hide non-optimal)  |
+| `i_label`       | InferenceProvider   | `''` (truthy = show point labels) |
+| `i_nolabel`     | InferenceProvider   | `''` (legacy, read-only)          |
+| `i_hc`          | InferenceProvider   | `''`                              |
+| `i_log`         | InferenceProvider   | `''`                              |
+| `i_legend`      | InferenceProvider   | `''`                              |
+| `i_advlabel`    | InferenceProvider   | `''`                              |
+| `i_gradlabel`   | InferenceProvider   | `''`                              |
+| `e_rundate`     | EvaluationProvider  | `''`                              |
+| `e_bench`       | EvaluationProvider  | `''`                              |
+| `e_hc`          | EvaluationProvider  | `''`                              |
+| `e_labels`      | EvaluationProvider  | `''`                              |
+| `e_legend`      | EvaluationProvider  | `''`                              |
+| `r_range`       | ReliabilityProvider | `last-3-months`                   |
+| `r_pct`         | ReliabilityProvider | `''`                              |
+| `r_hc`          | ReliabilityProvider | `''`                              |
+| `r_legend`      | ReliabilityProvider | `''`                              |

package.json

Lines changed: 3 additions & 3 deletions
@@ -44,14 +44,14 @@
     "audit-ci": "^7.1.0",
     "is-ci": "^4.1.0",
     "lefthook": "^2.1.9",
-    "oxfmt": "^0.54.0",
-    "oxlint": "^1.69.0",
+    "oxfmt": "^0.55.0",
+    "oxlint": "^1.70.0",
     "rimraf": "^6.1.3",
     "typescript": "^6.0.3"
   },
   "engines": {
     "node": ">=18.0.0",
     "pnpm": ">=10.0.0"
   },
-  "packageManager": "pnpm@11.5.2"
+  "packageManager": "pnpm@11.7.0"
 }
