edit(blog,skill): hoist chart figure to top of post (after lede CTA)
Blog: moves the <Figure> block from below the iso-interactivity
table up to immediately after the top DashboardCTA so the hero chart
is the first thing readers see. Removes the duplicate Figure
placement. Reworks the [Live chart] line below the iso-interactivity
table to call out that it's the interactive version of the figure
at the top.
Skill: adds a new "<Figure> hero image immediately after the top
DashboardCTA" subsection in Step 4 specifying the new placement
convention, and rewrites the "<Figure> with the chart image"
section further down into a "[Live chart] link after the
iso-interactivity tables" section so the structure stops implying
two Figure blocks per post.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.claude/skills/write-inferencex-blog/SKILL.md
20 additions & 8 deletions
@@ -128,6 +128,21 @@ Bold the peak ratio in the lede. Second paragraph: name the upstream PRs that ma
Use the preset URL the user provided so clicking lands on the exact comparison view, not the bare dashboard. Format: `https://inferencex.semianalysis.com/inference?g_model=...&i_prec=...&g_rundate=...&g_runid=...&i_active={hw1}_{fw1}%2C{hw2}_{fw2}&i_metric=y_costh&i_linelabel=1`.
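A minimal sketch of how a preset URL in this format could be assembled, assuming Python's standard `urllib.parse`. Only the query parameter names come from the format string above; the helper name `build_preset_url` and every example value (model, run date, run id, `b200_sglang`) are hypothetical placeholders.

```python
from urllib.parse import urlencode

BASE_URL = "https://inferencex.semianalysis.com/inference"


def build_preset_url(model, precision, run_date, run_id, series):
    """Assemble a preset dashboard URL (hypothetical helper, not part of the skill).

    `series` is a list of (hardware, framework) pairs. Each pair becomes
    `{hw}_{fw}`, and the pairs are comma-joined; urlencode escapes the
    comma as %2C, matching the format string.
    """
    params = {
        "g_model": model,
        "i_prec": precision,
        "g_rundate": run_date,
        "g_runid": run_id,
        "i_active": ",".join(f"{hw}_{fw}" for hw, fw in series),
        "i_metric": "y_costh",
        "i_linelabel": "1",
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Placeholder values for illustration only.
url = build_preset_url(
    "example-model", "fp8", "2026-01-01", "12345",
    [("b200", "sglang"), ("mi355x", "sglang")],
)
print(url)
```

In practice the skill says to use the preset URL the user provided, so a helper like this would only be useful for sanity-checking that a pasted URL carries all the expected parameters.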
+### `<Figure>` hero image immediately after the top DashboardCTA
+
+The chart image is the **hero** of the post: it goes right after the top `<DashboardCTA>`, **before** the model / architecture paragraph, so readers see the curves before they read the prose. Do not bury the figure halfway down the post next to the iso-interactivity table.
+
+```mdx
+<Figure
+  srcLight="/images/{slug}/benchmark-light.png"
+  srcDark="/images/{slug}/benchmark-dark.png"
+  alt="Plain-English description of the chart including model, precision, ISL/OSL, both compared SKUs/frameworks, and any toggles (MTP/non-MTP)"
+  caption="Short caption. Note any non-obvious labeling convention used on the chart (e.g. 'Labels denote GPU count per config.')."
+/>
+```
+
+Use the chart asset only once in the body: show it at the top and don't repeat it lower down. Below the iso-interactivity table, place a small `[Live chart](...)` link that points at the same preset URL and tells the reader the figure at the top is interactive when clicked through. That's where readers will go to drill into specific points.
+
### Model / architecture paragraph
One paragraph naming the model, vendor, release date (use it to compute "N weeks after release" if it sharpens the cadence framing), total/active parameters, expert count + top-K routing, attention mechanism (MLA, NSA/DSA, GQA, etc.), and context window. **Always WebSearch to verify these numbers** — don't carry over from a prior generation. Cite a source URL inline if the number is non-obvious.
@@ -169,18 +184,15 @@ Columns: `Interactivity (tok/s/user) | {NVIDIA} $/M tok | {AMD} $/M tok | {NVIDI
Follow with one paragraph explaining _why_ the gap peaks where it does (e.g. "the MI355X 4-GPU TP=4 recipe plateaus at $0.22 while B200 is still climbing"), and one sentence noting where the gap inverts (e.g. "Above 90 tok/s/user the comparison flips marginally back to B200 because there is no MI355X recipe matching B200's TP=8 conc 4 at 100+ tok/s/user."). **Don't paper over the inversion** — call it out.

-### `<Figure>` with the chart image
+### `[Live chart]` link after the iso-interactivity tables
+
+The hero `<Figure>` already shipped at the top of the post. Down here, just a one-line link that points at the same preset URL so readers can drill into the interactive version of what they saw at the top:

```mdx
-<Figure
-  srcLight="/images/{slug}/benchmark-light.png"
-  srcDark="/images/{slug}/benchmark-dark.png"
-  alt="Plain-English description of the chart including model, precision, ISL/OSL, both compared SKUs/frameworks, and any toggles (MTP/non-MTP)"
-  caption="Short caption. Note any non-obvious labeling convention used on the chart (e.g. 'Labels denote GPU count per config.')."
-/>
+[Live chart](https://inferencex.semianalysis.com/inference?...): same view as the figure at the top, pre-filtered to {hardware/framework/model/precision} and interactive.
```

-Immediately followed by a `[Live chart]({preset URL})` link with the same preset as the `DashboardCTA` so readers can drill into a single point.
+Do not embed a second `<Figure>` here. One chart asset, shown once at the top.
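
The placement convention described here (one `<Figure>`, after the top `<DashboardCTA>`, with the `[Live chart]` link further down) could be checked mechanically. The following is a rough sketch only: the function name `check_figure_placement` is hypothetical, it is not part of the skill or the commit, and it relies on plain string matching rather than a real MDX parser.

```python
import re


def check_figure_placement(mdx: str) -> list[str]:
    """Return convention violations for one post body (hypothetical lint)."""
    problems = []
    # Character offsets of every <Figure ...> opening tag in the post.
    figures = [m.start() for m in re.finditer(r"<Figure\b", mdx)]
    cta = re.search(r"<DashboardCTA\b", mdx)

    if len(figures) != 1:
        problems.append(f"expected exactly one <Figure>, found {len(figures)}")
    if cta is None:
        problems.append("no <DashboardCTA> found")
    elif figures and figures[0] < cta.start():
        problems.append("<Figure> appears before the top <DashboardCTA>")

    if "[Live chart](" not in mdx:
        problems.append("missing [Live chart] link")
    elif figures and mdx.index("[Live chart](") < figures[0]:
        problems.append("[Live chart] link appears before the hero <Figure>")
    return problems


good = "<DashboardCTA />\n<Figure />\ntext\n[Live chart](https://example.com)\n"
print(check_figure_placement(good))
print(check_figure_placement(good + "<Figure />\n"))
```

A check like this cannot tell whether the figure sits before the model / architecture paragraph; it only catches the duplicate-figure and ordering mistakes the section warns about.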
### `## What's Next for {SKU/framework} on {Model}` (or similar)
+  alt="Qwen3.5 FP8 8k/1k tok/s/GPU vs interactivity on MI355X SGLang across three dates: 2026-02-20 (v0.5.8.post1), 2026-04-16 (v0.5.10rc0), 2026-05-19 (v0.5.12). Each curve labeled with its date and the TP value at each point."
+  caption="Qwen3.5-397B-A17B FP8 8k/1k on MI355X SGLang. Three runs over 3 months: v0.5.8.post1 (Feb 20, TP=8), v0.5.10rc0 (Apr 16, TP=2/4), v0.5.12 (May 19, TP=2/4). Point labels denote the TP value used for that config."
+/>
+
Qwen3.5-397B-A17B is Alibaba's MoE flagship, released 2026-02-16, with 397B total parameters and 17B activated per token across **512 experts** (top-K routing), and a hybrid attention stack interleaving Gated DeltaNet and Gated Attention layers. The first InferenceX benchmark ran on MI355X four days after the release.
## What Shipped to Make This Happen
@@ -103,14 +110,7 @@ Each date is interpolated on its Pareto frontier (the lower of TP=2 and TP=4 thr
The 19x peak at 40 tok/s/user is partly a regime extension — the Feb TP=8 recipe had a 24.5 ms TPOT floor at conc 4 (40.86 tok/s/user) and couldn't run cheaper than that on this workload, so the comparison band tops out where the old recipe was already in collapse. By 50 tok/s/user the v0.5.8 curve doesn't exist at all; by 75 tok/s/user only the v0.5.12 curve still has a point. The May v0.5.12 image alone adds 1.44x to 1.68x on top of the April baseline across the entire shared band — a clean version-bump win.
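
The TPOT floor and the interactivity figure quoted in that paragraph look like reciprocals of each other. A small sketch of the arithmetic, assuming interactivity (tok/s/user) is defined as 1000 divided by TPOT in milliseconds; the post does not state the formula, so treat that definition and the helper names as assumptions.

```python
def interactivity_from_tpot(tpot_ms: float) -> float:
    """Per-user decode rate in tok/s, assuming interactivity = 1000 / TPOT(ms)."""
    return 1000.0 / tpot_ms


def tpot_from_interactivity(tok_per_s_user: float) -> float:
    """Inverse of the above: TPOT in ms for a given tok/s/user."""
    return 1000.0 / tok_per_s_user


print(round(interactivity_from_tpot(24.5), 2))   # 40.82
print(round(tpot_from_interactivity(40.86), 2))  # 24.47
```

Under that assumption the rounded 24.5 ms floor gives about 40.82 tok/s/user, so the quoted 40.86 suggests the unrounded TPOT was closer to 24.47 ms. The two numbers agree to within rounding.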
-  alt="Qwen3.5 FP8 8k/1k tok/s/GPU vs interactivity on MI355X SGLang across three dates: 2026-02-20 (v0.5.8.post1), 2026-04-16 (v0.5.10rc0), 2026-05-19 (v0.5.12). Each curve labeled with its date and the TP value at each point."
-  caption="Qwen3.5-397B-A17B FP8 8k/1k on MI355X SGLang. Three runs over 3 months: v0.5.8.post1 (Feb 20, TP=8), v0.5.10rc0 (Apr 16, TP=2/4), v0.5.12 (May 19, TP=2/4). Point labels denote the TP value used for that config."
-/>
-
-[Live chart](https://inferencex.semianalysis.com/inference?g_model=Qwen-3.5-397B-A17B&g_rundate=2026-05-19&i_gpus=mi355x_sglang&i_dstart=2026-02-20&i_dend=2026-05-19&i_prec=fp8), pre-filtered to MI355X SGLang Qwen3.5 FP8 across all three runs.
+[Live chart](https://inferencex.semianalysis.com/inference?g_model=Qwen-3.5-397B-A17B&g_rundate=2026-05-19&i_gpus=mi355x_sglang&i_dstart=2026-02-20&i_dend=2026-05-19&i_prec=fp8): same view as the figure at the top, pre-filtered to MI355X SGLang Qwen3.5 FP8 across all three runs and interactive.