feat(blog): MI355X Qwen3.5 SGLang v0.5.12 up-to-19x + spline interp helper #380
…elper

Blog post: 13 weeks after the 2026-02-16 Qwen3.5-397B-A17B release, AMD MI355X SGLang FP8 throughput per GPU on 8k/1k has improved by up to 19.0x at iso-interactivity (reached at 40 tok/s/user, where throughput went 192 -> 3,660 tok/s/GPU between the v0.5.8.post1 baseline and v0.5.12), with peak per-GPU throughput climbing 1.3k -> 6.4k tok/s/GPU. Three AITER MoE PRs (sglang #20736, #21188, #21421) drove the April jump; v0.5.12 alone adds 1.44-1.68x on top.

Skill updates (.claude/skills/write-inferencex-blog/):
- editor.mjs: portable Node http server for browser-based MDX editing with auto-save, ~/-normalized path display, and a 127.0.0.1-only bind. Takes the file path as argv[2] so any draft can be edited on any machine.
- iso-interactivity.py: Python port of the dashboard's chart interpolation pipeline (paretoFrontUpperLeft + Steffen 1990 monotone cubic Hermite slopes + Hermite evaluation, no extrapolation, Y-clamped to prevent spline overshoot). 1:1 with the canonical TS at packages/app/src/components/calculator/interpolation.ts. Blog iso-interactivity tables must use this helper so published numbers match the rendered chart exactly.
- SKILL.md: replaces the linear-interpolation guidance with the spline mandate and the helper invocation; a new "How the Pareto frontier behaves between the knots" subsection explains why the Hermite cubic with Steffen tangents never overshoots and why blog tables can diverge from naive linear interpolation by 10%+ on steep segments. Adds the human-review gate before PR creation and the browser-editor launch step.

AGENTS.md: new "Chart Interpolation - TS and Python Helpers MUST Stay in Sync" section. Any PR touching paretoFrontUpperLeft, monotoneSlopes, hermiteInterpolate, or interpolateMetricAtInteractivity in TypeScript MUST also update iso-interactivity.py in the same commit; otherwise blog tables silently drift from the chart.

.vercelignore: excludes .claude/ from Vercel uploads (belt-and-suspenders since the project root is packages/app/).

.gitignore: excludes Python bytecode (__pycache__/, *.pyc) so ad-hoc imports of iso-interactivity.py don't leak bytecode into the working tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
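The frontier step of the pipeline described in this commit can be sketched as follows. This is a hypothetical stand-in, not the repo's paretoFrontUpperLeft: it assumes each point is an (interactivity, throughput per GPU) pair where higher is better on both axes, and the real function's axis orientation and tie-breaking may differ.

```python
from typing import List, Tuple

# (interactivity in tok/s/user, throughput in tok/s/GPU); higher is better on both axes.
Point = Tuple[float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep only points that no other point beats on both axes (hypothetical helper).

    Returns the frontier sorted by ascending interactivity, ready for interpolation.
    """
    # Scan from the most interactive point toward the least interactive one. A point
    # survives only if its throughput beats everything already seen at higher (or
    # equal) interactivity; ties on x keep the higher-throughput point.
    ordered = sorted(points, reverse=True)
    frontier: List[Point] = []
    best_throughput = float("-inf")
    for interactivity, throughput in ordered:
        if throughput > best_throughput:
            frontier.append((interactivity, throughput))
            best_throughput = throughput
    frontier.reverse()
    return frontier
```

Pooling two made-up sweeps, say TP=2 at (40, 2000) and TP=4 at (40, 2500), through this filter keeps 2,500 tok/s/GPU at 40 tok/s/user: at each interactivity the frontier takes the higher of the two throughputs.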
Mandates reading every packages/app/content/blog/*.mdx file before writing a new post (not just "when in doubt"). Highlights two foundational posts that must be read closely for tone and structure:
- inferencex-v2-nvidia-blackwell-vs-amd-vs-hopper.mdx: the v2 launch piece that sets the editorial voice (composability framing, rack-scale vs single-node, TCO discussion, first-name engineer acknowledgments).
- inferencemax-open-source-inference-benchmarking.mdx: the origin story for the open-source benchmark and the "speed is the moat" framing about software cadence.

Also adds the Qwen3.5 post itself to the template reference list as the canonical "three-date version-bump time series" template, with the spline iso-interactivity comparison and the `_unreachable_` cell convention for out-of-frontier interactivities.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Blog: moves the <Figure> block from below the iso-interactivity table up to immediately after the top DashboardCTA so the hero chart is the first thing readers see, and removes the duplicate Figure placement. Reworks the [Live chart] line below the iso-interactivity table to call out that it's the interactive version of the figure at the top.

Skill: adds a new "<Figure> hero image immediately after the top DashboardCTA" subsection in Step 4 specifying the new placement convention, and rewrites the "<Figure> with the chart image" section further down into a "[Live chart] link after the iso-interactivity tables" section so the structure stops implying two Figure blocks per post.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…table

The earlier commit deleted the bottom Figure when hoisting the chart to the top, but the intent was both placements, not a move. The same chart asset now appears twice: once as the hero immediately after the top DashboardCTA so readers see the curves before the prose, and again directly below the iso-interactivity table so readers don't have to scroll back up to map the data rows to the chart. Skill mandates both placements explicitly with copy-paste-identical <Figure> blocks; the [Live chart] link stays below the second Figure unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five issues from Cursor Bugbot review:
1. Blog: fix "lower of TP=2 and TP=4 throughput" -> "higher of"
in the iso-interactivity intro. Throughput is higher-is-better;
the Pareto frontier picks the maximum throughput at each
interactivity, not the minimum.
2. Helper: rename iso-interactivity.py -> iso_interactivity.py
so `import iso_interactivity` actually works (Python identifiers
can't contain hyphens). The CLI invocation works either way but
the module-import pattern shown in the docstring was broken.
Updates the docstring's CLI example, SKILL.md, and AGENTS.md to
reference the new filename. git mv preserves history.
3. Helper: guard against KeyError when a frontier point is missing
the requested metric_key. Now uses p.get(metric_key) with a
fallback to 0, matching the TS `extractMetric(...) ?? 0`
behavior in interpolateMetricAtInteractivity. CLI returns
`{"value": 0}` cleanly instead of dying with a traceback.
4. Helper: clarify the clamp asymmetry between the two TS analogs.
The Python helper keeps the tighter [min(ys), max(ys)] clamp
that matches interpolateForGPU in the calculator (the closest
analog to blog iso-interactivity tables), and the docstring
now explicitly notes the asymmetry with the trend-chart hook
which only does max(0, raw) and lets the spline overshoot up.
5. Editor: fix Cmd+S race condition. doSave() now reschedules a
debounced save instead of silently dropping the keypress when
another save is already in flight. Latest buffer never gets
stranded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
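Items 3 and 4 are easy to get subtly wrong in Python. A minimal sketch with hypothetical function names (`extract_metric`, `clamp_to_knots`, `clamp_non_negative`), not the helper's real ones. One detail worth noting: `point.get(metric_key, 0)` is not an exact analog of TypeScript's `?? 0`, because it returns None when the key is present with a JSON null, while `??` maps both undefined and null to 0.

```python
from typing import Any, Mapping, Sequence


def extract_metric(point: Mapping[str, Any], metric_key: str) -> float:
    """Null-safe lookup mirroring TypeScript's `value ?? 0` (hypothetical name).

    A missing key and a key holding None (JSON null) both fall back to 0;
    a real 0 passes through unchanged.
    """
    value = point.get(metric_key)
    return value if value is not None else 0


def clamp_to_knots(raw: float, ys: Sequence[float]) -> float:
    """Calculator-style clamp: pin the spline value to [min(ys), max(ys)]."""
    return min(max(raw, min(ys)), max(ys))


def clamp_non_negative(raw: float) -> float:
    """Trend-chart-style clamp: floor at 0 only, so upward overshoot is kept."""
    return max(0.0, raw)
```

For knots spanning 1,000 to 5,000 tok/s/GPU, a hypothetical raw value of 5,200 comes back as 5,000 from `clamp_to_knots` but stays 5,200 under `clamp_non_negative`, which is the asymmetry item 4 documents.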
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 01f0a62.
```python
from __future__ import annotations
import json
import sys
from typing import Callable, Iterable, Optional
```
Unused Iterable import added
Low Severity
The new module imports Iterable from typing but never references it anywhere in the file, leaving dead import noise in a helper that is meant to stay a clean 1:1 port of the TypeScript sources.
```javascript
  if (editor.getValue() !== lastSaved) {
    scheduleSave();
  }
}
```
Stale save can overwrite edits
Medium Severity
The MDX editor’s doSave posts a snapshot captured at save start with no generation check. If the user keeps typing while that request is in flight, a later successful response can write the older buffer after a beforeunload sendBeacon already persisted newer text, reverting the file on disk.
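The remedy the review points at is a generation check. A minimal sketch of the server side, written in Python for consistency with the other examples even though editor.mjs is Node. It assumes, hypothetically, that the client bumps a counter on every edit and sends it with both the autosave POST and the beforeunload sendBeacon payload; `DraftStore` and its fields are illustrative, not part of the real editor.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class DraftStore:
    """Keeps the newest buffer and refuses writes tagged with an older generation."""

    text: str = ""
    generation: int = -1
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def save(self, text: str, generation: int) -> bool:
        """Accept the write only if it is newer than anything already stored."""
        with self._lock:
            # A request that lands late (e.g. a slow autosave POST arriving after
            # the unload beacon) carries an older generation and is ignored.
            if generation <= self.generation:
                return False
            self.text = text
            self.generation = generation
            return True
```

Replaying the review's scenario: generation 3 arrives via the beacon first, so the slow generation-2 autosave that lands afterwards is rejected and the newer text survives. Persisting `text` to disk is left out of the sketch.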
```markdown
- `packages/app/content/blog/mi355x-kimi-k2-5-vllm-aiter-7x-speedup.mdx` — Single-PR speedup story, 25-day cadence, iso-throughput interpolation. Closest template for "one PR moved the curve" posts.
- `packages/app/content/blog/sglang-0-5-6-b200-deepseek-r1-fp4-up-to-1-8x.mdx` — Same-hardware version-bump story. Closest template for "framework release X is N% faster than X-1" posts.
- `packages/app/content/blog/gb200-nvl72-kimi-k2-5-vllm-wide-ep-3x-vs-b200.mdx` — Rack-scale wide EP story. Closest template for "scale-up fabric unlocks a new operating regime" posts.
- `packages/app/content/blog/mi355x-qwen3-5-sglang-v0-5-12-up-to-17x.mdx` — Three-date version-bump time series with the spline iso-interactivity comparison and the `_unreachable_` cell convention for out-of-frontier interactivities.
```
Chart-only path still linear
Low Severity
This PR adds a mandatory spline-based iso_interactivity.py workflow for iso-interactivity tables, but the “When the user has only a chart image” section still instructs authors to linearly interpolate comparison points, which produces different numbers than the dashboard Hermite/Pareto pipeline the rest of the skill requires.
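To see why the chart-only instructions produce different numbers, here is a minimal sketch that evaluates a monotone cubic Hermite spline next to plain linear interpolation on the same knots. Assumptions: the knots are made up, interior tangents follow Steffen (1990), and the endpoint tangents use the one-sided secant slope for brevity, which may not match the endpoint rule in the repo's TypeScript. The real helper also applies the Y-clamp, omitted here. All function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def steffen_slopes(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Knot tangents for strictly increasing xs (hypothetical helper).

    Interior knots use Steffen (1990); the two endpoints use the one-sided secant
    slope, a simplification that may not match the repo's TypeScript.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    s = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]
    m = [0.0] * n
    m[0], m[-1] = s[0], s[-1]
    for i in range(1, n - 1):
        p = (s[i - 1] * h[i] + s[i] * h[i - 1]) / (h[i - 1] + h[i])
        bound = min(abs(s[i - 1]), abs(s[i]), 0.5 * abs(p))
        # Zero tangent when neighbouring secants disagree in sign (or one is flat);
        # otherwise the tangent is capped so the curve cannot overshoot.
        m[i] = (_sign(s[i - 1]) + _sign(s[i])) * bound
    return m


def _segment(xs: Sequence[float], x: float) -> int:
    # Index i of the interval [xs[i], xs[i + 1]] containing x (last knot -> last interval).
    return min(bisect_right(xs, x) - 1, len(xs) - 2)


def spline_at(xs: Sequence[float], ys: Sequence[float], x: float) -> Optional[float]:
    """Monotone cubic Hermite value at x, or None outside the knots (no extrapolation)."""
    if len(xs) < 2 or not xs[0] <= x <= xs[-1]:
        return None
    m = steffen_slopes(xs, ys)
    i = _segment(xs, x)
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1  # Hermite basis functions on [0, 1]
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * ys[i] + h10 * h * m[i] + h01 * ys[i + 1] + h11 * h * m[i + 1]


def linear_at(xs: Sequence[float], ys: Sequence[float], x: float) -> Optional[float]:
    """Straight chord between neighbouring knots, as the chart-only section describes."""
    if len(xs) < 2 or not xs[0] <= x <= xs[-1]:
        return None
    i = _segment(xs, x)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])
```

With made-up knots xs = [20, 40, 60, 80] tok/s/user and ys = [6000, 5500, 4000, 1000] tok/s/GPU, the spline returns 4,906.25 at 50 tok/s/user against 4,750 for the chord (about 3% higher), and 2,593.75 against 2,500 at 70 tok/s/user. Outside [20, 80] both functions return None, which is where a table would print `_unreachable_`.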


Summary
Two logical changes bundled together — both touch the blog-writing workflow.
Blog post
New post: 13 weeks after the 2026-02-16 Qwen3.5-397B-A17B release, MI355X SGLang FP8 throughput per GPU on 8k/1k improved by up to 19.0x at iso-interactivity (reached at 40 tok/s/user, where throughput went 192 → 3,660 tok/s/GPU between the v0.5.8.post1 baseline and v0.5.12), with peak per-GPU throughput climbing 1.3k → 6.4k tok/s/GPU. Three AMD-authored upstream SGLang PRs (#20736, #21188, #21421) drove the April jump; v0.5.12 alone adds 1.44–1.68x on top.
Skill / infra updates
- `.claude/skills/write-inferencex-blog/iso_interactivity.py`: Python port of the dashboard's interpolation pipeline (Pareto upper-left frontier + Steffen 1990 monotone cubic Hermite + no extrapolation + Y-clamping). 1:1 with `packages/app/src/components/calculator/interpolation.ts`. Blog tables now produce exactly the same numbers readers see when they hover the rendered chart.
- `.claude/skills/write-inferencex-blog/editor.mjs`: portable browser-based MDX editor with auto-save, `~/`-normalized path display, and a `127.0.0.1`-only bind. Takes the file path as argv[2] so it works for any draft on any machine.
- `.claude/skills/write-inferencex-blog/SKILL.md`: replaces the linear-interp guidance with the spline mandate; adds a "How the Pareto frontier behaves between the knots" subsection explaining why the Hermite cubic with Steffen tangents can't overshoot between knots; gates PR creation on human review; documents the browser-editor launch step.
- `AGENTS.md`: new "Chart Interpolation - TS and Python Helpers MUST Stay in Sync" section. Any PR touching `paretoFrontUpperLeft`, `monotoneSlopes`, `hermiteInterpolate`, or `interpolateMetricAtInteractivity` MUST also update `iso_interactivity.py` in the same commit; otherwise blog tables silently drift from the chart.
- `.vercelignore`: excludes `.claude/` from Vercel uploads (belt-and-suspenders since the Vercel project root is `packages/app/`).
- `.gitignore`: excludes Python bytecode so ad-hoc imports of `iso_interactivity.py` don't leak `__pycache__/` into the working tree.

Test plan
- `benchmark-dark.png` is currently a copy of the light theme. Drop a real dark export from the dashboard at the linked preset before merging.
- …`g_model=Qwen-3.5-397B-A17B&g_rundate=2026-05-19&i_dstart=2026-02-20&i_dend=2026-05-19&i_prec=fp8`) lands on the right cross-date MI355X view.
- …`iso_interactivity.py` and confirming the spline output matches the table.

🤖 Generated with Claude Code
Note
Medium Risk
Published benchmark ratios and iso-interactivity tables are high-visibility claims; drift between TS chart code and the new Python helper would mislead readers, though runtime app behavior is unchanged.
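One way to catch the drift this note warns about, as a minimal sketch: a check that flags a commit touching the TS interpolation file without touching the Python helper. The two paths come from the PR description; the function name and CI wiring are hypothetical. The check is file-level only, which is coarser than the per-function AGENTS.md rule, and it assumes every listed function lives in `interpolation.ts` (interpolateMetricAtInteractivity may well live elsewhere).

```python
from typing import Iterable, List

# Paths taken from the PR description; the helper uses its post-rename filename.
TS_SOURCE = "packages/app/src/components/calculator/interpolation.ts"
PY_HELPER = ".claude/skills/write-inferencex-blog/iso_interactivity.py"


def sync_violations(changed_paths: Iterable[str]) -> List[str]:
    """Return an error message if the TS interpolation file changed but the helper did not."""
    changed = set(changed_paths)
    if TS_SOURCE in changed and PY_HELPER not in changed:
        return [f"{TS_SOURCE} changed without {PY_HELPER}; update both in the same commit"]
    return []
```

Per commit, the changed paths could come from `git diff-tree --no-commit-id --name-only -r <sha>`, with the job failing whenever the returned list is non-empty.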
Overview
Adds a new MI355X Qwen3.5 / SGLang v0.5.12 benchmark post (19x iso-interactivity headline, three-date tables, spline-derived comparison with `_unreachable_` rows) and expands the write-inferencex-blog workflow so authors match the live dashboard.

Blog workflow & tooling:
- `iso_interactivity.py` ports the chart’s Pareto + Steffen Hermite pipeline; `SKILL.md` mandates that helper (replacing linear interp), documents frontier behavior, and requires a hero + repeated `<Figure>`, human approval before git/PR, and a local `editor.mjs` preview server.
- `AGENTS.md` requires TS interpolation changes to update the Python helper in the same commit.
- `.vercelignore`/`.gitignore` keep `.claude/` and `__pycache__` out of deploys and the tree.

Bugbot is set up for automated code reviews on this repo.