
feat(blog): MI355X Qwen3.5 SGLang v0.5.12 up-to-19x + spline interp helper #380

Merged
functionstackx merged 6 commits into master from blog/qwen3-5-mi355x-sglang-v0-5-12-19x
May 26, 2026


@functionstackx functionstackx commented May 25, 2026

Summary

Two logical changes bundled together — both touch the blog-writing workflow.

Blog post

New post: 13 weeks after the 2026-02-16 Qwen3.5-397B-A17B release, MI355X SGLang FP8 per-GPU throughput on 8k/1k has improved by up to 19.0x at iso-interactivity (at 40 tok/s/user, 192 → 3,660 tok/s/GPU between the v0.5.8.post1 baseline and v0.5.12), with peak per-GPU throughput climbing from 1.3k to 6.4k tok/s/GPU. Three AMD-authored upstream SGLang PRs (#20736, #21188, #21421) drove the April jump; v0.5.12 alone adds 1.44–1.68x on top.
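As a quick sanity check on the headline ratios, the rounded endpoints quoted above can be divided directly. This is only arithmetic on the figures in this description; the variable names are illustrative, and because the inputs are already rounded the result agrees with the published 19.0x only to within that rounding.

```python
# Rounded figures quoted in the PR description (tok/s/GPU).
baseline_iso = 192      # v0.5.8.post1 at 40 tok/s/user
latest_iso = 3_660      # v0.5.12 at 40 tok/s/user
baseline_peak = 1_300   # "1.3k" peak per-GPU throughput
latest_peak = 6_400     # "6.4k" peak per-GPU throughput

iso_speedup = latest_iso / baseline_iso
peak_speedup = latest_peak / baseline_peak

print(f"iso-interactivity speedup: {iso_speedup:.2f}x")   # 19.06x
print(f"peak throughput speedup:   {peak_speedup:.2f}x")  # 4.92x
```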

Skill / infra updates

  • .claude/skills/write-inferencex-blog/iso-interactivity.py — Python port of the dashboard's interpolation pipeline (Pareto upper-left frontier + Steffen 1990 monotone cubic Hermite + no extrapolation + Y-clamping). 1:1 with packages/app/src/components/calculator/interpolation.ts. Blog tables now produce exactly the same numbers readers see when they hover the rendered chart.
  • .claude/skills/write-inferencex-blog/editor.mjs — portable browser-based MDX editor with auto-save, ~/-normalized path display, 127.0.0.1-only bind. Takes the file path as argv[2] so it works for any draft on any machine.
  • .claude/skills/write-inferencex-blog/SKILL.md — replaces the linear-interp guidance with the spline mandate; adds a "How the Pareto frontier behaves between the knots" subsection explaining why the Hermite cubic with Steffen tangents can't overshoot between knots; gates PR creation on human review; documents the browser-editor launch step.
  • AGENTS.md — new "Chart Interpolation — TS and Python Helpers MUST Stay in Sync" section. Any PR touching paretoFrontUpperLeft, monotoneSlopes, hermiteInterpolate, or interpolateMetricAtInteractivity MUST also update iso-interactivity.py in the same commit, otherwise blog tables silently drift from the chart.
  • .vercelignore — excludes .claude/ from Vercel uploads (belt-and-suspenders since Vercel project root is packages/app/).
  • .gitignore — excludes Python bytecode so ad-hoc imports of iso-interactivity.py don't leak __pycache__/ into the working tree.
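
The four stages named in the first bullet (Pareto frontier, Steffen monotone slopes, Hermite evaluation with no extrapolation, Y-clamping) can be sketched roughly as below. This is a minimal illustration and not the helper shipped in this PR: the function names `pareto_front`, `steffen_slopes`, and `interpolate_at`, the `(interactivity, throughput)` tuple layout, the higher-is-better dominance rule on both axes, and the secant slopes at the two end knots are all assumptions. The sample points are invented, not benchmark data.

```python
from bisect import bisect_right
from typing import Optional

Point = tuple[float, float]  # (interactivity tok/s/user, throughput tok/s/GPU)


def pareto_front(points: list[Point]) -> list[Point]:
    """Keep points not dominated on both axes (assumes higher is better for both)."""
    front: list[Point] = []
    best_y = float("-inf")
    # Walk from the highest x down; a point survives only if it beats every
    # throughput already seen at an equal or higher interactivity.
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            front.append((x, y))
            best_y = y
    return front[::-1]  # ascending x, strictly decreasing y


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def steffen_slopes(xs: list[float], ys: list[float]) -> list[float]:
    """Steffen-limited tangents at each knot; end knots use the secant (assumption)."""
    n = len(xs)
    if n < 2:
        return [0.0] * n
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    s = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]
    m = [s[0]] + [0.0] * (n - 2) + [s[-1]]
    for i in range(1, n - 1):
        p = (s[i - 1] * h[i] + s[i] * h[i - 1]) / (h[i - 1] + h[i])
        limit = min(abs(s[i - 1]), abs(s[i]), 0.5 * abs(p))
        m[i] = (_sign(s[i - 1]) + _sign(s[i])) * limit
    return m


def interpolate_at(points: list[Point], x: float) -> Optional[float]:
    """Spline value at x on the frontier, or None outside the knot range."""
    front = pareto_front(points)
    if not front:
        return None
    xs = [p[0] for p in front]
    ys = [p[1] for p in front]
    if x < xs[0] or x > xs[-1]:
        return None  # no extrapolation
    if len(xs) == 1:
        return ys[0]
    m = steffen_slopes(xs, ys)
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    # Cubic Hermite basis functions.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    y = h00 * ys[i] + h10 * h * m[i] + h01 * ys[i + 1] + h11 * h * m[i + 1]
    return min(max(y, min(ys)), max(ys))  # clamp to the knot range


# Invented sample: (15, 3000) is dominated by (20, 4000) and is dropped.
sample = [(10.0, 5000.0), (20.0, 4000.0), (40.0, 1000.0), (15.0, 3000.0)]
print(pareto_front(sample))
print(interpolate_at(sample, 30.0))
print(interpolate_at(sample, 50.0))  # None: outside the frontier
```

Returning `None` outside the knot range is one way to represent a cell that the post would mark `_unreachable_`; how the real helper signals that case is not shown in this description.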

Test plan

  • Replace the dark-theme chart export: benchmark-dark.png is currently a copy of the light theme. Drop a real dark export from the dashboard at the linked preset before merging.
  • Verify the dashboard preset URL (g_model=Qwen-3.5-397B-A17B&g_rundate=2026-05-19&i_dstart=2026-02-20&i_dend=2026-05-19&i_prec=fp8) lands on the right cross-date MI355X view.
  • Spot-check 2–3 cells of the iso-interactivity table by piping a JSON request through iso-interactivity.py and confirming the spline output matches the table.
  • Click-through preview when Vercel builds — verify the rendered Figure shows the chart, the tables render properly, and the JsonLd FAQ doesn't appear inline.

🤖 Generated with Claude Code


Note

Medium Risk
Published benchmark ratios and iso-interactivity tables are high-visibility claims; drift between TS chart code and the new Python helper would mislead readers, though runtime app behavior is unchanged.

Overview
Adds a new MI355X Qwen3.5 / SGLang v0.5.12 benchmark post (19x iso-interactivity headline, three-date tables, spline-derived comparison with _unreachable_ rows) and expands the write-inferencex-blog workflow so authors match the live dashboard.

Blog workflow & tooling: iso_interactivity.py ports the chart’s Pareto + Steffen Hermite pipeline; SKILL.md mandates that helper (replacing linear interp), documents frontier behavior, requires a hero + repeated <Figure>, human approval before git/PR, and a local editor.mjs preview server. AGENTS.md requires TS interpolation changes to update the Python helper in the same commit. .vercelignore / .gitignore keep .claude/ and __pycache__ out of deploys and the tree.

Reviewed by Cursor Bugbot for commit 01f0a62. Bugbot is set up for automated code reviews on this repo.

…elper

Blog post: 13 weeks after the 2026-02-16 Qwen3.5-397B-A17B release,
AMD MI355X SGLang FP8 throughput per GPU on 8k/1k has moved up to
19.0x at iso-interactivity at 40 tok/s/user (192 -> 3,660 tok/s/GPU
between v0.5.8.post1 baseline and v0.5.12), with peak per-GPU
throughput climbing 1.3k -> 6.4k tok/s/GPU. Three AITER MoE PRs
(sglang #20736, #21188, #21421) drove the April jump; v0.5.12 alone
adds 1.44-1.68x on top.

Skill updates (.claude/skills/write-inferencex-blog/):

- editor.mjs: portable Node http server for browser-based MDX
  editing with auto-save, ~/-normalized path display, and
  127.0.0.1-only bind. Takes the file path as argv[2] so any draft
  can be edited on any machine.

- iso-interactivity.py: Python port of the dashboard's chart
  interpolation pipeline (paretoFrontUpperLeft + Steffen 1990
  monotone cubic Hermite slopes + Hermite evaluation, no
  extrapolation, Y-clamped to prevent spline overshoot). 1:1 with
  the canonical TS at packages/app/src/components/calculator/
  interpolation.ts. Blog iso-interactivity tables must use this
  helper so published numbers match the rendered chart exactly.

- SKILL.md: replaces the linear-interpolation guidance with the
  spline mandate and the helper invocation; new "How the Pareto
  frontier behaves between the knots" subsection explains why the
  Hermite cubic with Steffen tangents never overshoots and why
  blog tables can diverge from naive linear by 10%+ on steep
  segments. Adds the human-review gate before PR creation and the
  browser-editor launch step.

AGENTS.md: new "Chart Interpolation - TS and Python Helpers MUST
Stay in Sync" section. Any PR touching paretoFrontUpperLeft,
monotoneSlopes, hermiteInterpolate, or interpolateMetricAtInteractivity
in TypeScript MUST also update iso-interactivity.py in the same
commit, otherwise blog tables silently drift from the chart.

.vercelignore: excludes .claude/ from Vercel uploads
(belt-and-suspenders since the project root is packages/app/).
.gitignore: excludes Python bytecode (__pycache__/, *.pyc) so
ad-hoc imports of iso-interactivity.py don't leak.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot commented May 25, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| inferencemax-app | Ready | Preview, Comment | May 26, 2026 12:09am |


functionstackx and others added 2 commits May 25, 2026 19:55
Mandates reading every packages/app/content/blog/*.mdx file before
writing a new post (not just "when in doubt"). Highlights two
foundational posts that must be read heavily for tone and structure:

- inferencex-v2-nvidia-blackwell-vs-amd-vs-hopper.mdx — the v2 launch
  piece that sets the editorial voice (composability framing,
  rack-scale vs single-node, TCO discussion, first-name engineer
  acknowledgments).
- inferencemax-open-source-inference-benchmarking.mdx — the origin
  story for the open-source benchmark and the "speed is the moat"
  framing about software cadence.

Also adds the Qwen3.5 post itself to the template reference list as
the canonical "three-date version-bump time series" template, with
the spline iso-interactivity comparison and the `_unreachable_` cell
convention for out-of-frontier interactivities.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread .claude/skills/write-inferencex-blog/iso_interactivity.py
Comment thread packages/app/content/blog/mi355x-qwen3-5-sglang-v0-5-12-up-to-17x.mdx Outdated
Comment thread .claude/skills/write-inferencex-blog/iso_interactivity.py
Blog: moves the <Figure> block from below the iso-interactivity
table up to immediately after the top DashboardCTA so the hero chart
is the first thing readers see. Removes the duplicate Figure
placement. Reworks the [Live chart] line below the iso-interactivity
table to call out that it's the interactive version of the figure
at the top.

Skill: adds a new "<Figure> hero image immediately after the top
DashboardCTA" subsection in Step 4 specifying the new placement
convention, and rewrites the "<Figure> with the chart image"
section further down into a "[Live chart] link after the
iso-interactivity tables" section so the structure stops implying
two Figure blocks per post.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread .claude/skills/write-inferencex-blog/editor.mjs Outdated
…table

The earlier commit deleted the bottom Figure when hoisting the chart to
the top; the intent was both placements, not a move. The same chart
asset now appears twice: once as the hero immediately after the top
DashboardCTA so readers see the curves before the prose, and once
again directly below the iso-interactivity table so readers don't
have to scroll back up to map the data rows to the chart.

Skill mandates both placements explicitly with copy-paste identical
<Figure> blocks; the [Live chart] link stays below the second
Figure unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread .claude/skills/write-inferencex-blog/iso-interactivity.py Outdated
Five issues from Cursor Bugbot review:

1. Blog: fix "lower of TP=2 and TP=4 throughput" -> "higher of"
   in the iso-interactivity intro. Throughput is higher-is-better;
   the Pareto frontier picks the maximum throughput at each
   interactivity, not the minimum.

2. Helper: rename iso-interactivity.py -> iso_interactivity.py
   so `import iso_interactivity` actually works (Python identifiers
   can't contain hyphens). The CLI invocation works either way but
   the module-import pattern shown in the docstring was broken.
   Updates the docstring's CLI example, SKILL.md, and AGENTS.md to
   reference the new filename. git mv preserves history.

3. Helper: guard against KeyError when a frontier point is missing
   the requested metric_key. Now uses p.get(metric_key) with a
   fallback to 0, matching the TS `extractMetric(...) ?? 0`
   behavior in interpolateMetricAtInteractivity. CLI returns
   `{"value": 0}` cleanly instead of dying with a traceback.

4. Helper: clarify the clamp asymmetry between the two TS analogs.
   The Python helper keeps the tighter [min(ys), max(ys)] clamp
   that matches interpolateForGPU in the calculator (the closest
   analog to blog iso-interactivity tables), and the docstring
   now explicitly notes the asymmetry with the trend-chart hook
   which only does max(0, raw) and lets the spline overshoot up.

5. Editor: fix Cmd+S race condition. doSave() now reschedules a
   debounced save instead of silently dropping the keypress when
   another save is already in flight. Latest buffer never gets
   stranded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
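
Items 2 through 4 of the commit message above can be illustrated with a short sketch. It is a hedged approximation, not the helper's actual code: `extract_metric`, `clamp_to_knots`, `clamp_floor_only`, and the `tput_per_gpu` key are hypothetical names chosen for the example, and the numbers are made up.

```python
from typing import Any, Mapping, Sequence


def extract_metric(point: Mapping[str, Any], metric_key: str) -> float:
    """Return the metric, or 0.0 when the key is absent or None (like TS `?? 0`)."""
    value = point.get(metric_key)
    return 0.0 if value is None else float(value)


def clamp_to_knots(raw: float, ys: Sequence[float]) -> float:
    """Tighter clamp: keep the result inside [min(ys), max(ys)]."""
    return min(max(raw, min(ys)), max(ys))


def clamp_floor_only(raw: float) -> float:
    """Looser clamp: only forbid negative values, so upward overshoot passes."""
    return max(0.0, raw)


knot_ys = [1000.0, 4000.0, 5000.0]
overshoot = 5200.0  # a raw spline value slightly above the highest knot

print(clamp_to_knots(overshoot, knot_ys))          # 5000.0
print(clamp_floor_only(overshoot))                 # 5200.0
print(extract_metric({"tput_per_gpu": 3660}, "tput_per_gpu"))  # 3660.0
print(extract_metric({}, "tput_per_gpu"))          # 0.0 instead of a KeyError

# A hyphenated file name is not a valid module name for `import`.
print("iso-interactivity".isidentifier())  # False
print("iso_interactivity".isidentifier())  # True
```

The two clamps agree whenever the raw value already lies between the lowest and highest knot; they differ only when the spline overshoots upward, which is the asymmetry the docstring now calls out.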
@functionstackx functionstackx merged commit 6ceb08d into master May 26, 2026
14 of 15 checks passed
@functionstackx functionstackx deleted the blog/qwen3-5-mi355x-sglang-v0-5-12-19x branch May 26, 2026 00:09

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

from __future__ import annotations
import json
import sys
from typing import Callable, Iterable, Optional

Unused Iterable import added

Low Severity

The new module imports Iterable from typing but never references it anywhere in the file, leaving dead import noise in a helper that is meant to stay a clean 1:1 port of the TypeScript sources.

  if (editor.getValue() !== lastSaved) {
    scheduleSave();
  }
}

Stale save can overwrite edits

Medium Severity

The MDX editor’s doSave posts a snapshot captured at save start with no generation check. If the user keeps typing while that request is in flight, a later successful response can write the older buffer after a beforeunload sendBeacon already persisted newer text, reverting the file on disk.
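
One common way to close this kind of race is a generation check: every snapshot carries a monotonically increasing number, and a write is discarded if a newer snapshot has already been persisted. The sketch below shows the idea in Python rather than in the editor's JavaScript, purely as an illustration. `SaveCoordinator` and its methods are hypothetical and are not part of editor.mjs, and the eventual fix in the editor may look different.

```python
class SaveCoordinator:
    """Rejects writes that are older than what is already on disk."""

    def __init__(self) -> None:
        self.generation = 0             # bumped for every snapshot taken
        self.persisted_generation = -1  # generation of the last accepted write
        self.disk = ""                  # stands in for the file contents

    def snapshot(self, text: str) -> tuple[int, str]:
        """Capture the buffer together with a fresh generation number."""
        self.generation += 1
        return self.generation, text

    def commit(self, generation: int, text: str) -> bool:
        """Persist the snapshot unless a newer one has already landed."""
        if generation <= self.persisted_generation:
            return False  # stale response: drop it instead of reverting the file
        self.disk = text
        self.persisted_generation = generation
        return True


store = SaveCoordinator()
slow_save = store.snapshot("old text")     # request starts, then stalls in flight
unload_save = store.snapshot("new text")   # user kept typing, then closed the tab

print(store.commit(*unload_save))  # True: newer text is persisted first
print(store.commit(*slow_save))    # False: the late, older write is rejected
print(store.disk)                  # new text
```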

- `packages/app/content/blog/mi355x-kimi-k2-5-vllm-aiter-7x-speedup.mdx` — Single-PR speedup story, 25-day cadence, iso-throughput interpolation. Closest template for "one PR moved the curve" posts.
- `packages/app/content/blog/sglang-0-5-6-b200-deepseek-r1-fp4-up-to-1-8x.mdx` — Same-hardware version-bump story. Closest template for "framework release X is N% faster than X-1" posts.
- `packages/app/content/blog/gb200-nvl72-kimi-k2-5-vllm-wide-ep-3x-vs-b200.mdx` — Rack-scale wide EP story. Closest template for "scale-up fabric unlocks a new operating regime" posts.
- `packages/app/content/blog/mi355x-qwen3-5-sglang-v0-5-12-up-to-17x.mdx` — Three-date version-bump time series with the spline iso-interactivity comparison and the `_unreachable_` cell convention for out-of-frontier interactivities.

Chart-only path still linear

Low Severity

This PR adds a mandatory spline-based iso_interactivity.py workflow for iso-interactivity tables, but the “When the user has only a chart image” section still instructs authors to linearly interpolate comparison points, which produces different numbers than the dashboard Hermite/Pareto pipeline the rest of the skill requires.
