feat(blog): MI355X Qwen3.5 SGLang v0.5.12 up-to-17x in 3 months by functionstackx · Pull Request #379 · SemiAnalysisAI/InferenceX-app

functionstackx · 2026-05-25T23:29:04Z

Summary

- New blog post: 13 weeks after Alibaba's Qwen3.5-397B-A17B release, AMD MI355X SGLang FP8 throughput per GPU on the 8k/1k workload has risen by up to 17.7x at iso-interactivity of 40 tok/s/user (210 → 3,709 tok/s/GPU); peak per-GPU throughput went from 1.3k to 6.4k tok/s/GPU
- Three-date comparison: v0.5.8.post1 (Feb 20) → v0.5.10rc0 (Apr 16) → v0.5.12 (May 19) on the same MI355X silicon
- Walks through the three AMD-authored upstream PRs that drove the April jump:
  - sgl-project/sglang#20736: fuses the shared expert with the routed experts in Qwen2/Qwen3.5 MoE as topk+1 in a single AITER dispatch
  - sgl-project/sglang#21188: adds a forward_hip path to GemmaRMSNorm (AITER fused kernels instead of the native fallback)
  - sgl-project/sglang#21421: integrates AITER fused_topk into SGLang's softmax MoE top-K
- Covers the Qwen3.5 architecture (397B total / 17B active, 512 experts, hybrid Gated DeltaNet + Gated Attention, 256K context) and the TP=8 → TP=2/TP=4 retune that was bundled with the April image bump
- All numbers are sourced from the InferenceX 2026-05-19 run; the chart preset is linked from both DashboardCTA blocks
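
For readers unfamiliar with the routing terms used in the PR list above, here is a minimal pure-Python sketch of softmax top-K routing with the shared expert appended as an extra "topk+1" slot. It is an illustration only, not the AITER or SGLang implementation: the function names, the choice of index `num_routed_experts` for the shared expert, and the default shared weight of 1.0 are all assumptions, and the real kernels may weight the shared expert differently.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_with_shared_expert(router_logits, top_k, shared_weight=1.0, renormalize=True):
    """Return (expert_ids, weights), each of length top_k + 1, for one token.

    Hypothetical helper: the shared expert is modeled as one extra expert
    whose id is len(router_logits), so a single fused dispatch could cover
    both routed and shared experts.
    """
    num_routed_experts = len(router_logits)
    probs = softmax(router_logits)
    # Pick the top_k routed experts by routing probability.
    ranked = sorted(range(num_routed_experts), key=lambda i: probs[i], reverse=True)
    ids = ranked[:top_k]
    weights = [probs[i] for i in ids]
    if renormalize:
        s = sum(weights)
        weights = [w / s for w in weights]
    # Assumption: the shared expert always fires, with a fixed weight.
    ids.append(num_routed_experts)
    weights.append(shared_weight)
    return ids, weights

ids, weights = route_with_shared_expert([0.1, 2.0, -1.0, 1.5], top_k=2)
print(ids)      # routed expert ids followed by the shared-expert slot
print(weights)  # renormalized routed weights followed by the shared weight
```

The point of the extra slot is that the downstream expert computation sees one list of `top_k + 1` (id, weight) pairs per token instead of a routed pass plus a separate shared-expert pass.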

Test plan

- Replace benchmark-dark.png: I copied the light chart you shared into both the light and dark slots so the post renders; a real dark-theme export should land before merging
- Verify the chart preset URL resolves to the Qwen3.5 FP8 view across all 3 dates on MI355X SGLang
- Spot-check the iso-interactivity ratios at one or two band points (interpolation is linear on the Pareto frontier of each date)
- Confirm the canonical dashboard URL is inferencex.semianalysis.com (your link was a Vercel preview; I normalized it to the canonical URL)
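
The spot-check in the test plan can be done by hand along these lines. This is a sketch under stated assumptions: each date's benchmark run is taken to be a list of (interactivity in tok/s/user, throughput in tok/s/GPU) points, the helper names are hypothetical, and the sample points are made-up placeholders, not InferenceX measurements.

```python
def pareto_frontier(points):
    """Keep points that no other point beats on both axes; sort by interactivity."""
    frontier = []
    best_tput = float("-inf")
    # Walk from highest interactivity down; a point survives only if its
    # throughput exceeds every point with higher-or-equal interactivity.
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        if y > best_tput:
            frontier.append((x, y))
            best_tput = y
    return sorted(frontier)

def throughput_at(frontier, target):
    """Linearly interpolate throughput at a target interactivity."""
    for (x0, y0), (x1, y1) in zip(frontier, frontier[1:]):
        if x0 <= target <= x1:
            t = (target - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("target interactivity is outside the frontier range")

# Placeholder data only (NOT real benchmark numbers).
old_run = [(20, 400), (30, 150), (40, 200), (60, 100)]
new_run = [(20, 4000), (50, 1000), (80, 500)]

old_tput = throughput_at(pareto_frontier(old_run), 40)
new_tput = throughput_at(pareto_frontier(new_run), 40)
print(f"iso-interactivity ratio at 40 tok/s/user: {new_tput / old_tput:.2f}x")
```

Comparing the printed ratio against the figure in the post at one or two interactivity targets is enough to catch an interpolation or frontier-selection mistake.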

🤖 Generated with Claude Code

Note

Low Risk
Content-only addition (MDX); no application logic, auth, or data pipeline changes.

Overview
Adds a new InferenceX blog post (mi355x-qwen3-5-sglang-v0-5-12-up-to-17x.mdx) documenting ~17.7x MI355X SGLang FP8 throughput-per-GPU gains on Qwen3.5-397B-A17B (8k/1k) from Feb → May 2026 on unchanged silicon.

The article walks three SGLang image milestones (v0.5.8.post1, v0.5.10rc0, v0.5.12), ties the April jump to three AITER-gated upstream SGLang PRs (#20736 shared-expert MoE fusion, #21188 HIP GemmaRMSNorm, #21421 fused_topk), and the bundled TP=8 → TP=2/TP=4 InferenceX recipe retune. It includes concurrency tables, iso-interactivity Pareto ratios, a Figure pointing at /images/mi355x-qwen3-5-sglang-v0-5-12-up-to-17x/, twin DashboardCTA links to the canonical InferenceX preset, and JsonLd FAQ schema.
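
As background for the GemmaRMSNorm item, the following is a plain-Python reference of a Gemma-style RMSNorm, which scales the normalized activations by `(1 + weight)` rather than by `weight` alone. It is a readability sketch of the math a native fallback would compute, not the SGLang or AITER code; the function name and the `eps` default are assumptions.

```python
import math

def gemma_rms_norm(x, weight, eps=1e-6):
    """Gemma-style RMSNorm over one hidden vector.

    x and weight are equal-length lists of floats. The learned weight is
    applied as (1 + weight), so a zero-initialized weight leaves the
    normalized activations unscaled.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]

hidden = [3.0, 4.0]
print(gemma_rms_norm(hidden, [0.0, 0.0]))  # unit-RMS output
print(gemma_rms_norm(hidden, [1.0, 1.0]))  # same output scaled by 2
```

A fused kernel computes the same result in one pass over the hidden vector, which is where the speedup over an unfused fallback would come from.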

Note: chart assets are referenced but not in this diff; the PR description calls out replacing benchmark-dark.png before merge.

Reviewed by Cursor Bugbot for commit b13f595. Bugbot is set up for automated code reviews on this repo.

Three SGLang releases (v0.5.8.post1 -> v0.5.10rc0 -> v0.5.12) plus a TP=8 -> TP=2/TP=4 retune push MI355X Qwen3.5 FP8 8k/1k throughput per GPU up to 17.7x at iso-interactivity (40 tok/s/user) over 3 months, with peak per-GPU throughput climbing 1.3k -> 6.4k tok/s/GPU. The May v0.5.12 bump alone adds 1.43-1.63x on top of the April baseline.

Walks through the three AMD-authored upstream PRs that drove the April jump (#20736 shared-expert+routed fusion, #21188 GemmaRMSNorm HIP, #21421 AITER fused_topk), each gated on SGLANG_USE_AITER=1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
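
The commit message says each of the three PRs is gated on `SGLANG_USE_AITER=1`. A minimal sketch of what environment-variable gating of this kind looks like is shown below. The helper names and the returned path labels are hypothetical, and the sketch treats only the literal value `1` as enabled; SGLang's own parsing of the variable may accept other values.

```python
import os

def aiter_enabled(env=None):
    """Return True when SGLANG_USE_AITER is set to "1" (assumed convention)."""
    env = os.environ if env is None else env
    return env.get("SGLANG_USE_AITER", "0") == "1"

def select_path(env=None):
    """Pick a kernel path label; the labels are illustrative, not SGLang names."""
    return "aiter_fused" if aiter_enabled(env) else "native_fallback"

print(select_path({"SGLANG_USE_AITER": "1"}))
print(select_path({}))
```

When reproducing the benchmark, it is worth confirming the variable is actually set inside the serving container, since with a gate like this an unset variable silently selects the slower path.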

vercel · 2026-05-25T23:29:10Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
inferencemax-app	Ready	Preview, Comment	May 25, 2026 11:29pm

vercel Bot deployed to Preview May 25, 2026 23:29

functionstackx closed this May 25, 2026

functionstackx deleted the blog/mi355x-qwen3-5-sglang-v0-5-12-up-to-17x branch May 25, 2026 23:30

functionstackx wants to merge 1 commit into master from blog/mi355x-qwen3-5-sglang-v0-5-12-up-to-17x
