feat(blog): MI355X Qwen3.5 SGLang v0.5.12 up-to-17x in 3 months #379

Closed
functionstackx wants to merge 1 commit into master from blog/mi355x-qwen3-5-sglang-v0-5-12-up-to-17x

@functionstackx functionstackx commented May 25, 2026

Summary

  • New blog post: 13 weeks after Alibaba's Qwen3.5-397B-A17B release, AMD MI355X SGLang FP8 throughput per GPU on the 8k/1k workload is up to 17.7x higher at an iso-interactivity of 40 tok/s/user (210 → 3,709 tok/s/GPU), and peak per-GPU throughput rose from 1.3k to 6.4k tok/s/GPU
  • Three-date comparison: v0.5.8.post1 (Feb 20) → v0.5.10rc0 (Apr 16) → v0.5.12 (May 19) on the same MI355X silicon
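  • Walks through the three AMD-authored upstream PRs that drove the April jump: #20736 (shared-expert + routed MoE fusion), #21188 (HIP GemmaRMSNorm), and #21421 (AITER fused_topk)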
  • Covers Qwen3.5 architecture (397B / 17B active, 512 experts, hybrid Gated DeltaNet + Gated Attention, 256K context) and the TP=8 → TP=2/TP=4 retune that bundled with the April image bump
  • All numbers sourced from the InferenceX 2026-05-19 run; chart preset linked from both DashboardCTA blocks
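
As a quick sanity check on the headline ratios, the arithmetic can be reproduced from the figures quoted above. This is a minimal sketch: the variable names are illustrative, and the peak values are the rounded "1.3k" and "6.4k" figures, so the peak ratio is only approximate.

```python
# Figures quoted in the summary (tok/s/GPU, 8k/1k workload, FP8).
iso_before, iso_after = 210.0, 3709.0      # at 40 tok/s/user
peak_before, peak_after = 1300.0, 6400.0   # rounded "1.3k" and "6.4k"


def speedup(before: float, after: float) -> float:
    """Return the throughput ratio after / before."""
    return after / before


iso_ratio = speedup(iso_before, iso_after)
peak_ratio = speedup(peak_before, peak_after)

print(f"iso-interactivity speedup: {iso_ratio:.1f}x")
print(f"peak per-GPU speedup (from rounded inputs): {peak_ratio:.1f}x")
```

The iso-interactivity ratio rounds to the 17.7x stated in the post; the peak ratio is derived here from rounded inputs and is not a number the post itself claims.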

Test plan

  • Replace benchmark-dark.png — I copied the light chart you shared into both light and dark slots so the post renders; a real dark-theme export should land before merging
  • Verify chart preset URL resolves to the Qwen3.5 FP8 view across all 3 dates on MI355X SGLang
  • Spot-check the iso-interactivity ratios at one or two band points (interpolation is linear on the Pareto frontier of each date)
  • Confirm the canonical dashboard URL is inferencex.semianalysis.com (your link was a Vercel preview — I normalized to canonical)
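
For the spot-check of iso-interactivity ratios, one possible approach is sketched below: build the Pareto frontier of each date's (interactivity, throughput) points, linearly interpolate throughput at the target interactivity, and divide. The function names and the sample points are hypothetical and synthetic; they are not InferenceX data, and the dashboard's own interpolation code may differ.

```python
from bisect import bisect_left


def pareto_frontier(points):
    """Keep (interactivity, throughput) points not dominated on both axes."""
    frontier = []
    best_tput = float("-inf")
    # Scan from highest interactivity down; keep points that raise throughput.
    for inter, tput in sorted(points, key=lambda p: (-p[0], -p[1])):
        if tput > best_tput:
            frontier.append((inter, tput))
            best_tput = tput
    return sorted(frontier)


def throughput_at(frontier, target):
    """Linearly interpolate throughput at a target interactivity."""
    xs = [p[0] for p in frontier]
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target outside measured interactivity range")
    i = bisect_left(xs, target)
    if xs[i] == target:
        return frontier[i][1]
    (x0, y0), (x1, y1) = frontier[i - 1], frontier[i]
    return y0 + (y1 - y0) * (target - x0) / (x1 - x0)


# Synthetic points: (tok/s/user, tok/s/GPU). Made up for illustration only.
old_run = [(20, 400), (30, 300), (50, 100), (25, 250)]
new_run = [(30, 4000), (50, 3000), (80, 1000)]

old_tput = throughput_at(pareto_frontier(old_run), 40)
new_tput = throughput_at(pareto_frontier(new_run), 40)
ratio = new_tput / old_tput
print(f"ratio at 40 tok/s/user: {ratio:.2f}x")
```

The sketch refuses to extrapolate: if the target interactivity falls outside a date's measured range it raises an error, which seems the safer choice when the ratio is being quoted in a headline.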

🤖 Generated with Claude Code


Note

Low Risk
Content-only addition (MDX); no application logic, auth, or data pipeline changes.

Overview
Adds a new InferenceX blog post (mi355x-qwen3-5-sglang-v0-5-12-up-to-17x.mdx) documenting ~17.7x MI355X SGLang FP8 throughput-per-GPU gains on Qwen3.5-397B-A17B (8k/1k) from Feb → May 2026 on unchanged silicon.

The article walks three SGLang image milestones (v0.5.8.post1, v0.5.10rc0, v0.5.12), ties the April jump to three AITER-gated upstream SGLang PRs (#20736 shared-expert MoE fusion, #21188 HIP GemmaRMSNorm, #21421 fused_topk), and the bundled TP=8 → TP=2/TP=4 InferenceX recipe retune. It includes concurrency tables, iso-interactivity Pareto ratios, a Figure pointing at /images/mi355x-qwen3-5-sglang-v0-5-12-up-to-17x/, twin DashboardCTA links to the canonical InferenceX preset, and JsonLd FAQ schema.

Note: chart assets are referenced but not in this diff; the PR description calls out replacing benchmark-dark.png before merge.

Reviewed by Cursor Bugbot for commit b13f595. Bugbot is set up for automated code reviews on this repo.

Three SGLang releases (v0.5.8.post1 -> v0.5.10rc0 -> v0.5.12) plus a
TP=8 -> TP=2/TP=4 retune push MI355X Qwen3.5 FP8 8k/1k throughput per
GPU up to 17.7x at an iso-interactivity of 40 tok/s/user over 3
months, with peak per-GPU throughput climbing 1.3k -> 6.4k tok/s/GPU.
The May v0.5.12 bump alone adds 1.43-1.63x on top of the April
baseline.

Walks through the three AMD-authored upstream PRs that drove the
April jump (#20736 shared-expert+routed fusion, #21188 GemmaRMSNorm
HIP, #21421 AITER fused_topk), each gated on SGLANG_USE_AITER=1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot commented May 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| inferencemax-app | Ready | Preview, Comment | May 25, 2026 11:29pm |

@functionstackx functionstackx deleted the blog/mi355x-qwen3-5-sglang-v0-5-12-up-to-17x branch May 25, 2026 23:30