feat(blog): B200 NVFP4 vs H100 FP8 on MiniMax-M2.5 — up to 8.2x better perf/$ by functionstackx · Pull Request #387 · SemiAnalysisAI/InferenceX-app

functionstackx · 2026-05-26T03:33:58Z

Summary

New blog post comparing B200 vLLM NVFP4 vs H100 vLLM FP8 on MiniMax-M2.5 8K/1K (GHA run 26306422380, measured 2026-05-22).
Headline: up to 8.2x better performance per dollar at 110 tok/s/user (H100 $0.74/M vs B200 NVFP4 $0.091/M); lift grows monotonically from 4.0x at 22 tok/s/user as H100's curve falls off faster than B200 NVFP4's at high interactivity.
Decomposes at the peak into a 2.94x generation step (Blackwell vs Hopper at FP8) and a 2.77x precision step (B200 FP8 → B200 NVFP4), unlocked by vllm-project/vllm #36307 — the trtllm-gen FP8 MoE modular kernel that finally accepts MiniMax's routing-logits dtype.
On-Paper Specs section grounds the gap in silicon: B200 has 2.27x the FP8 compute of H100, 4.55x H100's FP8 compute when running FP4, and 2.39x the HBM bandwidth, at 1.50x the TCO. Silicon ceiling is 1.51x–3.03x; the measured 8.2x is ~2.7x above the top of that range, and the gap is the kernel.
Quality benchmarks figure (new): MiniMax M2.5 vs Claude Opus 4.5/4.6 / Gemini 3 Pro / GPT-5.2 on SWE-Bench family + Terminal Bench + VIBE-Pro. Within 1–4 points of Opus across the board, leads on Multi-SWE-Bench.
MoE kernel comparison figure (supplementary, 1K/1K) showing trtllm vs deep_gemm vs triton MoE backends on B200.
M2.7 transferability called out in body and chart captions.
‘What's Next’ honestly notes wide-EP on NVL72 is NOT the right next lever for a 10B-active model — points at B300 NVFP4, FP4 KV cache, MTP instead.
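The silicon-ceiling figures in the summary can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the ceiling is simply the compute ratio divided by the TCO ratio, which matches the quoted 1.51x–3.03x range; the constant and function names are illustrative and do not come from the post.

```python
# Ratios quoted in the PR summary (B200 relative to H100).
FP8_COMPUTE_RATIO = 2.27          # B200 FP8 compute vs H100 FP8 compute
FP4_COMPUTE_RATIO = 4.55          # B200 FP4 compute vs H100 FP8 compute
TCO_RATIO = 1.50                  # B200 TCO vs H100 TCO
MEASURED_PERF_PER_DOLLAR = 8.2    # headline lift at 110 tok/s/user


def perf_per_dollar_ceiling(compute_ratio: float, tco_ratio: float) -> float:
    """Assumed model: on-paper perf/$ lift = compute ratio / TCO ratio."""
    return compute_ratio / tco_ratio


fp8_ceiling = perf_per_dollar_ceiling(FP8_COMPUTE_RATIO, TCO_RATIO)
fp4_ceiling = perf_per_dollar_ceiling(FP4_COMPUTE_RATIO, TCO_RATIO)

# How far the measured lift sits above the most generous on-paper bound.
gap_over_ceiling = MEASURED_PERF_PER_DOLLAR / fp4_ceiling

print(f"FP8 ceiling: {fp8_ceiling:.2f}x")
print(f"FP4 ceiling: {fp4_ceiling:.2f}x")
print(f"Measured lift over FP4 ceiling: {gap_over_ceiling:.1f}x")
```

Under that assumption the script prints ceilings of 1.51x and 3.03x and a residual of about 2.7x, the portion the summary attributes to the kernel.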

Test plan

Vercel preview renders at `/blog/b200-minimax-m2-5-vllm-nvfp4-vs-h100-fp8-perf-per-dollar`
Hero benchmark, GPU-specs radar, quality-benchmarks, and MoE-kernel-comparison figure blocks render in both themes (the same image currently fills both the light and dark slots; drop in real dark exports later if desired)
Iso-interactivity table values match the live cost view for spot-checked rows (22 / 50 / 80 / 110)
OG image generates
Sitemap + RSS pick up the new slug
FAQ JSON-LD passes a schema validator
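For the FAQ JSON-LD item, a quick local pre-check can catch structural mistakes before running a full schema validator. The sketch below assumes the usual schema.org `FAQPage` shape (`mainEntity` as a list of `Question` items, each with an `acceptedAnswer` of type `Answer`). The helper name and the sample payload are hypothetical placeholders, not content from the post, and the check assumes every entry is a JSON object.

```python
import json


def faq_jsonld_problems(raw: str) -> list[str]:
    """Return structural problems found in a FAQPage JSON-LD string.

    An empty list means the basic shape looks right; it does not replace
    a real schema validator.
    """
    doc = json.loads(raw)
    problems: list[str] = []
    if doc.get("@type") != "FAQPage":
        problems.append("top-level @type is not FAQPage")
    questions = doc.get("mainEntity")
    if not isinstance(questions, list) or not questions:
        problems.append("mainEntity must be a non-empty list")
        return problems
    for i, question in enumerate(questions):
        if question.get("@type") != "Question" or not question.get("name"):
            problems.append(f"mainEntity[{i}] is not a named Question")
        answer = question.get("acceptedAnswer") or {}
        if answer.get("@type") != "Answer" or not answer.get("text"):
            problems.append(f"mainEntity[{i}] lacks an Answer with text")
    return problems


# Placeholder payload for illustration only.
SAMPLE = json.dumps({
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "Placeholder question?",
        "acceptedAnswer": {"@type": "Answer", "text": "Placeholder answer."},
    }],
})

print(faq_jsonld_problems(SAMPLE))
```

A payload that passes this check can still fail a real validator, so it only complements the test-plan item.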

🤖 Generated with Claude Code

Note

Low Risk
Content-only MDX addition; no application logic, auth, or data-path changes.

Overview
Adds a new InferenceX blog article at packages/app/content/blog/b200-minimax-m2-5-vllm-nvfp4-vs-h100-fp8-perf-per-dollar.mdx comparing B200 vLLM NVFP4 vs H100 vLLM FP8 on MiniMax-M2.5 at 8K/1K (2026-05-22 run), with headline up to ~8.2× better perf/$ at iso-interactivity and a decomposition into ~2.94× generation (FP8) times ~2.77× precision (NVFP4), tied to vLLM PR #36307 (modular trtllm-gen FP8 MoE kernel for MiniMax routing).
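The two steps of the decomposition are multiplicative factors, so they can be cross-checked against the cost-per-million-token figures quoted in the PR summary. This is a small sketch using only the rounded numbers from this page; the variable names are illustrative.

```python
# Rounded figures quoted in this PR (110 tok/s/user operating point).
H100_FP8_COST_PER_M = 0.74      # $ per million tokens
B200_NVFP4_COST_PER_M = 0.091   # $ per million tokens
GENERATION_STEP = 2.94          # Blackwell vs Hopper, both at FP8
PRECISION_STEP = 2.77           # B200 FP8 -> B200 NVFP4

# Perf/$ lift read directly from the cost figures.
cost_ratio = H100_FP8_COST_PER_M / B200_NVFP4_COST_PER_M

# Perf/$ lift implied by chaining the two steps.
combined_steps = GENERATION_STEP * PRECISION_STEP

print(f"Lift from cost figures: {cost_ratio:.2f}x")
print(f"Lift from chained steps: {combined_steps:.2f}x")
```

With these rounded inputs both routes land near 8.1x rather than exactly 8.2x; the remaining difference is presumably rounding in the quoted figures, which is an assumption rather than something this page states.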

The post includes benchmark and cost tables, iso-interactivity perf/$ analysis, Figure assets (throughput, GPU-specs radar, coding-quality bars, MoE-kernel comparison), DashboardCTA links to the filtered InferenceX views, and FAQPage JSON-LD for SEO.

Reviewed by Cursor Bugbot for commit d4e9e43. Bugbot is set up for automated code reviews on this repo.

…r perf/$

New post comparing B200 vLLM NVFP4 vs H100 vLLM FP8 on MiniMax-M2.5 8K/1K (GHA run 26306422380, measured 2026-05-22).

Headline: up to 8.2x better performance per dollar at 110 tok/s/user, growing monotonically from 4.0x at 22 tok/s/user. Decomposes at the peak into a 2.94x generation step (Blackwell vs Hopper at FP8) and a 2.77x precision step (B200 FP8 → B200 NVFP4) unlocked by vLLM PR #36307 (the trtllm-gen FP8 MoE modular kernel that finally accepts MiniMax's routing-logits dtype).

Sections: lede with decomposition, hero throughput chart, model architecture + M2.7 transferability note, "Why MiniMax-M2.5 Is Worth Optimizing For" with quality benchmarks vs Claude Opus 4.5/4.6 / Gemini 3 Pro / GPT-5.2, On-Paper H100 vs B200 specs (radar + table + silicon ratios that bound the perf/$ ceiling), TRT-LLM MoE Kernel Integration into vLLM (PR #36307 + per-kernel comparison figure), per-config tables, iso-interactivity perf/$ table, What's Next (MTP, why NVL72 wide-EP isn't the right lever at 10B active, H100 stack room), FAQ.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-26T03:34:00Z

Project	Deployment	Actions	Updated (UTC)
inferencemax-app	Ready	Preview, Comment	May 26, 2026 3:37am

Wider expert parallelism doesn't compound on a 10B-active / 256-small-expert model the way it does on DeepSeek R1 or Kimi K2.5, but disaggregated prefill + decode on NVL72 is still a valid next lever for MiniMax-M2.5 (KV between pools over NVLink 5, decode pool absorbs more concurrency past the single-node saturation knee). Drops the speculative FP4 KV cache and "see MTP bullet" trailers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


vercel Bot deployed to Preview May 26, 2026 03:34

vercel Bot deployed to Preview May 26, 2026 03:37

cursor Bot reviewed May 26, 2026

Comment thread packages/app/content/blog/b200-minimax-m2-5-vllm-nvfp4-vs-h100-fp8-perf-per-dollar.mdx

functionstackx merged commit 36be82b into master May 26, 2026
20 checks passed

functionstackx deleted the blog/b200-minimax-m2-5-vllm-nvfp4-vs-h100-fp8-perf-per-dollar branch May 26, 2026 03:42

functionstackx merged 2 commits into master from blog/b200-minimax-m2-5-vllm-nvfp4-vs-h100-fp8-perf-per-dollar
