Commit 66a1670
feat(blog): B200 NVFP4 vs H100 FP8 on MiniMax-M2.5 — up to 8.2x better perf/$
New post comparing B200 vLLM NVFP4 vs H100 vLLM FP8 on MiniMax-M2.5 8K/1K
(GHA run 26306422380, measured 2026-05-22). Headline: up to 8.2x better
performance per dollar at 110 tok/s/user, growing monotonically from
4.0x at 22 tok/s/user. At the peak, the gain decomposes into a 2.94x
generation step (Blackwell vs Hopper, both at FP8) and a 2.77x precision
step (B200 FP8 → B200 NVFP4) unlocked by vLLM PR #36307 (the trtllm-gen
FP8 MoE modular kernel that finally accepts MiniMax's routing-logits dtype).
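The decomposition above is multiplicative, so the two step factors can be sanity-checked against the headline. The following is a minimal sketch using only the two factors quoted in this message; the helper name `compose_speedup` and the rounding-interval check are illustrative assumptions, not code from the post or the benchmark harness.

```python
# Sanity check: two independent speedup steps compose by multiplication.
# The factors are the rounded values quoted in the commit message.

def compose_speedup(*steps: float) -> float:
    """Multiply a sequence of step-wise speedup factors (hypothetical helper)."""
    total = 1.0
    for step in steps:
        total *= step
    return total

generation_step = 2.94  # H100 FP8 -> B200 FP8 (Blackwell vs Hopper)
precision_step = 2.77   # B200 FP8 -> B200 NVFP4

combined = compose_speedup(generation_step, precision_step)
print(f"combined: {combined:.2f}x")  # product of the rounded factors

# Assumption: each quoted factor was rounded to two decimals, so the true
# product lies somewhere in this interval.
low = compose_speedup(generation_step - 0.005, precision_step - 0.005)
high = compose_speedup(generation_step + 0.005, precision_step + 0.005)
print(f"plausible range: {low:.2f}x to {high:.2f}x")
```

The product of the rounded factors is about 8.14x. The upper end of the rounding interval is about 8.17x, which rounds to 8.2x at one decimal, so the headline is consistent with the two steps if the quoted factors were rounded down slightly.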
Sections:
- Lede with decomposition
- Hero throughput chart
- Model architecture + M2.7 transferability note
- "Why MiniMax-M2.5 Is Worth Optimizing For", with quality benchmarks vs
  Claude Opus 4.5/4.6, Gemini 3 Pro, and GPT-5.2
- On-Paper H100 vs B200 specs (radar, table, and silicon ratios that
  bound the perf/$ ceiling)
- TRT-LLM MoE Kernel Integration into vLLM (PR #36307 + per-kernel
  comparison figure)
- Per-config tables
- Iso-interactivity perf/$ table
- What's Next (MTP, why NVL72 wide-EP isn't the right lever at 10B
  active, H100 stack room)
- FAQ
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 13e9dad commit 66a1670
9 files changed
Lines changed: 248 additions & 0 deletions
File tree
- packages/app
- content/blog
- public/images/b200-minimax-m2-5-vllm-nvfp4-vs-h100-fp8-perf-per-dollar