Commit 66a1670

feat(blog): B200 NVFP4 vs H100 FP8 on MiniMax-M2.5 — up to 8.2x better perf/$
New post comparing B200 vLLM NVFP4 vs H100 vLLM FP8 on MiniMax-M2.5 8K/1K (GHA run 26306422380, measured 2026-05-22).

Headline: up to 8.2x better performance per dollar at 110 tok/s/user, growing monotonically from 4.0x at 22 tok/s/user. Decomposes at the peak into a 2.94x generation step (Blackwell vs Hopper at FP8) and a 2.77x precision step (B200 FP8 → B200 NVFP4) unlocked by vLLM PR #36307 (the trtllm-gen FP8 MoE modular kernel that finally accepts MiniMax's routing-logits dtype).

Sections:
- lede with decomposition
- hero throughput chart
- model architecture + M2.7 transferability note
- "Why MiniMax-M2.5 Is Worth Optimizing For" with quality benchmarks vs Claude Opus 4.5/4.6 / Gemini 3 Pro / GPT-5.2
- On-Paper H100 vs B200 specs (radar + table + silicon ratios that bound the perf/$ ceiling)
- TRT-LLM MoE Kernel Integration into vLLM (PR #36307 + per-kernel comparison figure)
- per-config tables
- iso-interactivity perf/$ table
- What's Next (MTP, why NVL72 wide-EP isn't the right lever at 10B active, H100 stack room)
- FAQ

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
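As a quick sanity check on the decomposition quoted in the message, the two steps can be multiplied together. This is only a sketch: it assumes both factors are ratios on the same perf/$ metric and compose multiplicatively, and the variable names are illustrative rather than taken from the post.

```python
# Rounded factors as quoted in the commit message (assumed to compose multiplicatively).
generation_step = 2.94  # H100 FP8 -> B200 FP8 (Blackwell vs Hopper at FP8)
precision_step = 2.77   # B200 FP8 -> B200 NVFP4

# Combined gain implied by the two rounded factors.
combined = generation_step * precision_step
print(f"combined gain from rounded factors: {combined:.2f}x")
```

The product of the rounded factors comes to about 8.14x, slightly below the 8.2x headline. The gap is presumably rounding in the quoted factors, though the commit message does not say so.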
1 parent 13e9dad commit 66a1670

9 files changed

Lines changed: 248 additions & 0 deletions

File tree

packages/app/content/blog/b200-minimax-m2-5-vllm-nvfp4-vs-h100-fp8-perf-per-dollar.mdx

Lines changed: 248 additions & 0 deletions
Large diffs are not rendered by default.
