feat(blog): B200 NVFP4 vs H100 FP8 on MiniMax-M2.5 — up to 8.2x better perf/$ #387
Merged
functionstackx merged 2 commits on May 26, 2026
New post comparing B200 vLLM NVFP4 vs H100 vLLM FP8 on MiniMax-M2.5 8K/1K (GHA run 26306422380, measured 2026-05-22).

Headline: up to 8.2x better performance per dollar at 110 tok/s/user, growing monotonically from 4.0x at 22 tok/s/user. Decomposes at the peak into a 2.94x generation step (Blackwell vs Hopper at FP8) and a 2.77x precision step (B200 FP8 → B200 NVFP4) unlocked by vLLM PR #36307 (the trtllm-gen FP8 MoE modular kernel that finally accepts MiniMax's routing-logits dtype).

Sections:
- lede with decomposition
- hero throughput chart
- model architecture + M2.7 transferability note
- "Why MiniMax-M2.5 Is Worth Optimizing For" with quality benchmarks vs Claude Opus 4.5/4.6 / Gemini 3 Pro / GPT-5.2
- On-Paper H100 vs B200 specs (radar + table + silicon ratios that bound the perf/$ ceiling)
- TRT-LLM MoE Kernel Integration into vLLM (PR #36307 + per-kernel comparison figure)
- per-config tables
- iso-interactivity perf/$ table
- What's Next (MTP, why NVL72 wide-EP isn't the right lever at 10B active, H100 stack room)
- FAQ

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
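As a quick sanity check on the decomposition quoted above, the sketch below multiplies the two steps and compares the product with the headline ratio. It assumes the steps compose multiplicatively and are expressed in the same perf/$ terms, which the commit message implies but does not state outright. The constant and function names are illustrative only.

```python
# Factors as quoted in the commit message (already rounded to two decimals).
GENERATION_STEP = 2.94  # H100 FP8 -> B200 FP8
PRECISION_STEP = 2.77   # B200 FP8 -> B200 NVFP4
HEADLINE = 8.2          # quoted peak perf/$ ratio at 110 tok/s/user


def compose(*steps: float) -> float:
    """Multiply successive improvement steps into one end-to-end ratio."""
    total = 1.0
    for step in steps:
        total *= step
    return total


combined = compose(GENERATION_STEP, PRECISION_STEP)
# Relative gap between the product of the rounded factors and the headline.
gap = abs(combined - HEADLINE) / HEADLINE
print(f"combined = {combined:.2f}x, gap vs headline = {gap:.1%}")
```

The product of the rounded factors comes out at about 8.14x, within 1% of the 8.2x headline. A gap of that size is what one would expect if the quoted factors were rounded from more precise values.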
Wider expert parallelism doesn't compound on a 10B-active / 256-small-expert model the way it does on DeepSeek R1 or Kimi K2.5, but disaggregated prefill + decode on NVL72 is still a valid next lever for MiniMax-M2.5 (KV between pools over NVLink 5, decode pool absorbs more concurrency past the single-node saturation knee). Drops the speculative FP4 KV cache and "see MTP bullet" trailers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit d4e9e43.

Summary
Test plan
🤖 Generated with Claude Code
Note
Low Risk
Content-only MDX addition; no application logic, auth, or data-path changes.
Overview
Adds a new InferenceX blog article at packages/app/content/blog/b200-minimax-m2-5-vllm-nvfp4-vs-h100-fp8-perf-per-dollar.mdx comparing B200 vLLM NVFP4 vs H100 vLLM FP8 on MiniMax-M2.5 at 8K/1K (2026-05-22 run), with headline up to ~8.2× better perf/$ at iso-interactivity and a decomposition into ~2.94× generation (FP8) plus ~2.77× precision (NVFP4), tied to vLLM PR #36307 (modular trtllm-gen FP8 MoE kernel for MiniMax routing).

The post includes benchmark and cost tables, iso-interactivity perf/$ analysis, Figure assets (throughput, GPU-specs radar, coding-quality bars, MoE-kernel comparison), DashboardCTA links to the filtered InferenceX views, and FAQPage JSON-LD for SEO.
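For readers unfamiliar with the term, an iso-interactivity perf/$ comparison can be sketched as follows. This is only one plausible reading, not the post's confirmed methodology: it assumes each configuration is a measured curve of (tok/s/user, tok/s/GPU) points, that throughput is linearly interpolated at the shared interactivity target, and that cost is a flat price per GPU-hour. All function and parameter names here are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (tok/s/user, tok/s/GPU)


def throughput_at(curve: Sequence[Point], interactivity: float) -> float:
    """Linearly interpolate per-GPU throughput at a given tok/s/user."""
    if not curve:
        raise ValueError("curve must contain at least one point")
    pts = sorted(curve)
    xs = [x for x, _ in pts]
    if not xs[0] <= interactivity <= xs[-1]:
        raise ValueError("target interactivity is outside the measured range")
    i = bisect_left(xs, interactivity)
    if xs[i] == interactivity:
        return pts[i][1]
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (y1 - y0) * (interactivity - x0) / (x1 - x0)


def perf_per_dollar_ratio(
    curve_a: Sequence[Point],
    cost_a: float,
    curve_b: Sequence[Point],
    cost_b: float,
    interactivity: float,
) -> float:
    """Ratio of (tok/s/GPU per $/GPU-hour) for A over B at the same tok/s/user."""
    a = throughput_at(curve_a, interactivity) / cost_a
    b = throughput_at(curve_b, interactivity) / cost_b
    return a / b
```

Refusing to extrapolate outside the measured range is a deliberate choice in this sketch: a ratio quoted at a given tok/s/user is only meaningful where both configurations were actually benchmarked.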