Commit cc2330c
feat(blog): B200 NVFP4 vs H200 INT4 on Kimi K2.5/K2.6 — up to 2.95x better perf/$
On 8K/1K with vllm/vllm-openai:v0.21.0, B200 NVFP4 is 2.71x-2.95x cheaper
per million tokens than H200 INT4 across the 30-90 tok/s/user serving band
(peak 2.95x at 32 tok/s/user, $0.140/M vs $0.413/M). The cost gap decomposes
into B200's silicon ratios over H200 (1.67x HBM BW, 1.28x HBM capacity that
unlocks TP=4 vs TP=8, no FP4 tensor cores on Hopper at all) composed with
the NVFP4 precision unlock, divided by B200's 1.38x TCO penalty. Kimi K2.5
and K2.6 are the open-weights models powering xAI's Cursor Composer 2 and
Composer 2.5, leading SWE-Bench Pro at 58.6% over GPT-5.4 / Opus 4.6 /
Gemini 3.1 Pro. Same backbone across both releases — K2.6 is a post-training
refinement of K2.5 — so every serving curve applies one-to-one to both.
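
The headline arithmetic can be checked with a short sketch. It reads the two quoted figures as dollars per million tokens and treats the 1.38x TCO penalty as a per-GPU cost multiplier, so that cost ratio = per-GPU throughput ratio / TCO penalty. Both readings are assumptions taken from the wording of the message rather than from the blog's data, and the function names are hypothetical.

```python
# Figures quoted in the commit message (32 tok/s/user operating point).
# Assumption: both values are US dollars per million tokens.
H200_INT4_COST_PER_M = 0.413
B200_NVFP4_COST_PER_M = 0.140

# Assumption: B200 costs 1.38x as much per GPU-hour as H200 (the "TCO penalty").
B200_TCO_PENALTY = 1.38


def cost_advantage(baseline_cost: float, candidate_cost: float) -> float:
    """How many times cheaper the candidate is per million tokens."""
    return baseline_cost / candidate_cost


def implied_throughput_ratio(cost_ratio: float, tco_penalty: float) -> float:
    """Per-GPU throughput ratio needed to produce cost_ratio.

    Rearranges cost_ratio = throughput_ratio / tco_penalty.
    """
    return cost_ratio * tco_penalty


cost_ratio = cost_advantage(H200_INT4_COST_PER_M, B200_NVFP4_COST_PER_M)
throughput_ratio = implied_throughput_ratio(cost_ratio, B200_TCO_PENALTY)

print(f"cost advantage: {cost_ratio:.2f}x")
print(f"implied per-GPU throughput ratio: {throughput_ratio:.2f}x")
```

Under these assumptions the 2.95x cost gap corresponds to roughly a 4.07x per-GPU throughput gap, which is the part the message attributes to the silicon ratios combined with NVFP4.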
Also adds an X-not-Y antithesis ban to the write-inferencex-blog SKILL
house style ("the gap is silicon x precision, not framework" etc.). Reads
as performatively contrarian AI flexing and was getting reflexively cut on
review; codifying so future drafts don't repeat it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 19ae49e commit cc2330c
8 files changed
Lines changed: 208 additions & 0 deletions
File tree
- .claude/skills/write-inferencex-blog
- packages/app
- content/blog
- public/images/b200-nvfp4-vs-h200-int4-kimi-k2-vllm-perf-per-dollar
Diff hunk (content not captured): original lines 362-367, new lines 362-374, with 7 lines added at new lines 365-371.
Lines changed: 201 additions & 0 deletions
Large diffs are not rendered by default.