Commit 975194f
Update kimik2.5-fp4-b200-vllm vLLM image to v0.21.0 (#1395)
* Update kimik2.5-fp4-b200-vllm vLLM image to v0.20.2
Ref #1154
Co-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>
* fix(kimik2.5_fp4_b200.sh): raise --gpu-memory-utilization 0.90 -> 0.98
vLLM v0.20.2's CUDA-graph memory profiling subtracts an aggressive
chunk from the requested utilization, leaving a negative budget for the
KV cache (-39.49 GiB observed). Raising to 0.98 gives the profiler
enough headroom to keep the KV cache allocation positive while still
reserving ~2% as a hard buffer.
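A rough sketch of how much extra budget the 0.90 to 0.98 bump buys. It assumes ~180 GB of usable memory per B200, the figure implied by the later note that a 0.90 cap leaves ~18 GB/GPU free; `GPU_GB`, `budget_gb`, and `gb_to_gib` are hypothetical helpers, not part of the benchmark script.

```shell
#!/usr/bin/env bash
# Sketch only: GPU_GB (~180 GB usable per B200) is an assumption.
set -euo pipefail

GPU_GB=180

# Memory vLLM may claim at a given --gpu-memory-utilization, in decimal GB.
budget_gb() {
  awk -v g="$GPU_GB" -v u="$1" 'BEGIN { printf "%.1f\n", g * u }'
}

# Convert decimal GB to binary GiB (1 GiB = 1.073741824 GB).
gb_to_gib() {
  awk -v x="$1" 'BEGIN { printf "%.2f\n", x / 1.073741824 }'
}

old=$(budget_gb 0.90)
new=$(budget_gb 0.98)
extra=$(awk -v a="$new" -v b="$old" 'BEGIN { printf "%.1f\n", a - b }')
echo "extra budget: ${extra} GB ($(gb_to_gib "$extra") GiB)"
```

Under that assumption the bump adds about 14.4 GB (~13.4 GiB) per GPU, well short of the 39.49 GiB shortfall, which fits with the follow-up commit still observing a negative KV cache at 0.98.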
Alternative would have been setting VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=0,
but raising the cap is the minimum-blast-radius fix and matches what
similar B200 recipes use.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(kimik2.5_fp4_b200.sh): disable CUDA-graph memory estimator + restore 0.90 mem-util
Raising --gpu-memory-utilization to 0.98 wasn't enough: vLLM v0.20.2's
CUDA-graph memory profiler still pre-reserves ~57 GB/GPU upfront, leaving
the effective utilization at ~0.66 and the KV cache at -25 GiB (the engine
won't start).
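The ~0.66 figure is consistent with subtracting the ~57 GB pre-reservation from the 0.98 budget. A minimal sketch, assuming ~180 GB usable per GPU and treating the pre-reservation as a flat subtraction (a simplification, not vLLM's actual profiler logic); `effective_util` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: GPU_GB (~180 GB usable per B200) is an assumption, and the
# flat subtraction below is a simplified model of the CUDA-graph estimator.
set -euo pipefail

GPU_GB=180

# effective = (capacity * requested_util - pre_reserved_gb) / capacity
effective_util() {
  local util=$1 reserved_gb=$2
  awk -v g="$GPU_GB" -v u="$util" -v r="$reserved_gb" \
    'BEGIN { printf "%.2f\n", (g * u - r) / g }'
}

echo "0.98 requested -> $(effective_util 0.98 57) effective"
echo "0.90 requested -> $(effective_util 0.90 57) effective"
```

In this model, raising the cap only shifts the effective utilization by the same 0.08 while the ~57 GB reservation stays in place, which is why removing the estimate is the more direct lever.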
Disable the estimator with VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=0
(same pattern as benchmarks/single_node/agentic/kimik2.5_fp4_b200.sh:65)
and revert --gpu-memory-utilization to 0.90. The 0.90 reservation
already leaves ~18 GB/GPU of headroom, the same safety net the estimator
was trying to enforce.
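A hedged sketch of the launch pattern this fix describes. Only `VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=0` and `--gpu-memory-utilization 0.90` come from the commit; `MODEL_PATH`, the dry-run `printf`, and the ~180 GB/GPU capacity used for the headroom line are assumptions, and the real `kimik2.5_fp4_b200.sh` passes flags not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: MODEL_PATH and the dry-run wrapper are hypothetical; the env
# var and the --gpu-memory-utilization value are the ones named in this commit.
set -euo pipefail

MODEL_PATH=${MODEL_PATH:-/path/to/model}   # hypothetical placeholder

# Skip the CUDA-graph memory estimate during vLLM's memory profiling.
export VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=0

# Keep the original 0.90 cap, so ~10% of each GPU stays outside vLLM's budget.
GPU_MEM_UTIL=0.90

SERVE_ARGS=(vllm serve "$MODEL_PATH" --gpu-memory-utilization "$GPU_MEM_UTIL")

# Dry run: print the command instead of launching the server.
printf '%s\n' "${SERVE_ARGS[*]}"

# Headroom left outside the budget, assuming ~180 GB usable per GPU.
awk -v g=180 -v u="$GPU_MEM_UTIL" 'BEGIN { printf "headroom: %.0f GB/GPU\n", g * (1 - u) }'
```

Setting the variable in the script, rather than raising the cap further, keeps the change local to this recipe and leaves the 10% buffer intact.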
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Bryan Shan <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: claude-fix-bot <claude-fix-bot@local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
1 parent 82914ea
3 files changed
Lines changed: 15 additions & 1 deletion
File tree
- .github/configs
- benchmarks/single_node
| Hunk | Original lines | New lines | Change |
|---|---|---|---|
| 1 | 2632–2638 | 2632–2638 | line 2635 replaced (1 deletion, 1 addition) |
| 2 | 33–38 | 33–45 | 7 lines added (new lines 36–42) |
| 3 | 2704–2706 | 2704–2713 | 7 lines added (new lines 2707–2713) |