[NVIDIA] Upgrade vLLM to v0.11.2 by ankursingh-nv · Pull Request #273 · SemiAnalysisAI/InferenceX

ankursingh-nv · 2025-12-03T17:31:22Z

Updated configs:

- Use FP8 KV cache for GPT-OSS on B200.
- Remove "custom_ops" from the compilation-config for GPT-OSS.
- Remove "cudagraph_mode" from the compilation-config for GPT-OSS.
- Remove the VLLM_FLASHINFER_ALLREDUCE_FUSION_THRESHOLDS_MB env var for GPT-OSS.
- Remove the deprecated "--disable-log-requests" flag.
- Rename the "cuda-graph-sizes" flag.
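The configuration changes listed above can be illustrated on a vLLM argument list. The following is a minimal Python sketch and not the PR's actual change, since the PR edits the benchmark shell scripts. The helper name `migrate_gptoss_args` and the argv/env-dict representation are hypothetical. The sketch only handles the space-separated `--compilation-config <json>` form. It does not model the renamed "cuda-graph-sizes" flag because the description does not name its replacement.

```python
import json

# Hypothetical helper: the PR itself edits benchmark shell scripts. This sketch
# only mirrors the listed GPT-OSS changes on an argv-style list and env dict.
DEPRECATED_FLAGS = {"--disable-log-requests"}
DROPPED_COMPILATION_KEYS = {"custom_ops", "cudagraph_mode"}
DROPPED_ENV_VARS = {"VLLM_FLASHINFER_ALLREDUCE_FUSION_THRESHOLDS_MB"}


def migrate_gptoss_args(args, env, is_b200):
    """Return (new_args, new_env) with the listed GPT-OSS changes applied.

    Assumes the space-separated form `--compilation-config <json>`; the
    `--compilation-config=<json>` form is passed through untouched.
    """
    new_args = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in DEPRECATED_FLAGS:
            # Drop deprecated flags entirely.
            i += 1
            continue
        if arg == "--compilation-config" and i + 1 < len(args):
            # Parse the JSON value and remove the keys the new configs no longer set.
            config = json.loads(args[i + 1])
            for key in DROPPED_COMPILATION_KEYS:
                config.pop(key, None)
            new_args += [arg, json.dumps(config)]
            i += 2
            continue
        new_args.append(arg)
        i += 1
    # FP8 KV cache is only switched on for the B200 configuration.
    if is_b200 and "--kv-cache-dtype" not in new_args:
        new_args += ["--kv-cache-dtype", "fp8"]
    # Remove environment variables the new configs no longer use.
    new_env = {k: v for k, v in env.items() if k not in DROPPED_ENV_VARS}
    return new_args, new_env
```

Using `dict.pop(key, None)` leaves any other compilation options untouched, so whatever remains in the JSON is passed through unchanged.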

Test sweep: https://github.com/InferenceMAX/InferenceMAX/actions/runs/19946962635

Signed-off-by: Po-Han Huang <pohanh@nvidia.com>

cquil11 · 2025-12-03T20:29:50Z

vLLM is releasing too fast @mgoin, we can't keep up

cquil11 · 2025-12-04T16:35:11Z

@ankursingh-nv is this ready to go once the checks pass?

ankursingh-nv · 2025-12-04T18:57:21Z

@cquil11 should be ready once we get a successful E2E run.

Currently some H100 and H200 jobs are failing (see https://github.com/InferenceMAX/InferenceMAX/actions/runs/19904064308).

cquil11 · 2025-12-04T19:46:14Z

@ankursingh-nv with vLLM 0.11.2 it seems the vLLM server process attempts to write to the host filesystem, so I added --container-writable to the CoreWeave runners as part of this PR.

ankursingh-nv · 2025-12-04T19:48:13Z

@cquil11 interesting.

cquil11 · 2025-12-04T23:33:42Z

@ankursingh-nv @kedarpotdar-nv full sweep here https://github.com/InferenceMAX/InferenceMAX/actions/runs/19946962635

cquil11

lgtm -- thank you!

cquil11 · 2025-12-05T16:30:34Z

@Oseltamivir + viz
they are using FP8 KV cache for GPT OSS now too haha

ankursingh-nv requested a review from a team as a code owner December 3, 2025 17:31

ankursingh-nv marked this pull request as draft December 3, 2025 17:41

functionstackx added h100_gptoss labels Dec 4, 2025

functionstackx temporarily deployed to fork-pr-validation December 4, 2025 03:47 (with GitHub Actions)

functionstackx temporarily deployed to fork-pr-validation December 4, 2025 03:48 (with GitHub Actions)

cquil11 marked this pull request as ready for review December 4, 2025 15:47

cquil11 marked this pull request as draft December 4, 2025 15:47

Merge branch 'main' into dev-pohanh-vllm-v0.11.2 (4cf2b16)

cquil11 temporarily deployed to fork-pr-validation December 4, 2025 15:48 (with GitHub Actions)

Merge branch 'main' into dev-pohanh-vllm-v0.11.2 (f916ccc)

cquil11 added h100_gptoss and removed h100_gptoss labels Dec 4, 2025

cquil11 temporarily deployed to fork-pr-validation December 4, 2025 17:18 (with GitHub Actions)

Merge branch 'main' into dev-pohanh-vllm-v0.11.2 (78451f2)

make cw runners container writable (f1e9e0d)

cquil11 marked this pull request as ready for review December 4, 2025 19:46

cquil11 added 3 commits December 4, 2025 16:37

undo make cw runners container writable (bfacf45)

coreweave cleanup (15343fe)

coreweave cleanup pt 2 (2d4316b)

cquil11 approved these changes Dec 5, 2025

cquil11 reviewed Dec 5, 2025 (comment thread on benchmarks/gptoss_fp4_h100_docker.sh)

cquil11 reviewed Dec 5, 2025 (comment thread on benchmarks/gptoss_fp4_h100_slurm.sh)

cquil11 reviewed Dec 5, 2025 (comment thread on benchmarks/gptoss_fp4_h200_slurm.sh)

cquil11 reviewed Dec 5, 2025 (comment thread on runners/launch_h200-cw.sh)

Merge branch 'main' into dev-pohanh-vllm-v0.11.2 (86d013d)

cquil11 merged commit 25506e8 into main Dec 5, 2025

cquil11 deleted the dev-pohanh-vllm-v0.11.2 branch December 5, 2025 21:49

cquil11 mentioned this pull request in "fix: various follow-up bug fixes" #302 (merged), Dec 7, 2025

functionstackx added the NVIDIA label Dec 7, 2025

functionstackx temporarily deployed to fork-pr-validation December 7, 2025 21:32 (with GitHub Actions)

cquil11 changed the title from "Upgrade vLLM to v0.11.2" to "[NVIDIA] Upgrade vLLM to v0.11.2" Apr 8, 2026

cquil11 merged 9 commits into main from dev-pohanh-vllm-v0.11.2
4 participants