chore: update vllm to 0.11.0 and make changes from PR 102 #159
Pull Request Overview
This PR updates vLLM from version 0.10.2 to 0.11.0 and incorporates configuration changes that were previously made in PR 102. The changes include adding compilation configuration for CUDA graph mode and fixing command-line argument syntax.
- Updates vLLM Docker image versions from v0.10.2 to v0.11.0
- Adds CUDA graph compilation configuration with "PIECEWISE" mode
- Fixes command-line argument format for the config parameter
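
As a rough illustration of the changes listed above, the sketch below writes a small YAML config that sets the CUDA graph mode to "PIECEWISE" and assembles a `vllm serve` command that reads it through `--config`. The model name, port, and file path are assumptions chosen for illustration and are not taken from this PR's benchmark scripts. The exact contents of those scripts, including what the `--config` format fix looked like, are not shown here. The command is printed rather than executed so the sketch runs without vLLM installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values for illustration only; not copied from the PR's scripts.
MODEL="openai/gpt-oss-120b"
PORT=8888
CONFIG_FILE="$(mktemp -d)/vllm_config.yaml"

# Assumption: YAML keys mirror vLLM's long option names, and
# compilation-config is passed as a JSON string.
cat > "$CONFIG_FILE" <<'EOF'
compilation-config: '{"cudagraph_mode": "PIECEWISE"}'
EOF

# Build the command as an array so every argument stays a separate word.
CMD=(vllm serve "$MODEL" --port "$PORT" --config "$CONFIG_FILE")

# Print the command instead of running it.
printf '%s ' "${CMD[@]}"
echo
```

Keeping the flag and its value as separate array elements avoids quoting mistakes when the command is later expanded with `"${CMD[@]}"`.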
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| .github/configs/nvidia-master.yaml | Updates vLLM Docker image versions to v0.11.0 |
| benchmarks/gptoss_fp4_h100_slurm.sh | Adds compilation config and fixes --config argument format |
| benchmarks/gptoss_fp4_h100_docker.sh | Adds compilation config and fixes --config argument format |
| benchmarks/gptoss_fp4_h200_slurm.sh | Adds compilation config for CUDA graph mode |
ba211aa to 7054fad
@cquil11 can you screenshot the test command and send links to the H100, H200, and B200 validation? Excited for p00 of https://github.com/InferenceMAX/InferenceMAX/issues/120
@cquil11 overall lgtm from reading the code but need validation links
H100 https://github.com/InferenceMAX/InferenceMAX/actions/runs/19058823557
New PR with changes present in https://github.com/InferenceMAX/InferenceMAX/pull/102
Now post-refactor