
Commit 8aa1c18

[NV] llm-d: switch base to vllm/vllm-openai:v0.22.0 (pin tag)
Signed-off-by: Ezra Silvera <ezra@il.ibm.com>
1 parent 098af75 commit 8aa1c18

1 file changed

Lines changed: 8 additions & 5 deletions

benchmarks/llm-d/Dockerfile

@@ -1,16 +1,19 @@
 # Combined image for the InferenceX llm-d-vllm framework.
 #
-# Base = ghcr.io/llm-d/llm-d-cuda which already ships vLLM + DeepEP +
-# NVSHMEM + GDRCopy. We add the EPP, the routing-sidecar, and Envoy on top
-# so every node in a SLURM allocation can play any role (prefill, decode,
-# or coordinator) from a single image.
+# Base = vllm/vllm-openai (vLLM with the OpenAI-compatible API server).
+# We add the EPP, the routing-sidecar, and Envoy on top so every node in
+# a SLURM allocation can play any role (prefill, decode, or coordinator)
+# from a single image. DeepEP / NVSHMEM / GDRCopy are NOT bundled by
+# this base; they are not used by the simple 1P+1D recipe
+# (LWS_GROUP_SIZE=1 short-circuits the wide-EP NVSHMEM env in
+# server.sh). Wide-EP recipes will need a base that ships them.
 #
 # Configs (epp-config.yaml, envoy.yaml, per-topology recipes) are NOT
 # baked in. They are mounted at runtime by job.slurm so config-only
 # iteration does not require an image rebuild. See
 # benchmarks/multi_node/llm-d/job.slurm for the expected mount layout.
 
-FROM ghcr.io/llm-d/llm-d-cuda:v0.7.0
+FROM vllm/vllm-openai:v0.22.0
 
 COPY --from=ghcr.io/llm-d/llm-d-router-endpoint-picker-dev:main \
 /app/epp /usr/local/bin/epp
