  # Combined image for the InferenceX llm-d-vllm framework.
  #
- # Base = ghcr.io/llm-d/llm-d-cuda which already ships vLLM + DeepEP +
- # NVSHMEM + GDRCopy. We add the EPP, the routing-sidecar, and Envoy on top
- # so every node in a SLURM allocation can play any role (prefill, decode,
- # or coordinator) from a single image.
+ # Base = vllm/vllm-openai (vLLM with the OpenAI-compatible API server).
+ # We add the EPP, the routing-sidecar, and Envoy on top so every node in
+ # a SLURM allocation can play any role (prefill, decode, or coordinator)
+ # from a single image. DeepEP / NVSHMEM / GDRCopy are NOT bundled by
+ # this base; they are not used by the simple 1P+1D recipe
+ # (LWS_GROUP_SIZE=1 short-circuits the wide-EP NVSHMEM env in
+ # server.sh). Wide-EP recipes will need a base that ships them.
  #
  # Configs (epp-config.yaml, envoy.yaml, per-topology recipes) are NOT
  # baked in. They are mounted at runtime by job.slurm so config-only
  # iteration does not require an image rebuild. See
  # benchmarks/multi_node/llm-d/job.slurm for the expected mount layout.

- FROM ghcr.io/llm-d/llm-d-cuda:v0.7.0
+ FROM vllm/vllm-openai:v0.22.0

  COPY --from=ghcr.io/llm-d/llm-d-router-endpoint-picker-dev:main \
      /app/epp /usr/local/bin/epp
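
The header comment says that `LWS_GROUP_SIZE=1` short-circuits the wide-EP NVSHMEM environment in `server.sh`. That script is not part of this diff, so the following is only a minimal sketch of what such a guard could look like. The function names and the `EXAMPLE_WIDE_EP` variable are hypothetical placeholders, not names taken from the repository, and the default of 1 for an unset `LWS_GROUP_SIZE` is an assumption.

```shell
# Hypothetical sketch of a wide-EP guard; NOT the actual server.sh.

# Returns success only when the group spans more than one node.
# Assumption: an unset LWS_GROUP_SIZE is treated as 1.
wide_ep_enabled() {
    [ "${LWS_GROUP_SIZE:-1}" -gt 1 ]
}

# Sets the wide-EP environment only when it is needed, so an image
# without DeepEP / NVSHMEM / GDRCopy still works for the 1P+1D recipe.
configure_wide_ep_env() {
    if ! wide_ep_enabled; then
        echo "skip"
        return 0
    fi
    # Placeholder: the real NVSHMEM settings depend on the cluster fabric.
    export EXAMPLE_WIDE_EP=1
    echo "wide-ep"
}
```

With a guard of this shape, the 1P+1D recipe never touches the NVSHMEM settings, which is consistent with the comment's claim that the new base image is sufficient for it.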