Commit 873fe04

[NV] llm-d: address PR #1660 review (C11 split MODEL into path and served-name)
Signed-off-by: Ezra Silvera <ezra@il.ibm.com>
1 parent fe5870b commit 873fe04

1 file changed

Lines changed: 8 additions & 2 deletions

benchmarks/multi_node/llm-d/server.sh

@@ -30,7 +30,12 @@ EPP_GRPC_PORT=9002
 EPP_HEALTH_PORT=9003
 EPP_METRICS_PORT=9090
 
-MODEL="${MODEL_DIR}/${MODEL_NAME}"
+# Filesystem path to the weights inside the container. job.slurm mounts
+# the host model directory at /models and sets MODEL_DIR=/models, so the
+# weights live directly under MODEL_DIR. MODEL_NAME is the OpenAI-API
+# served name passed via --served-model-name; it is not part of the
+# filesystem path.
+MODEL="${MODEL_DIR}"
 HOST_IP=$(ip route get 1.1.1.1 | awk '/src/ {print $7}')
 # Default NIC for NCCL / Gloo / NVSHMEM bootstrap. Pulled from the same
 # default route HOST_IP came from so the iface and the IP stay
@@ -135,6 +140,7 @@ KV_TRANSFER_CONFIG='{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_load
 
 COMMON_ARGS=(
 --port "$VLLM_PORT"
+--served-model-name "$MODEL_NAME"
 --trust-remote-code
 --api-server-count 1
 --disable-access-log-for-endpoints=/health,/metrics
@@ -283,7 +289,7 @@ PY
 # Bench against Envoy. EPP routes to decode (and decode sidecar
 # pulls from prefill via NIXL).
 run_benchmark_serving \
---model "$MODEL" \
+--model "$MODEL_NAME" \
 --port "$ENVOY_PORT" \
 --backend openai \
 --input-len "$BENCH_INPUT_LEN" \
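
The split this commit makes can be illustrated with a small standalone sketch. It is not part of server.sh: the helper name describe_model_args and the example model name are hypothetical, and how the weights path is handed to the server command is not shown in this diff, so the sketch only prints the values. The point it shows is that the weights path no longer contains MODEL_NAME, while the same served name is what both --served-model-name and the benchmark's --model carry.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not taken from server.sh.
# describe_model_args is a hypothetical helper that prints how the two
# values are used after the split.
describe_model_args() {
    # $1: container-side weights directory (the commit says job.slurm sets
    #     MODEL_DIR=/models); $2: OpenAI-API served model name.
    model_path="$1"     # filesystem path only; no MODEL_NAME component
    served_name="$2"    # name clients send in their requests
    printf 'weights path: %s\n' "$model_path"
    printf 'server flag: --served-model-name %s\n' "$served_name"
    printf 'bench flag: --model %s\n' "$served_name"
}

# Example call with a made-up served name.
describe_model_args /models example-org/example-model
```

Because the benchmark talks to an OpenAI-compatible endpoint, the name it sends has to match the served name rather than the on-disk path, which is why the last hunk switches --model from "$MODEL" to "$MODEL_NAME".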
