Skip to content

Commit 48e459b

Browse files
committed
Add MoRI conn.py overlay for sglang v0.5.12.post1 PD-disaggregation
sglang v0.5.12.post1 ships an unmigrated MoRI backend (legacy singular `state_type` + flat-int wire format) that crashes hybrid-attention models at PD-disagg startup. Concretely, three sites in disaggregation/mori/conn.py break for any model whose state_pool is populated as `state_item_lens: List[List[int]]` (Qwen3.5-MoE, GLM-5, anything using state_types: List[StateType]): 1. _register_kv_args raises `struct.error: required argument is not an integer` because it packs state_item_lens with naked struct.pack("I", item_len), expecting the legacy `List[int]`. 2. send_state raises `state_type is 'none' but state_indices were provided` because it reads the legacy `kv_args.state_type` (singular), which the framework no longer populates for these models. The plural `kv_args.state_types: List[StateType]` -- the attribute Mooncake (mooncake/conn.py:912) and NIXL (nixl/conn.py:1381) already use -- is set correctly. 3. The downstream length check at L935 and per-tensor indexing at L1005/1014/1096 assume flat-int `state_item_lens[i]`, so the run fails with `local state_item_lens count (1) does not match state descriptor count (90)` once registration is unblocked. This commit carries an in-tree overlay of mori/conn.py with three conservative fixes (flatten on the sender, state_types[0] fallback in send_state, normalize state_item_lens once at send_state entry). The overlay is bind-mounted into the container via job.slurm's existing docker invocation, gated on the affected image tag and an opt-out env var: - benchmarks/multi_node/amd_utils/patches/mori_conn.py (new) - benchmarks/multi_node/amd_utils/patches/README.md (new) - benchmarks/multi_node/amd_utils/job.slurm (+1 hook, +1 auto-apply block; both no-op when DOCKER_IMAGE_NAME doesn't contain "v0.5.12.post1" or MORI_CONN_PATCH=skip is set) Validated end-to-end on lmsysorg/sglang-rocm:v0.5.12.post1-rocm720- mi35x-20260523 with Qwen3.5-397B-A17B-FP8 1P+1D TP=8 dp-attn=false: gsm8k strict-match exact_match: 0.9780 +/- 0.004 PASS gsm8k flexible-extract exact_match: 0.9780 +/- 0.004 PASS (Matches and slightly exceeds the 0.970 GSM8K reported by upstream PR sgl-project/sglang#22665, which introduced state-transfer for hybrid models but kept MoRI on the legacy singular API.) The proper upstream fix is to migrate MoRI to the plural state_types API that Mooncake and NIXL already use; full diff is in scripts/sglang_disagg/docs/03-upstream-pr-proposal.md (out-of-tree to keep this PR lean). Tracks sgl-project/sglang#21886.
1 parent 9b1260a commit 48e459b

3 files changed

Lines changed: 1735 additions & 0 deletions

File tree

benchmarks/multi_node/amd_utils/job.slurm

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,32 @@ fi
5252
# $(pwd) is amd_utils/ (the sbatch submit dir); go up 3 levels to reach the repo root.
5353
export DI_REPO_DIR=$(cd "$(pwd)/../../.." && pwd)
5454

55+
# ── In-tree sglang patches: auto-apply for known-affected images ──────
56+
# Some sglang releases ship known bugs in the MoRI PD-disaggregation
57+
# backend that crash hybrid-attention models (Qwen3.5-MoE, GLM-5,
58+
# anything using state_types: List[StateType]) at startup. We carry an
59+
# in-tree overlay of `mori/conn.py` that fixes the wire format + the
60+
# legacy state_type fallback (see benchmarks/multi_node/amd_utils/patches/README.md).
61+
#
62+
# Activate the overlay automatically when the image tag contains a
63+
# known-affected version string, unless the caller explicitly opts out
64+
# by setting MORI_CONN_PATCH=skip in the environment. The overlay is
65+
# appended to ${EXTRA_DOCKER_MOUNTS:-} so callers (CI workflow steps,
66+
# local drivers, etc.) can still inject additional mounts.
67+
#
68+
# Dedups: if EXTRA_DOCKER_MOUNTS already contains the target path,
69+
# don't double-mount (Docker rejects duplicate destinations).
70+
_MORI_PATCH_FILE="$DI_REPO_DIR/benchmarks/multi_node/amd_utils/patches/mori_conn.py"
71+
_MORI_PATCH_TARGET="/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py"
72+
if [[ "${MORI_CONN_PATCH:-auto}" != "skip" ]] \
73+
&& [[ -f "$_MORI_PATCH_FILE" ]] \
74+
&& [[ "${DOCKER_IMAGE_NAME:-}" == *"v0.5.12.post1"* ]] \
75+
&& [[ "${EXTRA_DOCKER_MOUNTS:-}" != *"$_MORI_PATCH_TARGET"* ]]; then
76+
EXTRA_DOCKER_MOUNTS="${EXTRA_DOCKER_MOUNTS:-} -v ${_MORI_PATCH_FILE}:${_MORI_PATCH_TARGET}:ro"
77+
export EXTRA_DOCKER_MOUNTS
78+
echo "[job.slurm] auto-applied MoRI conn.py overlay: ${_MORI_PATCH_FILE}"
79+
fi
80+
5581
xP="${xP:-1}" #-> Number of Prefill Workers
5682
yD="${yD:-1}" #-> Number of Decode Workers
5783

@@ -387,6 +413,7 @@ exec \$_SUDO docker run --rm \
387413
-v /tmp:/run_logs \
388414
-v ${BENCHMARK_LOGS_DIR}:/benchmark_logs \
389415
-v ${DI_REPO_DIR}:${DOCKER_MOUNT_PATH} \
416+
${EXTRA_DOCKER_MOUNTS:-} \
390417
-e SLURM_JOB_ID=\$SLURM_JOB_ID \
391418
-e SLURM_JOB_NODELIST=\$SLURM_JOB_NODELIST \
392419
-e NNODES=\$NNODES \
Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
# In-tree sglang patches for the MoRI PD-disagg path
2+
3+
This directory carries small Python overlays that get bind-mounted over
4+
the upstream sglang source inside the docker container at runtime.
5+
They are needed because some sglang releases ship known bugs in the
6+
MoRI disaggregation backend that block our benchmark + accuracy
7+
configs.
8+
9+
The mount is wired through the `EXTRA_DOCKER_MOUNTS` env var that
10+
`job.slurm` consumes (an opt-in `${EXTRA_DOCKER_MOUNTS:-}` after the
11+
existing `-v` block). The local-test driver scripts under
12+
`scripts/sglang_disagg/` pre-set this env var to the path of the
13+
relevant overlay; CI runners that need the patch can do the same.
14+
15+
## `mori_conn.py`
16+
17+
Overlays
18+
`/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py`.
19+
20+
Source: forked from the file shipped in
21+
`lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260523`
22+
(sglang [v0.5.12.post1](https://github.com/sgl-project/sglang/tree/v0.5.12.post1)).
23+
Three logical edits, all confined to `MoriKVReceiver.send_state` and
24+
`MoriKVReceiver._register_kv_args`:
25+
26+
1. **Sender flatten** — handle the framework's nested
27+
`state_item_lens: List[List[int]]` instead of crashing in the
28+
naked `struct.pack("I", item_len)` (the legacy `List[int]`
29+
assumption). Idempotent for legacy flat callers.
30+
2. **`state_type` legacy fallback** — when the legacy singular
31+
`kv_args.state_type` is `'none'` but `state_mem_descs` is non-empty,
32+
read `kv_args.state_types[0]` (the modern plural API that Mooncake
33+
and NIXL already use). Routes `MAMBA → _send_mamba_state` and
34+
`DSA/SWA → _send_swa_dsa_state` correctly.
35+
3. **Consumer normalization** — flatten `state_item_lens` and
36+
`state_dim_per_tensor` to flat `List[int]` once at the entry of
37+
`send_state`, so the existing per-tensor index arithmetic
38+
(`state_item_lens[i]`) and length checks
39+
(`len(state_item_lens) == len(state_mem_descs)`) keep working.
40+
41+
Verified passing GSM8K = 0.978 ± 0.004 on Qwen3.5-397B-A17B-FP8 1P+1D
42+
TP=8 dp-attn=false (matches and slightly exceeds upstream
43+
[PR #22665](https://github.com/sgl-project/sglang/pull/22665)'s
44+
reported 0.970 GSM8K on the bf16 baseline).
45+
46+
This is a stop-gap. The proper upstream fix is to migrate MoRI to the
47+
plural `state_types: List[StateType]` API (full design + diff in
48+
`scripts/sglang_disagg/docs/03-upstream-pr-proposal.md`).
49+
50+
## How to enable
51+
52+
```bash
53+
export EXTRA_DOCKER_MOUNTS="-v $DI_REPO_DIR/benchmarks/multi_node/amd_utils/patches/mori_conn.py:/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py:ro"
54+
```
55+
56+
`$DI_REPO_DIR` is the InferenceX checkout root that `job.slurm`
57+
already mounts into the container at `/workspace`.
58+
59+
When this env var is unset (CI default for runs that don't need the
60+
patch), `${EXTRA_DOCKER_MOUNTS:-}` expands to the empty string and
61+
container behavior is byte-identical to the unpatched path.
62+
63+
## When to use which patch
64+
65+
| Image / version | Need `mori_conn.py` overlay? |
66+
|---|---|
67+
| `lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260523` | yes (Qwen3.5-MoE-FP8, GLM-5, any hybrid model on this image) |
68+
| `lmsysorg/sglang-rocm:v0.5.10.post1-rocm720-mi35x-*` (used by `dsr1-fp4-*-disagg`) | not validated; same code path likely affected — try with the overlay if you hit the same `struct.error` |
69+
| `rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-*` (used by `dsr1-fp8-*-disagg`, `glm5-*-disagg`) | predates [PR #22665](https://github.com/sgl-project/sglang/pull/22665); different code paths; **do not** apply this overlay |
70+
71+
When upstream merges the proper fix (see
72+
`scripts/sglang_disagg/docs/03-upstream-pr-proposal.md`) and that
73+
fix lands in a published image, retire this overlay and the
74+
`EXTRA_DOCKER_MOUNTS` knob can stay (still useful for future patches).

0 commit comments

Comments
 (0)