| 1 | +# In-tree sglang patches for the MoRI PD-disagg path |
| 2 | + |
| 3 | +This directory carries small Python overlays that get bind-mounted over |
| 4 | +the upstream sglang source inside the docker container at runtime. |
| 5 | +They are needed because some sglang releases ship known bugs in the |
| 6 | +MoRI disaggregation backend that block our benchmark and accuracy |
| 7 | +configs. |
| 8 | + |
| 9 | +The mount is wired through the `EXTRA_DOCKER_MOUNTS` env var that |
| 10 | +`job.slurm` consumes (an opt-in `${EXTRA_DOCKER_MOUNTS:-}` after the |
| 11 | +existing `-v` block). The local-test driver scripts under |
| 12 | +`scripts/sglang_disagg/` pre-set this env var to the path of the |
| 13 | +relevant overlay; CI runners that need the patch can do the same. |
| 14 | + |
| 15 | +## `mori_conn.py` |
| 16 | + |
| 17 | +Overlays |
| 18 | +`/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py`. |
| 19 | + |
| 20 | +Source: forked from the file shipped in |
| 21 | +`lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260523` |
| 22 | +(sglang [v0.5.12.post1](https://github.com/sgl-project/sglang/tree/v0.5.12.post1)). |
| 23 | +Four logical edits, all confined to `MoriKVReceiver.send_state`, |
| 24 | +`MoriKVReceiver._register_kv_args`, and |
| 25 | +`MoriKVReceiver._send_swa_dsa_state`: |
| 26 | + |
| 27 | +1. **Sender flatten** — handle the framework's nested |
| 28 | + `state_item_lens: List[List[int]]` instead of crashing in the |
| 29 | + naked `struct.pack("I", item_len)` (the legacy `List[int]` |
| 30 | + assumption). Idempotent for legacy flat callers. |
| 31 | +2. **`state_type` legacy fallback** — when the legacy singular |
| 32 | + `kv_args.state_type` is `'none'` but `state_mem_descs` is non-empty, |
| 33 | + read `kv_args.state_types[0]` (the modern plural API that Mooncake |
| 34 | + and NIXL already use). Routes `MAMBA → _send_mamba_state` and |
| 35 | + `DSA/SWA → _send_swa_dsa_state` correctly. |
| 36 | +3. **Consumer normalization** — flatten `state_item_lens` and |
| 37 | + `state_dim_per_tensor` to flat `List[int]` once at the entry of |
| 38 | + `send_state`, so the existing per-tensor index arithmetic |
| 39 | + (`state_item_lens[i]`) and length checks |
| 40 | + (`len(state_item_lens) == len(state_mem_descs)`) keep working. |
| 41 | +4. **DSA index rank+length normalization** — inside |
| 42 | + `_send_swa_dsa_state`, before the `group_concurrent_contiguous` |
| 43 | + call, ravel both `src_state_indices` and `dst_state_indices` to 1-D |
| 44 | + and re-truncate to common length. Upstream's existing truncation |
| 45 | + only slices the outer axis, leaving 2-D `(1, N)` arrays unchanged |
| 46 | + and triggering an `np.diff` broadcasting error |
| 47 | + (`shapes (1,12) (0,)`) for GLM-5 (single-DSA-component) prefill |
| 48 | + traffic. See |
| 49 | + `scripts/sglang_disagg/docs_glm5/01-bug-analysis.md` for the full |
| 50 | + write-up. |
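The rank and length normalization described in item 4 can be sketched in isolation. This is a minimal illustration, not the code in the overlay: `normalize_index_pair` is a hypothetical helper name, and the index arrays are made-up stand-ins for `src_state_indices` and `dst_state_indices`. It shows why slicing only the outer axis leaves a `(1, N)` array unchanged, and how raveling first makes the truncation effective.

```python
import numpy as np


def normalize_index_pair(src_indices, dst_indices):
    """Ravel both index arrays to 1-D, then truncate to the common length.

    Hypothetical helper for illustration only; the overlay does this
    inline before calling group_concurrent_contiguous.
    """
    src = np.asarray(src_indices).ravel()
    dst = np.asarray(dst_indices).ravel()
    common = min(src.size, dst.size)
    return src[:common], dst[:common]


# Made-up inputs: a 2-D (1, 12) source array and a shorter 1-D destination.
src_2d = np.arange(12).reshape(1, 12)
dst_1d = np.arange(100, 110)

# Slicing the outer axis alone does not shorten a (1, N) array.
outer_only = src_2d[: len(dst_1d)]

# After raveling, both arrays are 1-D and share the same length.
src, dst = normalize_index_pair(src_2d, dst_1d)
print(outer_only.shape, src.shape, dst.shape)
```

With both arrays 1-D and equal in length, element-wise operations such as `np.diff` on each array produce matching shapes.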
| 51 | + |
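The flattening in edits 1 and 3 of the list above amounts to accepting either the nested `List[List[int]]` form or the legacy flat `List[int]` form and always returning a flat list. The sketch below assumes that shape of input; `flatten_item_lens` is a hypothetical name and the sample values are invented, so treat it as an illustration of the idea rather than the overlay's actual code.

```python
import struct


def flatten_item_lens(item_lens):
    """Return a flat list of ints from either a nested or a flat list.

    Hypothetical helper; passing an already-flat list returns an equal
    list, so applying it twice is harmless.
    """
    flat = []
    for entry in item_lens:
        if isinstance(entry, (list, tuple)):
            flat.extend(int(x) for x in entry)
        else:
            flat.append(int(entry))
    return flat


nested = [[4096, 8192], [1024]]   # invented nested form
legacy = [4096, 8192, 1024]       # invented legacy flat form

# Packing a nested entry directly fails, because "I" expects one integer.
try:
    struct.pack("I", nested[0])
    nested_pack_failed = False
except (struct.error, TypeError):
    nested_pack_failed = True

flat_from_nested = flatten_item_lens(nested)
flat_from_legacy = flatten_item_lens(legacy)

# After flattening, every element packs as a single unsigned int.
packed = b"".join(struct.pack("I", n) for n in flat_from_nested)
```

Normalizing once at the entry point keeps per-tensor indexing such as `state_item_lens[i]` valid for both input forms.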
| 52 | +Verified: GSM8K = 0.978 ± 0.004 on Qwen3.5-397B-A17B-FP8 (1P+1D, |
| 53 | +TP=8, dp-attn=false), slightly above the 0.970 GSM8K that upstream |
| 54 | +[PR #22665](https://github.com/sgl-project/sglang/pull/22665) |
| 55 | +reports for the bf16 baseline. GLM-5 (DSA) verification is in |
| 56 | +progress; see |
| 57 | +`scripts/sglang_disagg/docs_glm5/02-fix-and-verification.md`. |
| 58 | + |
| 59 | +This is a stop-gap. The proper upstream fix is to migrate MoRI to the |
| 60 | +plural `state_types: List[StateType]` API (full design + diff in |
| 61 | +`scripts/sglang_disagg/docs/03-upstream-pr-proposal.md`). |
| 62 | + |
| 63 | +## How to enable |
| 64 | + |
| 65 | +```bash |
| 66 | +export EXTRA_DOCKER_MOUNTS="-v $DI_REPO_DIR/benchmarks/multi_node/amd_utils/patches/mori_conn.py:/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py:ro" |
| 67 | +``` |
| 68 | + |
| 69 | +`$DI_REPO_DIR` is the InferenceX checkout root that `job.slurm` |
| 70 | +already mounts into the container at `/workspace`. |
| 71 | + |
| 72 | +When this env var is unset (CI default for runs that don't need the |
| 73 | +patch), `${EXTRA_DOCKER_MOUNTS:-}` expands to the empty string and |
| 74 | +container behavior is byte-identical to the unpatched path. |
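The opt-in behavior of `${EXTRA_DOCKER_MOUNTS:-}` can be mimicked in Python to make the unset case concrete. `job.slurm` is a shell script, so this is only a rough analogue: `build_docker_args` and the base argument list are hypothetical, and `shlex.split` honors quoting, which plain unquoted shell word splitting does not.

```python
import os
import shlex


def build_docker_args(base_args, env=None):
    """Append extra mount arguments when EXTRA_DOCKER_MOUNTS is set.

    Hypothetical helper: an unset or empty variable contributes no
    arguments, so the base argument list is returned unchanged.
    """
    env = os.environ if env is None else env
    extra = shlex.split(env.get("EXTRA_DOCKER_MOUNTS", ""))
    return list(base_args) + extra


base = ["-v", "/repo:/workspace"]  # invented stand-in for the existing -v block

unpatched = build_docker_args(base, env={})
patched = build_docker_args(
    base, env={"EXTRA_DOCKER_MOUNTS": "-v /host/mori_conn.py:/ctr/conn.py:ro"}
)
```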
| 75 | + |
| 76 | +## When to use which patch |
| 77 | + |
| 78 | +| Image / version | Need `mori_conn.py` overlay? | |
| 79 | +|---|---| |
| 80 | +| `lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260523` | yes (Qwen3.5-MoE-FP8, GLM-5, any hybrid model on this image) | |
| 81 | +| `lmsysorg/sglang-rocm:v0.5.10.post1-rocm720-mi35x-*` (used by `dsr1-fp4-*-disagg`) | not validated; same code path likely affected — try with the overlay if you hit the same `struct.error` | |
| 82 | +| `rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-*` (used by `dsr1-fp8-*-disagg`, `glm5-*-disagg`) | predates [PR #22665](https://github.com/sgl-project/sglang/pull/22665); different code paths; **do not** apply this overlay | |
| 83 | + |
| 84 | +When upstream merges the proper fix (see |
| 85 | +`scripts/sglang_disagg/docs/03-upstream-pr-proposal.md`) and that |
| 86 | +fix lands in a published image, retire this overlay. The |
| 87 | +`EXTRA_DOCKER_MOUNTS` knob can stay, since it remains useful for future patches. |