skills/config/SKILL.md: 23 additions & 0 deletions
@@ -172,6 +172,29 @@ In TOML, an empty section header does the same:
[ckpt] # enables checkpointing with defaults
```

### Disaggregated inference

For `[deployment] type = "disaggregated"`, the prefill/decode (P/D) NIXL transfer knobs live under `deployment.kv_transport_config`:

```toml
[deployment]
type = "disaggregated"

[deployment.kv_transport_config]
type = "nixl"
enable_bidirectional = true
num_threads = 1
kv_recompute_threshold = 64
abort_timeout_seconds = 480
router_cache_ttl_seconds = 456
```

`enable_bidirectional` defaults to `false`. When it is `false`, the Slurm templates pass `--pd-kv-cache-ttl-secs 0` to vllm-router so that Decode-side KV metadata is not reused.
`router_cache_ttl_seconds` can be omitted; it then defaults to 95% of `abort_timeout_seconds` (456 for the 480-second timeout above) and must remain lower than the abort timeout.
The Slurm templates export `abort_timeout_seconds` as both `NIXL_ABORT_TIMEOUT` and vLLM's `VLLM_NIXL_ABORT_REQUEST_TIMEOUT`.
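
The three rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's implementation: `KVTransportConfig`, `resolve_router_cache_ttl`, `router_ttl_args`, and `abort_timeout_env` are hypothetical names, the integer rounding of the 95% default is a guess, and passing the resolved TTL to the router when `enable_bidirectional` is true is an assumption the text does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KVTransportConfig:
    """Hypothetical stand-in for the [deployment.kv_transport_config] table."""

    abort_timeout_seconds: int
    enable_bidirectional: bool = False  # documented default
    router_cache_ttl_seconds: Optional[int] = None  # None means "omitted"


def resolve_router_cache_ttl(cfg: KVTransportConfig) -> int:
    """Return the router cache TTL, applying the documented 95% default."""
    if cfg.router_cache_ttl_seconds is None:
        # Integer arithmetic; the actual rounding rule is an assumption.
        return cfg.abort_timeout_seconds * 95 // 100
    if cfg.router_cache_ttl_seconds >= cfg.abort_timeout_seconds:
        raise ValueError(
            "router_cache_ttl_seconds must be lower than abort_timeout_seconds"
        )
    return cfg.router_cache_ttl_seconds


def router_ttl_args(cfg: KVTransportConfig) -> list:
    """Build the vllm-router TTL flag; the TTL is 0 when bidirectional is off."""
    # Assumption: with bidirectional transfer on, the resolved TTL is passed.
    ttl = resolve_router_cache_ttl(cfg) if cfg.enable_bidirectional else 0
    return ["--pd-kv-cache-ttl-secs", str(ttl)]


def abort_timeout_env(cfg: KVTransportConfig) -> dict:
    """Export the abort timeout under both documented variable names."""
    value = str(cfg.abort_timeout_seconds)
    return {
        "NIXL_ABORT_TIMEOUT": value,
        "VLLM_NIXL_ABORT_REQUEST_TIMEOUT": value,
    }
```

With `abort_timeout_seconds = 480` and the TTL omitted, `resolve_router_cache_ttl` yields 456, which matches the explicit value in the example config.
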

P/D NIXL deployments need UCX 1.19 or newer for H200 CUDA buffer registration. The Slurm templates add `$PROJECT_DIR/third_party/ucx` to `LD_LIBRARY_PATH`.
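
For launching outside the Slurm templates, the library path adjustment could be reproduced roughly as below. This is a sketch only: `with_ucx_library_path` is a hypothetical helper, and placing the UCX directory first (so it takes precedence over any system UCX) is an assumption, since the text only says the directory is added.

```python
def with_ucx_library_path(env: dict, project_dir: str) -> dict:
    """Return a copy of env with the bundled UCX directory on LD_LIBRARY_PATH."""
    ucx_dir = f"{project_dir}/third_party/ucx"
    existing = env.get("LD_LIBRARY_PATH", "")
    # Assumption: the UCX directory goes first so it wins over system copies.
    parts = [ucx_dir] + [p for p in existing.split(":") if p]
    return {**env, "LD_LIBRARY_PATH": ":".join(parts)}
```

The returned mapping can be passed as `env=` to `subprocess.run`; the input mapping is left unmodified.
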
## Key files
- `src/prime_rl/utils/config.py`: re-exports `BaseConfig` and `cli` from pydantic_config