Commit dc7d203
[feat] Enable openYuanrong RDMA support (#108)
## Description
For 910B nodes with an additional RoCE NIC (besides NPU-side RoCE),
openYuanrong datasystem supports host RDMA (H2H) transport via UCX.
Because TQ routes CPU tensors through the KV client and NPU tensors through
the tensor client based on tensor location, H2H RDMA and RH2D can be enabled
simultaneously; they are **not mutually exclusive**.
Previously, enabling RDMA required manually adding `--enable_rdma true`
to `worker_args` and setting `UCX_TLS=rc_x` in the environment. This PR
introduces dedicated config options for one-click RDMA enablement.
## Changes
1. **`config.yaml`**: Added `enable_rdma` (default `false`) and
`ucx_env_vars` (default `{}`). When `enable_rdma=true`, TQ automatically
appends `--enable_rdma true` to the dscli command and defaults to
`UCX_TLS=rc_x`. `ucx_env_vars` lets users set UCX environment variables
(UCX_TLS, UCX_LOG_FILE, UCX_LOG_LEVEL, UCX_NET_DEVICES, UCX_TCP_CM_ROUTE);
these take priority over the parent environment.
2. **`yuanrong_bootstrap.py`**: Wired `enable_rdma` and `ucx_env_vars`
through config → actor → `start_datasystem_worker`. Env priority:
`ucx_env_vars` > parent env > default `UCX_TLS=rc_x`.
3. **`openyuanrong_datasystem.md`**: Added an RDMA Options section, updated
the config examples, added manual RDMA startup instructions, and added an
RDMA FAQ (endpoint timeout, verification, container memlock).
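
The environment priority and argument handling described above can be sketched roughly as follows. This is an illustrative sketch, not the code from `yuanrong_bootstrap.py`: the helper names `build_worker_env` and `build_worker_args` are hypothetical, and applying `ucx_env_vars` only when `enable_rdma` is true is an assumption.

```python
import os
from typing import List, Mapping, Optional

# Default applied only when neither ucx_env_vars nor the parent env sets it.
DEFAULT_UCX_ENV = {"UCX_TLS": "rc_x"}


def build_worker_env(
    enable_rdma: bool,
    ucx_env_vars: Optional[Mapping[str, str]] = None,
    parent_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: merge env vars with the priority
    ucx_env_vars > parent env > default UCX_TLS=rc_x."""
    env = dict(os.environ if parent_env is None else parent_env)
    if not enable_rdma:
        # Assumption: UCX settings are left untouched when RDMA is disabled.
        return env
    for key, value in DEFAULT_UCX_ENV.items():
        env.setdefault(key, value)  # lowest priority: fill only if missing
    for key, value in (ucx_env_vars or {}).items():
        env[key] = str(value)  # highest priority: always overrides
    return env


def build_worker_args(base_args: List[str], enable_rdma: bool) -> List[str]:
    """Hypothetical helper: append the RDMA flag unless already present."""
    args = list(base_args)
    if enable_rdma and "--enable_rdma" not in args:
        args += ["--enable_rdma", "true"]
    return args
```

With this ordering, a `UCX_TLS` value exported in the parent shell overrides the `rc_x` default, and a `UCX_TLS` entry in `ucx_env_vars` overrides both.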
## Related Issues
Closes #98
---------
Signed-off-by: Haichuan Hu <kaisennhu@gmail.com>

1 parent 0277d3f commit dc7d203
4 files changed
Lines changed: 137 additions & 27 deletions
File tree
- docs/storage_backends
- transfer_queue
- storage/bootstrap
- utils