
Commit c3b0d63

fuheavengushiqiaohuochaitiantangGACLoveMusisoul authored
Decentralized disaggregated deployment architecture (#947)
## Summary

Integrated Mooncake's disaggregated deployment mode into the runner to give LightX2V full **three-stage** disaggregated inference capability. The inference pipeline can be split into **Encoder**, **Transformer**, and **Decoder** nodes, with the **VAE Decoder** deployed independently on the Decoder node. Both the **Wan** and **Qwen** model families are supported.

On top of the three-stage foundation, this PR further introduces **decentralized queue scheduling**: a Controller process hosts RDMA metadata ring buffers, Transformer and Decoder run as pull-based workers, and the client only needs **a single HTTP POST to the Encoder**—no more three-way sequential requests. Multiple Transformer workers can be deployed across GPUs for parallel DiT execution.

## Feature Highlights

1. Disaggregated deployment is integrated with the Mooncake engine, enabling efficient RDMA-based data transfer; inference I/O can reach the GPUs' **theoretical maximum bandwidth**.
2. The **text encoder** component is integrated with **LightLLM** optimizations, both **kernel-level** and **service-level**, delivering an additional **~30% performance improvement**.
3. Compared with Mooncake's standalone disagg submission, this integration is implemented within the **local runner**. It currently supports both the **Wan runner** and the **Qwen runner**.
4. In Mooncake's original disagg approach, each stage runs as a separate thread within a single process, which creates tight producer/consumer coupling and is a poor fit for high-concurrency scenarios. We decouple the stages into **independent processes**, allowing all three (**encoder + transformer + decoder**) to be deployed on different machines and different GPUs, which improves throughput under high concurrency.
5. **Decentralized queue scheduling** with RDMA ring buffers (`RDMABuffer`): a Controller hosts request / phase1 / phase2 metadata rings; the Encoder publishes dispatch metadata after inference; Transformer and Decoder workers pull tasks from the rings automatically. The client sends **one HTTP request** to the Encoder instead of three sequential POSTs.
6. **Multi-Transformer worker parallelism**: multiple Transformer workers (each with a unique `receiver_engine_rank`) can run on different GPUs. Requests specify `disagg_phase1_receiver_engine_rank` to target a specific worker, enabling round-robin or explicit routing.
7. **True RDMA atomics**: `rdma_faa` upgraded from a read-modify-write shim to a real `IBV_WR_ATOMIC_FETCH_AND_ADD`; new `rdma_cas` (`IBV_WR_ATOMIC_CMP_AND_SWP`) added. Both RDMAServer and RDMAClient register the `REMOTE_ATOMIC` access flag.
8. **Queue metrics & monitoring**: each service (Encoder / Transformer / Decoder) reports queue depth (`queue_sizes`, `queue_total_pending`, `all_queues_empty`) via the Reporter's `set_extra_metrics_provider()` hook, providing real-time pipeline backlog visibility.

## Disaggregated Architecture (Three-Stage Pipeline)

Based on the `disagg_mode` configuration, the inference pipeline is physically split into three independent services. Data flows through **Phase1 (Encoder → Transformer)** and **Phase2 (Transformer → Decoder)**, requiring **two Mooncake transfers**.
### Encoder Role (`disagg_mode="encoder"`)

- Loads only:
  - Text Encoder
  - Image Encoder (for **I2V / I2I**)
  - VAE Encoder
- Skips:
  - DiT
  - VAE Decoder (handled by the Decoder node in the three-stage setup)

After startup, it performs feature extraction and sends tensors through Mooncake Phase1 to the Transformer node, including:

- `context`
- `clip_encoder_out`
- `vae_encoder_out`
- `latent_shape`
- (other required intermediate tensors)

### Transformer Role (`disagg_mode="transformer"`)

- Loads only:
  - DiT
- Skips:
  - Encoder
  - VAE Decoder (VAE decoding is handled by the Decoder node)

After startup, it waits for Phase1 data. Upon receiving it, it performs:

- Hash verification
- Input assembly
- Denoising

If `decoder_engine_rank` is configured, it sends the **denoised latents** to the Decoder node via Mooncake Phase2 and **does not** perform local VAE decoding.

### Decoder Role (`disagg_mode="decode"`)

- Loads only:
  - VAE Decoder
- Skips:
  - Text/Image Encoder
  - DiT

After startup, it enters a Phase2 receive-and-wait state. When it receives the latents from the Transformer, it performs:

- VAE decoding
- Saving output videos/images

Both task completion status and result files are stored on the Decoder node.
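The role descriptions above boil down to a load/skip table keyed by `disagg_mode`. The sketch below is illustrative only (the real runner wires this up internally; the component names here are shorthand, not actual module identifiers):

```python
# Which components each disaggregated role loads, per the role descriptions
# above. Component names are illustrative shorthand, not real module names.
COMPONENTS = {"text_encoder", "image_encoder", "vae_encoder", "dit", "vae_decoder"}

LOADED_BY_MODE = {
    "encoder": {"text_encoder", "image_encoder", "vae_encoder"},
    "transformer": {"dit"},
    "decode": {"vae_decoder"},
}

def components_to_skip(disagg_mode: str) -> set[str]:
    """Everything a node does NOT load in a given disaggregated role."""
    return COMPONENTS - LOADED_BY_MODE[disagg_mode]
```

For example, `components_to_skip("transformer")` yields everything except the DiT, matching the Transformer role above.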
## Decentralized Queue Scheduling

### Architecture

```
┌──────────┐  HTTP POST  ┌──────────┐  Phase1 RDMA  ┌─────────────┐  Phase2 RDMA  ┌──────────┐
│  Client  │ ──────────→ │ Encoder  │ ────────────→ │ Transformer │ ────────────→ │ Decoder  │
└──────────┘             │ (GPU 0)  │               │ (GPU 1/2/3) │               │ (GPU 0)  │
                         └──────────┘               └─────────────┘               └──────────┘
                              ↑                           ↑                            ↑
                      lightx2v.server              pull worker ×N                 pull worker
                      HTTP port 8002         (qwen_t2i_queue_workers)       (qwen_t2i_queue_workers)
                              │
                         ┌──────────┐
                         │Controller│ ← RDMA metadata ring buffers (always-on)
                         └──────────┘
```

### How it differs from standard three-stage

| Aspect | Standard three-stage | Decentralized queue |
|--------|---------------------|---------------------|
| **Client calls** | Must POST to Decoder → Transformer → Encoder separately | Single POST to Encoder HTTP |
| **Transformer** | HTTP server, one request at a time | Pull worker, multiple instances consume in parallel |
| **Decoder** | HTTP server | Pull worker, auto-consumes Phase2 |
| **Request routing** | Client explicitly specifies | Encoder writes RDMA ring, workers pull by rank |
| **Result retrieval** | Poll Decoder HTTP | Poll Encoder HTTP |
| **Scaling** | Fixed 1:1:1 ratio | N Transformer workers on N GPUs |

### Data flow

1. **Client** POSTs to Encoder HTTP (`/v1/tasks/image/`) with prompt, `data_bootstrap_room` (unique room ID), and `disagg_phase1_receiver_engine_rank` (target Transformer rank).
2. **Encoder** runs Text Encoder inference, creates a per-request Mooncake session, sends feature tensors via Phase1, and publishes dispatch metadata to the Phase1 RDMA ring.
3. **Transformer** (pull worker) consumes the Phase1 ring slot matching its rank, initializes the Mooncake Phase1 receiver + Phase2 sender, runs DiT denoising, sends latents via Phase2, and publishes dispatch metadata to the Phase2 RDMA ring.
4. **Decoder** (pull worker) consumes the Phase2 ring, initializes the Mooncake Phase2 receiver, runs VAE decode, and saves the output image.
5. **Client** polls the Encoder's `/v1/tasks/{task_id}/status` until `completed`.

### Key components

- **Controller** (`ControllerService.serve_rdma_dispatch_only()`): hosts three RDMA ring buffers (request / phase1 / phase2); no model loading; always-on background process.
- **RDMABuffer** (`rdma_buffer.py`): shared ring buffer over `RDMAServer`/`RDMAClient` with slot-level atomic coordination for multi-producer/multi-consumer JSON dispatch.
- **Pull workers** (`qwen_t2i_queue_workers.py`): Transformer and Decoder worker loops that consume from the RDMA rings via `disagg_try_consume_phase1()` / `disagg_try_consume_phase2()`, then call `disagg_transformer_prepare_dispatch()` / `disagg_decoder_prepare_dispatch()` to set up per-request Mooncake sessions.

---------

Co-authored-by: Gu Shiqiao <77222802+gushiqiao@users.noreply.github.com>
Co-authored-by: LiangLiu <1432249204@qq.com>
Co-authored-by: PengGao <peng.gaoc@gmail.com>
Co-authored-by: Musisoul <106440666+Musisoul@users.noreply.github.com>
Co-authored-by: STwangyingrui <86730325+STwangyingrui@users.noreply.github.com>
Co-authored-by: root <root@pt-de4c35727a1b4d1b9f27f422f06026ec-worker-0.pt-de4c35727a1b4d1b9f27f422f06026ec.ns-devsft-3460edd0.svc.cluster.local>
Co-authored-by: root <root@pt-9b2035a55fe647eeb007584b238e5077-worker-0.pt-9b2035a55fe647eeb007584b238e5077.ns-devsft-3460edd0.svc.cluster.local>
Co-authored-by: yihuiwen <617954457@qq.com>
Co-authored-by: yihuiwen <yihuiwen@sensetime.com>
Co-authored-by: sandy <wangshankun2011@hotmail.com>
Co-authored-by: wangshankun <wangshankun@sensetime.com>
Co-authored-by: Ian Thompson <37408934+Naist4869@users.noreply.github.com>
Co-authored-by: Yang Yong (雍洋) <yongyang1030@163.com>
Co-authored-by: qinxinyi <qxy118045534@163.com>
Co-authored-by: WateBear <540295877@qq.com>
Co-authored-by: Watebear <wushuo@bupt.cn>
Co-authored-by: Kane <62586707+Wq-dd@users.noreply.github.com>
Co-authored-by: Zhuguanyu Wu <goatwu0415@gmail.com>
Co-authored-by: XHPlus <xhplus@163.com>
Co-authored-by: Fredy Rivera <fredyriveraacevedo13@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: vivienfanghuagood <89012307+vivienfanghuagood@users.noreply.github.com>
Co-authored-by: triple-mu <gpu@163.com>
Co-authored-by: llmc-reviewer <llmc_reviewer@163.com>
Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
Co-authored-by: Sita Bérété <sita.berete.3@gmail.com>
Co-authored-by: LuoLongZan <2200013198@stu.pku.edu.cn>
Co-authored-by: Vivek Bhakta <vivek@wombo.ai>
Co-authored-by: xiehao <hxie_chn@163.com>
Co-authored-by: root <root@pt-72be2ccd01a14fa18a4b18c6c347f823-worker-0.pt-72be2ccd01a14fa18a4b18c6c347f823.ns-devsft-3460edd0.svc.cluster.local>
Co-authored-by: Lihuang-a <3189274310@qq.com>
Co-authored-by: Franc1sCai <guanghan@atlasv.com>
Co-authored-by: Wu Ruixiao <62665119+kikidouloveme79@users.noreply.github.com>
Co-authored-by: wrx <kikidouloveme79@users.noreply.github.com>
Co-authored-by: root <root@pt-1566c00962444e589a1c9589088689e2-worker-0.pt-1566c00962444e589a1c9589088689e2.ns-devsft-3460edd0.svc.cluster.local>
Co-authored-by: storyicon <storyicon@foxmail.com>
Co-authored-by: xjq <xjq314@gmail.com>
Co-authored-by: M4jupitercannon <speedforcy@outlook.com>
Co-authored-by: Chengtao Lv <lvchengtao0319@gmail.com>
Co-authored-by: root <root@pt-0699d18802514bc1b116c156f9ce2bc1-worker-0.pt-0699d18802514bc1b116c156f9ce2bc1.ns-devsft-3460edd0.svc.cluster.local>
Co-authored-by: Harahan <yh4717023@gmail.com>
Co-authored-by: ziyanxzy <109060006+ziyanxzy@users.noreply.github.com>
Co-authored-by: zhtshr <44193225+zhtshr@users.noreply.github.com>
Co-authored-by: jasonzhang517 <yzhang298@e.ntu.edu.sg>
1 parent 69648c8 commit c3b0d63

39 files changed

Lines changed: 4144 additions & 423 deletions
Lines changed: 20 additions & 0 deletions

```json
{
  "task": "i2i",
  "disagg_mode": "decode",
  "infer_steps": 40,
  "vae_scale_factor": 8,
  "vae_z_dim": 16,
  "vae_stride": [1, 8, 8],
  "target_video_length": 1,
  "target_height": 1664,
  "target_width": 1664,
  "disagg_config": {
    "bootstrap_addr": "127.0.0.1",
    "bootstrap_room": 2,
    "sender_engine_rank": 1,
    "receiver_engine_rank": 2,
    "protocol": "rdma",
    "local_hostname": "localhost",
    "metadata_server": "P2PHANDSHAKE"
  }
}
```
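The decode config above carries the shape parameters the Decoder needs to size its receive buffers: with `vae_stride` `[1, 8, 8]`, a 1664×1664 target implies 208×208 spatial latents. A small sketch of that arithmetic (the `(z_dim, T, H, W)` layout is an assumption; the actual `latent_shape` tensor may order dimensions differently):

```python
def latent_shape(cfg: dict) -> tuple[int, int, int, int]:
    """Latent tensor shape implied by a decode config.

    Assumes a (z_dim, T, H, W) layout -- an illustrative convention,
    not necessarily the layout the runner uses internally.
    """
    st, sh, sw = cfg["vae_stride"]
    return (
        cfg["vae_z_dim"],
        (cfg["target_video_length"] - 1) // st + 1,  # temporal latent frames
        cfg["target_height"] // sh,                  # spatial downsampling
        cfg["target_width"] // sw,
    )

cfg = {"vae_z_dim": 16, "vae_stride": [1, 8, 8],
       "target_video_length": 1, "target_height": 1664, "target_width": 1664}
shape = latent_shape(cfg)  # (16, 1, 208, 208)
```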
Lines changed: 31 additions & 0 deletions

```json
{
  "task": "i2i",
  "disagg_mode": "encoder",
  "text_encoder_type": "lightllm_kernel",
  "text_encoder_dim": 3584,
  "text_len": 4096,
  "infer_steps": 40,
  "prompt_template_encode": "<|im_start|>system\nDescribe the key features of the input image (color, shape, size, texture, objects, background), then explain how the user's text instruction should alter or modify the image. Generate a new image that meets the user's requirements while maintaining consistency with the original input where appropriate.<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n",
  "prompt_template_encode_start_idx": 64,
  "resize_mode": "adaptive",
  "attn_type": "flash_attn3",
  "enable_cfg": true,
  "sample_guide_scale": 4.0,
  "CONDITION_IMAGE_SIZE": 147456,
  "USE_IMAGE_ID_IN_PROMPT": true,
  "vae_z_dim": 16,
  "vae_stride": [1, 8, 8],
  "target_video_length": 1,
  "target_height": 1664,
  "target_width": 1664,
  "disagg_config": {
    "bootstrap_addr": "127.0.0.1",
    "bootstrap_room": 1,
    "sender_engine_rank": 0,
    "receiver_engine_rank": 1,
    "protocol": "rdma",
    "local_hostname": "localhost",
    "metadata_server": "P2PHANDSHAKE",
    "device_name": ""
  }
}
```
Lines changed: 32 additions & 0 deletions

```json
{
  "task": "i2i",
  "disagg_mode": "transformer",
  "text_encoder_dim": 3584,
  "text_len": 4096,
  "infer_steps": 40,
  "prompt_template_encode": "<|im_start|>system\nDescribe the key features of the input image (color, shape, size, texture, objects, background), then explain how the user's text instruction should alter or modify the image. Generate a new image that meets the user's requirements while maintaining consistency with the original input where appropriate.<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n",
  "prompt_template_encode_start_idx": 64,
  "resize_mode": "adaptive",
  "attn_type": "flash_attn3",
  "enable_cfg": true,
  "sample_guide_scale": 4.0,
  "CONDITION_IMAGE_SIZE": 147456,
  "USE_IMAGE_ID_IN_PROMPT": true,
  "vae_z_dim": 16,
  "vae_stride": [1, 8, 8],
  "target_video_length": 1,
  "target_height": 1664,
  "target_width": 1664,
  "disagg_config": {
    "bootstrap_addr": "127.0.0.1",
    "bootstrap_room": 1,
    "sender_engine_rank": 0,
    "receiver_engine_rank": 1,
    "protocol": "rdma",
    "local_hostname": "localhost",
    "metadata_server": "P2PHANDSHAKE",
    "device_name": "",
    "decoder_engine_rank": 2,
    "decoder_bootstrap_room": 2
  }
}
```
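The three i2i configs above chain together by rank and room: the Encoder targets the Transformer's `receiver_engine_rank` in `bootstrap_room` 1 (Phase1), and the Transformer's `decoder_engine_rank` / `decoder_bootstrap_room` must match the Decoder's `receiver_engine_rank` / `bootstrap_room` (Phase2). A quick consistency check (`check_rank_wiring` is a hypothetical helper, not part of the PR):

```python
def check_rank_wiring(encoder: dict, transformer: dict, decoder: dict) -> None:
    """Assert that three disagg_config blocks agree on the pipeline wiring."""
    enc, trf, dec = (c["disagg_config"] for c in (encoder, transformer, decoder))
    # Phase1: Encoder -> Transformer share a bootstrap_room, and the Encoder
    # sends to the Transformer's engine rank.
    assert enc["bootstrap_room"] == trf["bootstrap_room"]
    assert enc["receiver_engine_rank"] == trf["receiver_engine_rank"]
    # Phase2: the Transformer's decoder_* fields must match the Decoder.
    assert trf["decoder_engine_rank"] == dec["receiver_engine_rank"]
    assert trf["decoder_bootstrap_room"] == dec["bootstrap_room"]

# Rank/room values taken from the three i2i configs above.
enc_cfg = {"disagg_config": {"bootstrap_room": 1, "sender_engine_rank": 0,
                             "receiver_engine_rank": 1}}
trf_cfg = {"disagg_config": {"bootstrap_room": 1, "sender_engine_rank": 0,
                             "receiver_engine_rank": 1,
                             "decoder_engine_rank": 2,
                             "decoder_bootstrap_room": 2}}
dec_cfg = {"disagg_config": {"bootstrap_room": 2, "sender_engine_rank": 1,
                             "receiver_engine_rank": 2}}
check_rank_wiring(enc_cfg, trf_cfg, dec_cfg)  # passes for the configs above
```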
Lines changed: 29 additions & 0 deletions

```json
{
  "task": "t2i",
  "model_cls": "qwen_image",
  "text_len": 4096,
  "text_encoder_dim": 3584,
  "infer_steps": 50,
  "aspect_ratio": "16:9",
  "max_custom_size": 3072,
  "vae_z_dim": 16,
  "vae_stride": [1, 8, 8],
  "sample_guide_scale": 4.0,
  "enable_cfg": true,
  "disagg_mode": "controller",
  "disagg_config": {
    "bootstrap_addr": "127.0.0.1",
    "bootstrap_room": 0,
    "encoder_engine_rank": 0,
    "transformer_engine_rank": 1,
    "decoder_engine_rank": 4,
    "protocol": "rdma",
    "local_hostname": "localhost",
    "metadata_server": "P2PHANDSHAKE",
    "rdma_buffer_slots": 128,
    "rdma_buffer_slot_size": 4096,
    "rdma_request_handshake_port": 5566,
    "rdma_phase1_handshake_port": 5567,
    "rdma_phase2_handshake_port": 5568
  }
}
```
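The Controller config above gives each metadata ring 128 slots of 4096 bytes. The slot-level coordination that `RDMABuffer` performs can be modeled in plain Python: producers claim a slot by fetch-and-add on a head counter, consumers by fetch-and-add on a tail counter. This is a toy single-process simulation (a lock stands in for `IBV_WR_ATOMIC_FETCH_AND_ADD` against the Controller's memory region), not the actual Mooncake/ibverbs code:

```python
import json
import threading

class RingBufferModel:
    """Toy model of a fixed-slot JSON dispatch ring with atomic slot claims.

    The lock emulates the RDMA fetch-and-add; a real ring would also need
    a per-slot "ready" flag so consumers never read a half-written slot
    (omitted here for brevity).
    """

    def __init__(self, slots: int = 128, slot_size: int = 4096):
        self.slots = slots
        self.slot_size = slot_size
        self.buf = [None] * slots
        self.head = 0          # next slot a producer will claim
        self.tail = 0          # next slot a consumer will claim
        self._lock = threading.Lock()

    def publish(self, metadata: dict) -> int:
        """Producer side: atomically claim a slot, write dispatch metadata."""
        payload = json.dumps(metadata).encode()
        if len(payload) > self.slot_size:
            raise ValueError("metadata exceeds slot size")
        with self._lock:                 # fetch-and-add on head
            slot = self.head % self.slots
            self.head += 1
        self.buf[slot] = payload
        return slot

    def try_consume(self):
        """Consumer side: atomically claim the next filled slot, if any."""
        with self._lock:                 # fetch-and-add on tail
            if self.tail == self.head:
                return None              # ring empty
            slot = self.tail % self.slots
            self.tail += 1
        payload, self.buf[slot] = self.buf[slot], None
        return json.loads(payload)

ring = RingBufferModel(slots=128, slot_size=4096)
ring.publish({"bootstrap_room": 1, "receiver_engine_rank": 1})
task = ring.try_consume()
```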
Lines changed: 21 additions & 0 deletions

```json
{
  "task": "t2i",
  "disagg_mode": "decode",
  "infer_steps": 50,
  "vae_scale_factor": 8,
  "vae_z_dim": 16,
  "vae_stride": [1, 8, 8],
  "target_video_length": 1,
  "target_height": 1536,
  "target_width": 2752,
  "max_custom_size": 3072,
  "disagg_config": {
    "bootstrap_addr": "127.0.0.1",
    "bootstrap_room": 2,
    "sender_engine_rank": 1,
    "receiver_engine_rank": 2,
    "protocol": "rdma",
    "local_hostname": "localhost",
    "metadata_server": "P2PHANDSHAKE"
  }
}
```
Lines changed: 28 additions & 0 deletions

```json
{
  "task": "t2i",
  "model_cls": "qwen_image",
  "disagg_mode": "decode",
  "infer_steps": 50,
  "vae_scale_factor": 8,
  "vae_z_dim": 16,
  "vae_stride": [1, 8, 8],
  "disagg_config": {
    "bootstrap_addr": "127.0.0.1",
    "bootstrap_room": 0,
    "sender_engine_rank": 1,
    "receiver_engine_rank": 4,
    "protocol": "rdma",
    "local_hostname": "localhost",
    "metadata_server": "P2PHANDSHAKE",
    "decentralized_queue": true,
    "encoder_engine_rank": 0,
    "transformer_engine_rank": 1,
    "decoder_engine_rank": 4,
    "rdma_phase1_host": "127.0.0.1",
    "rdma_phase1_handshake_port": 5567,
    "rdma_phase2_host": "127.0.0.1",
    "rdma_phase2_handshake_port": 5568,
    "rdma_buffer_slots": 128,
    "rdma_buffer_slot_size": 4096
  }
}
```
Lines changed: 30 additions & 0 deletions

```json
{
  "task": "t2i",
  "disagg_mode": "encoder",
  "text_encoder_type": "lightllm_kernel",
  "text_encoder_dim": 3584,
  "text_len": 4096,
  "infer_steps": 50,
  "aspect_ratio": "16:9",
  "max_custom_size": 3072,
  "vae_z_dim": 16,
  "vae_stride": [1, 8, 8],
  "target_video_length": 1,
  "target_height": 1536,
  "target_width": 2752,
  "prompt_template_encode": "<|im_start|>system\nDescribe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:\n<|im_start|>user\n{}\n<|im_start|>assistant\n",
  "prompt_template_encode_start_idx": 34,
  "attn_type": "flash_attn3",
  "enable_cfg": true,
  "sample_guide_scale": 4.0,
  "disagg_config": {
    "bootstrap_addr": "127.0.0.1",
    "bootstrap_room": 0,
    "sender_engine_rank": 0,
    "receiver_engine_rank": 1,
    "protocol": "rdma",
    "local_hostname": "localhost",
    "metadata_server": "P2PHANDSHAKE",
    "device_name": ""
  }
}
```
Lines changed: 33 additions & 0 deletions

```json
{
  "task": "t2i",
  "model_cls": "qwen_image",
  "disagg_mode": "encoder",
  "text_encoder_type": "lightllm_kernel",
  "text_encoder_dim": 3584,
  "text_len": 4096,
  "infer_steps": 50,
  "aspect_ratio": "16:9",
  "max_custom_size": 3072,
  "vae_z_dim": 16,
  "vae_stride": [1, 8, 8],
  "prompt_template_encode": "<|im_start|>system\nDescribe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:\n<|im_start|>user\n{}\n<|im_start|>assistant\n",
  "prompt_template_encode_start_idx": 34,
  "attn_type": "flash_attn3",
  "enable_cfg": true,
  "sample_guide_scale": 4.0,
  "disagg_config": {
    "bootstrap_addr": "127.0.0.1",
    "bootstrap_room": 0,
    "sender_engine_rank": 0,
    "receiver_engine_rank": 1,
    "protocol": "rdma",
    "local_hostname": "localhost",
    "metadata_server": "P2PHANDSHAKE",
    "device_name": "",
    "decentralized_queue": true,
    "rdma_phase1_host": "127.0.0.1",
    "rdma_phase1_handshake_port": 5567,
    "rdma_buffer_slots": 128,
    "rdma_buffer_slot_size": 4096
  }
}
```
Lines changed: 32 additions & 0 deletions

```json
{
  "task": "t2i",
  "disagg_mode": "transformer",
  "text_encoder_dim": 3584,
  "text_len": 4096,
  "infer_steps": 50,
  "aspect_ratio": "16:9",
  "max_custom_size": 3072,
  "prompt_template_encode": "<|im_start|>system\nDescribe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n",
  "prompt_template_encode_start_idx": 34,
  "attn_type": "flash_attn2",
  "enable_cfg": true,
  "sample_guide_scale": 4.0,
  "vae_z_dim": 16,
  "vae_stride": [1, 8, 8],
  "target_video_length": 1,
  "target_height": 1536,
  "target_width": 2752,
  "dit_original_ckpt": "/home/fuhaiwen/models/qwen-2512/base_dit_info_v060_res2k_9k_3k_25kiter.safetensors",
  "disagg_config": {
    "bootstrap_addr": "127.0.0.1",
    "bootstrap_room": 0,
    "sender_engine_rank": 0,
    "receiver_engine_rank": 1,
    "protocol": "rdma",
    "local_hostname": "localhost",
    "metadata_server": "P2PHANDSHAKE",
    "device_name": "",
    "decoder_engine_rank": 2,
    "decoder_bootstrap_room": 2
  }
}
```
Lines changed: 41 additions & 0 deletions

```json
{
  "task": "t2i",
  "model_cls": "qwen_image",
  "disagg_mode": "transformer",
  "text_encoder_dim": 3584,
  "text_len": 4096,
  "infer_steps": 50,
  "aspect_ratio": "16:9",
  "max_custom_size": 3072,
  "prompt_template_encode": "<|im_start|>system\nDescribe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:\n<|im_start|>user\n{}\n<|im_start|>assistant\n",
  "prompt_template_encode_start_idx": 34,
  "attn_type": "flash_attn2",
  "enable_cfg": true,
  "sample_guide_scale": 4.0,
  "vae_z_dim": 16,
  "vae_stride": [1, 8, 8],
  "target_video_length": 1,
  "target_height": 1664,
  "target_width": 1664,
  "disagg_config": {
    "bootstrap_addr": "127.0.0.1",
    "bootstrap_room": 0,
    "sender_engine_rank": 0,
    "receiver_engine_rank": 1,
    "decoder_engine_rank": 4,
    "decoder_bootstrap_room": 0,
    "protocol": "rdma",
    "local_hostname": "localhost",
    "metadata_server": "P2PHANDSHAKE",
    "device_name": "",
    "decentralized_queue": true,
    "encoder_engine_rank": 0,
    "transformer_engine_rank": 1,
    "rdma_phase1_host": "127.0.0.1",
    "rdma_phase1_handshake_port": 5567,
    "rdma_phase2_host": "127.0.0.1",
    "rdma_phase2_handshake_port": 5568,
    "rdma_buffer_slots": 128,
    "rdma_buffer_slot_size": 4096
  }
}
```
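With configs like these deployed (`decentralized_queue: true`), the client side reduces to one POST plus a status poll. A stdlib-only sketch of that workflow: the endpoint paths and the three request fields come from the PR description, but the `task_id` response field and any other payload details are assumptions about the server API, and the poll loop takes an injected `fetch_status` callable so it can run without a live server:

```python
import itertools
import json
import time
import urllib.request

def submit_task(encoder_url: str, prompt: str, room: int, transformer_rank: int) -> str:
    """Single POST to the Encoder -- the only request a client sends in
    decentralized mode. The `task_id` response field is an assumption."""
    body = json.dumps({
        "prompt": prompt,
        "data_bootstrap_room": room,                       # unique room ID
        "disagg_phase1_receiver_engine_rank": transformer_rank,
    }).encode()
    req = urllib.request.Request(
        f"{encoder_url}/v1/tasks/image/",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["task_id"]

def wait_for_completion(fetch_status, max_polls: int = 600, interval: float = 1.0) -> bool:
    """Poll until the task reports `completed` (e.g. a GET on
    /v1/tasks/{task_id}/status). `fetch_status` is injected for testability."""
    for _ in range(max_polls):
        if fetch_status() == "completed":
            return True
        time.sleep(interval)
    return False

# Round-robin over the Transformer worker ranks shown in the
# architecture diagram (GPU 1/2/3).
rank_cycle = itertools.cycle([1, 2, 3])
```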
