Add scheduling mechanism and new workload#1025

Merged
helloyongyang merged 11 commits into ModelTC:main from zhtshr:main on Apr 24, 2026

Conversation

@zhtshr (Contributor) commented Apr 20, 2026

This pull request introduces several new configuration files and significant updates to the disaggregated (disagg) connection logic and workload orchestration for the LightX2V project. The main focus is on supporting distributed inference with improved network handling, chunked data transfer, and dynamic workload simulation. The changes enhance reliability, configurability, and usability for running and testing disaggregated video inference pipelines.

Key changes:

New configuration and workload simulation

  • Added four new configuration files for disaggregated controller, encoder, transformer, and decoder modes, each specifying model parameters, quantization settings, RDMA protocol details, and distributed ranks.
  • Introduced a workload staging configuration (wan22_i2v_workload_stages.json) to define warmup and change phases for dynamic load testing.
  • Added run_user.py example script to simulate dynamic user workloads, sending requests to the controller based on stage specifications and supporting configurable request rates.
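As a rough illustration of how such a workload script could turn stage specifications into a request schedule, here is a minimal sketch. The field names (`duration_s`, `requests_per_s`) are assumptions for illustration; the actual schema of `wan22_i2v_workload_stages.json` may differ.

```python
import json

# Hypothetical stage schema; the real file's fields may differ.
STAGES_JSON = """
{
  "stages": [
    {"name": "warmup", "duration_s": 10, "requests_per_s": 1},
    {"name": "change", "duration_s": 20, "requests_per_s": 4}
  ]
}
"""

def build_schedule(stages):
    """Expand stage specs into (send_time_s, stage_name) tuples."""
    schedule, t = [], 0.0
    for stage in stages:
        interval = 1.0 / stage["requests_per_s"]
        n = int(stage["duration_s"] * stage["requests_per_s"])
        for i in range(n):
            schedule.append((t + i * interval, stage["name"]))
        t += stage["duration_s"]
    return schedule

schedule = build_schedule(json.loads(STAGES_JSON)["stages"])
print(len(schedule))  # 10 warmup + 80 change = 90 requests
```

A driver like run_user.py would then sleep until each timestamp and send the request to the controller, giving a configurable, stage-driven request rate.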

Disagg connection reliability and protocol improvements

  • Implemented _normalize_loopback_host to ensure consistent use of IPv4 loopback addresses, controlled by the DISAGG_FORCE_IPV4_LOOPBACK environment variable, improving local and mixed-protocol deployments.
  • Enhanced error handling in the transfer loop and status synchronization, logging exceptions and preventing crashes during data transfer and status updates.
  • Added support for chunked data transfer in send_data, controlled by the MOONCAKE_TRANSFER_CHUNK_BYTES environment variable, to handle large tensors more efficiently and robustly.
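The chunking idea above can be sketched as follows. This is a simplified stand-in for the PR's `send_data` logic: the helper name, the default of 1 MiB, and the treatment of a non-positive value as "no chunking" are all assumptions, not the actual implementation.

```python
import os

def iter_chunks(payload: bytes, chunk_size=None):
    """Yield (offset, chunk) pairs no larger than the configured chunk size.

    When chunk_size is not given, it is read from the
    MOONCAKE_TRANSFER_CHUNK_BYTES environment variable (1 MiB assumed default).
    """
    if chunk_size is None:
        chunk_size = int(os.environ.get("MOONCAKE_TRANSFER_CHUNK_BYTES", 1 << 20))
    if chunk_size <= 0:
        # Assumed convention: non-positive size disables chunking.
        yield 0, payload
        return
    for offset in range(0, len(payload), chunk_size):
        yield offset, payload[offset:offset + chunk_size]

# Example: a 5 MiB tensor blob sent as 1 MiB chunks.
blob = bytes(5 * (1 << 20))
chunks = list(iter_chunks(blob, chunk_size=1 << 20))
```

Each chunk can then be written at its offset on the receiver side, so a failure mid-transfer only requires retrying the affected chunk rather than the whole tensor.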

Protocol and metadata updates

  • Updated ZMQ communication to include receiver_engine_rank in multipart messages for both encoder and transformer threads, ensuring correct routing and status updates in distributed settings.
  • Improved local IP detection logic in mooncake.py to better select a non-loopback IPv4 address for outbound connections, enhancing compatibility in multi-host environments.
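A minimal sketch of the multipart framing convention described above, written as pure helpers so the framing logic is visible without a live socket. The exact frame order and serialization used by the PR are assumptions here.

```python
import pickle

def build_status_frames(receiver_engine_rank: int, status: dict) -> list:
    """Frame a status update as ZMQ-style multipart: [rank frame, payload frame].

    Carrying the rank as its own frame lets the receiver route the update
    without deserializing the payload first.
    """
    return [str(receiver_engine_rank).encode(), pickle.dumps(status)]

def parse_status_frames(frames) -> tuple:
    """Inverse of build_status_frames: recover (rank, status)."""
    rank = int(frames[0].decode())
    status = pickle.loads(frames[1])
    return rank, status

frames = build_status_frames(3, {"state": "done", "room": "t0"})
rank, status = parse_status_frames(frames)
```

With real sockets, the sender would call `socket.send_multipart(frames)` and the receiver `socket.recv_multipart()`, then dispatch on the rank frame.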

New libs

  • locust

These changes collectively improve the flexibility, reliability, and scalability of the disaggregated inference pipeline, making it easier to configure, test, and deploy in distributed environments.

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a comprehensive disaggregated inference framework for Wan2.2 models, featuring a centralized controller for dynamic instance management and a sidecar service for cross-process communication. Key improvements include autoscaling capabilities, enhanced RDMA and Mooncake transfer logic with better error handling, and the integration of a Locust-based workload generator. The review feedback suggests several performance and architectural refinements, including refactoring hardcoded instance initialization in the controller to be configuration-driven, reusing ZMQ contexts to reduce resource overhead, eliminating redundant memory zeroing in RDMA buffers to improve throughput, and moving heavy library imports to the module level for better visibility.

Comment thread: lightx2v/disagg/services/controller.py (Outdated), lines +1081 to +1084
```python
for instance_type in ("encoder", "transformer", "decoder"):
    address = self.create_instance(instance_type)
for _ in range(5):
    self.create_instance("transformer")
```
Severity: high

The number of service instances is currently hardcoded in the run method (1 encoder, 6 transformers, 1 decoder). This significantly limits the flexibility of the disaggregated system and ignores the ranks and distribution settings provided in the configuration file. This logic should be refactored to dynamically spawn instances based on the configuration to support different hardware setups. Additionally, the address variable on line 1082 is assigned but never used.
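The reviewer's suggestion could look roughly like the sketch below: read per-type instance counts from the configuration instead of hardcoding them, and actually use the returned addresses. The `instance_counts` key, the defaults, and the stub controller are illustrative assumptions, not the project's actual API.

```python
# Assumed defaults matching the current hardcoded behavior (1/6/1).
DEFAULT_COUNTS = {"encoder": 1, "transformer": 6, "decoder": 1}

def spawn_instances(controller, config):
    """Spawn instances per configured counts and keep their addresses."""
    counts = config.get("instance_counts", DEFAULT_COUNTS)
    return {
        instance_type: [controller.create_instance(instance_type) for _ in range(count)]
        for instance_type, count in counts.items()
    }

class _StubController:
    """Stand-in for the real controller, used only to demo the helper."""
    def __init__(self):
        self.calls = []
    def create_instance(self, instance_type):
        self.calls.append(instance_type)
        return f"tcp://127.0.0.1:{5000 + len(self.calls)}"

addrs = spawn_instances(_StubController(), {"instance_counts": {"encoder": 1, "transformer": 2}})
```

This keeps the hardware topology in the config file where the ranks and distribution settings already live.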

Comment on lines +85 to +100
```python
context = zmq.Context()
req = context.socket(zmq.REQ)
req.setsockopt(zmq.RCVTIMEO, 1000)
req.setsockopt(zmq.SNDTIMEO, 1000)
req.connect(req_addr)
try:
    req.send_pyobj({"cmd": str(cmd)})
    reply = req.recv_pyobj()
    if isinstance(reply, dict):
        return reply
    return None
except Exception:
    return None
finally:
    req.close(0)
    context.term()
```
Severity: medium

Creating a new zmq.Context for every call to _query_sidecar is inefficient as contexts are heavy resources intended to be long-lived. While sockets should remain per-thread or be protected by locks, the context itself should be initialized once (e.g., in __init__) and reused across the class instance.
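The fix the reviewer describes is the classic "create the heavy resource once, keep the sockets per call" pattern. The dependency-free sketch below illustrates it with a counting stand-in for `zmq.Context`, so the structure is visible without requiring pyzmq; the class and method names are assumptions.

```python
class FakeContext:
    """Stand-in for zmq.Context that counts how many times it is constructed."""
    created = 0
    def __init__(self):
        FakeContext.created += 1

class SidecarClient:
    def __init__(self):
        # One long-lived context per client (zmq.Context() in the real code,
        # created in __init__ rather than per query).
        self._context = FakeContext()

    def query(self, cmd):
        # A real implementation would open a short-lived REQ socket from
        # self._context here; only the context is shared across calls.
        _ = self._context
        return {"cmd": cmd}

client = SidecarClient()
for cmd in ("status", "status", "stop"):
    client.query(cmd)
```

Three queries, one context: the per-call cost drops to socket setup only, and `context.term()` moves to a single shutdown path instead of every request.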


```python
# Write payload to the selected slot (works for both server-local and client-remote paths).
slot_addr = self.descriptor.slot_addr + offset
self._rdma_write_bytes(slot_addr, b"\x00" * self.slot_size)
```
Severity: medium

Zeroing the entire RDMA slot before writing the payload is redundant and impacts performance, especially for large slot sizes. The implementation already ensures atomicity by writing the length header last (line 291), and the consumer only reads the number of bytes specified by that header. Removing this zeroing step will improve production throughput.
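The "length header last" property the reviewer relies on can be demonstrated with a plain bytearray standing in for the RDMA slot. The 4-byte little-endian header layout is an assumption for illustration; the point is that a reader honoring the header never sees stale bytes, so pre-zeroing the slot buys nothing.

```python
import struct

HEADER = struct.Struct("<I")  # assumed 4-byte little-endian length header

def write_slot(slot: bytearray, payload: bytes) -> None:
    """Write the payload first, then the length header last, so a reader
    never observes a length covering bytes that were not yet written."""
    slot[HEADER.size:HEADER.size + len(payload)] = payload
    slot[0:HEADER.size] = HEADER.pack(len(payload))

def read_slot(slot: bytearray) -> bytes:
    """Read only the number of bytes the header declares."""
    (length,) = HEADER.unpack_from(slot, 0)
    return bytes(slot[HEADER.size:HEADER.size + length])

slot = bytearray(64)        # never zeroed between writes
write_slot(slot, b"hello")
write_slot(slot, b"hi")     # stale "llo" remains past the payload, but is ignored
```

Since `read_slot` stops at the declared length, the stale tail left by the longer previous payload is unreachable, which is exactly why the extra zeroing write can be dropped.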

Comment on lines +192 to +193
```python
import numpy as np
import torch
```
Severity: medium

Heavy libraries like numpy and torch are imported inside the _init_transformer_output_room method. It is better practice to place these at the top of the file to improve readability and make dependencies explicit, especially since this file is the entry point for the sidecar process and these libraries will be needed for most operations.

@helloyongyang (Contributor) commented:

Need to fix the conflicts @zhtshr

@helloyongyang helloyongyang merged commit 3566cd5 into ModelTC:main Apr 24, 2026
1 check passed