[Bugfix][MoE] Fix hardcoded SharedExperts output buffer size for DBO ubatches#39033
[Bugfix][MoE] Fix hardcoded SharedExperts output buffer size for DBO ubatches#39033Gregory-Pereira wants to merge 3 commits into
Conversation
…ubatches Signed-off-by: greg pereira <grpereir@redhat.com>
…a boolean Signed-off-by: greg pereira <grpereir@redhat.com>
There was a problem hiding this comment.
Code Review
This pull request replaces the enable_dbo boolean flag with a num_ubatches integer across the Fused MoE layers and runners to support dynamic micro-batching and remove hardcoded buffer sizes. While SharedExperts was updated to use this new parameter for buffer allocation, the DefaultMoERunner currently receives the parameter without storing it or updating its internal indexing and buffer logic, which remains hardcoded or reliant on the old flag.
| num_ubatches: int = 1, | ||
| ): | ||
| super().__init__() | ||
| self.moe_config = moe_config |
There was a problem hiding this comment.
The num_ubatches parameter is introduced here but not stored in the instance. This makes the fix incomplete because DefaultMoERunner itself manages internal buffers in _maybe_init_dp_chunking (lines 270-271) and indexing logic in _slice_and_copy_input (lines 671-674) that are still hardcoded to size 2 or rely solely on enable_dbo.
To fully address the hardcoding issue (mirroring the fix in SharedExperts), please store num_ubatches and update the aforementioned methods to use it for buffer allocation and indexing when num_ubatches > 1.
| num_ubatches: int = 1, | |
| ): | |
| super().__init__() | |
| self.moe_config = moe_config | |
| num_ubatches: int = 1, | |
| ): | |
| super().__init__() | |
| self.moe_config = moe_config | |
| self.num_ubatches = num_ubatches |
There was a problem hiding this comment.
I had previously intended to scope this change just to shared experts / DBO but ill do it for defaultMoERunner too
Signed-off-by: greg pereira <grpereir@redhat.com>
|
logs: |
|
this feature will only be trigger with MoE models --- Qwen/Qwen2.5-VL-3B-Instruct and qwen3-0.6B wont trigger it Did you run into a concrete issue? I dont think we support anything besides 2 ubatches |
|
This pull request has merge conflicts that must be resolved before it can be |
|
This pull request has merge conflicts that must be resolved before it can be |
yewentao256
left a comment
There was a problem hiding this comment.
Is this issue still in main? Please solve the conflicts and take another look
Summary
The
SharedExpertsclass hardcodes its output buffer to[None, None]regardless of the actual number ofubatchesconfigured. This mirrors the same bug that was fixed inWorkspaceManagerin #38853 (cc @yewentao256 and @LucasWilkinson based on similar fix)Test plan
_current_workspacessize #38853)test_moe-smoke-505passed.log
test_moe-full-3537passed.log
These were tested against a tiny qwen3-0.6B no DBO. If we need I can test DBO path too