[TRTLLM-13543][feat] WideEP FT: add EPLB mask-only reconfigure (1b.1)#15525
[TRTLLM-13543][feat] WideEP FT: add EPLB mask-only reconfigure (1b.1)#15525chienchunhung wants to merge 2 commits into
Conversation
ee6dddf to
7651902
Compare
|
/bot run |
|
PR_Github #55089 [ run ] triggered by Bot. Commit: |
|
PR_Github #55089 [ run ] completed with state
|
📝 WalkthroughWalkthroughAdds ChangesMoeLoadBalancer Mask-Only Reconfiguration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/tensorrt_llm/runtime/moeLoadBalancer/moeLoadBalancer.cpp (1)
153-175: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winMake the mask-dependent replica-count invariant a runtime check.
Line 159 makes
totalSlotCountdepend on the active mask, but Line 175 only checks the replica-count sum withassert. In release builds, a stale or mismatchedexpertReplicaCountcan drainpqand makepq.top()undefined behavior. UseTLLM_CHECK_WITH_INFOhere.Proposed fix
- assert(static_cast<int>(allReplicas.size()) == totalSlotCount); + TLLM_CHECK_WITH_INFO(static_cast<int>(allReplicas.size()) == totalSlotCount, + "Replica count sum (%ld) must match active slot count (%d)", static_cast<long>(allReplicas.size()), + totalSlotCount);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/tensorrt_llm/runtime/moeLoadBalancer/moeLoadBalancer.cpp` around lines 153 - 175, The assertion check at the end of the function that compares allReplicas.size() to totalSlotCount is only enforced in debug builds. Since totalSlotCount depends on the active mask and can mismatch with expertReplicaCount, this invariant violation could cause undefined behavior in release builds. Replace the assert statement with TLLM_CHECK_WITH_INFO to ensure this critical runtime check is performed in all build configurations, and provide a descriptive error message that explains the mismatch between the expected slot count and actual replica count.
🧹 Nitpick comments (2)
tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py (1)
992-992: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse 3.10-native typing and an explicit return type in the new API signature.
Align this new method with repo typing rules by switching to
list[int]and annotating-> None.♻️ Proposed fix
- def reconfigure_mask_only(self, dead_ranks: List[int]): + def reconfigure_mask_only(self, dead_ranks: list[int]) -> None:As per coding guidelines, "Always annotate functions with return types" and "Prefer using the built-in types
list,dict, andtupleto the legacytyping.List,typing.Dict, andtyping.Tuple."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py` at line 992, The reconfigure_mask_only method signature does not conform to the repository's typing standards. Update the method signature by replacing the legacy List[int] type hint with the Python 3.10-native list[int], and add an explicit return type annotation of -> None to the method definition to align with the coding guidelines that require all functions to have annotated return types and prefer built-in generic types over typing module imports.Source: Coding guidelines
tests/unittest/_torch/modules/test_moe_load_balancer.py (1)
272-304: 📐 Maintainability & Code Quality | 🔵 TrivialCoverage assessment: sufficient for this PR, with one optional follow-up outside it.
Coverage is sufficient here:
tests/unittest/_torch/modules/test_moe_load_balancer.pyvalidates Python forwarding and active-iteration rejection, andcpp/tests/unit_tests/runtime/moeLoadBalancerTest.cppvalidates dead-rank masking plus fail-closed behavior. Optional follow-up outside this PR: add one non-mocked Python integration case intests/unittest/_torch/modules/test_moe_load_balancer.pyto verify backend fail-closed exceptions surface end-to-end.As per path instructions, "
tests/**: Act as a QA engineer reviewing test changes and coverage for TensorRT-LLM. Keep feedback actionable: suggest concrete list file names and whether coverage is sufficient, insufficient, or needs follow-up outside the PR."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/_torch/modules/test_moe_load_balancer.py` around lines 272 - 304, The current test coverage in test_moe_load_balancer.py is sufficient for this PR, but there is an optional follow-up suggestion to add one non-mocked Python integration test case to the test_moe_load_balancer.py file that verifies backend fail-closed exceptions surface end-to-end. This test should exercise the MoeLoadBalancer with actual backend behavior (not mocked) to ensure that exceptions from fail-closed scenarios are properly propagated and not swallowed or incorrectly handled during the MoeLoadBalancer lifecycle operations.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/tensorrt_llm/runtime/moeLoadBalancer/moeLoadBalancer.cpp`:
- Around line 1127-1134: The mDeadRankMask is being updated before all layers
are reconfigured with the candidateDeadRankMask, which can expose an
inconsistent state if layer->reconfigureMaskOnly() throws an exception. Move the
entire lock_guard block that updates mDeadRankMask to after the for loop
iterating over mLayers completes successfully. This ensures the shared mask is
only published once all per-layer reconfiguration has succeeded, preventing
other threads from observing a partially-applied mask state.
- Around line 60-78: The `getActiveSlotCount` function does not validate that
the `deadRankMask` vector has the correct size before using it. Currently, if
the mask is shorter than `metaInfo.epSize`, ranks beyond the vector size are
treated as active in the `isRankMasked` check, which can leave dead-rank slots
reachable. Add validation at the beginning of `getActiveSlotCount` to ensure
that if `deadRankMask` is provided (not nullptr), its size must equal
`metaInfo.epSize`. If the size does not match, handle the error appropriately
(such as asserting or returning an error indicator).
---
Outside diff comments:
In `@cpp/tensorrt_llm/runtime/moeLoadBalancer/moeLoadBalancer.cpp`:
- Around line 153-175: The assertion check at the end of the function that
compares allReplicas.size() to totalSlotCount is only enforced in debug builds.
Since totalSlotCount depends on the active mask and can mismatch with
expertReplicaCount, this invariant violation could cause undefined behavior in
release builds. Replace the assert statement with TLLM_CHECK_WITH_INFO to ensure
this critical runtime check is performed in all build configurations, and
provide a descriptive error message that explains the mismatch between the
expected slot count and actual replica count.
---
Nitpick comments:
In `@tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py`:
- Line 992: The reconfigure_mask_only method signature does not conform to the
repository's typing standards. Update the method signature by replacing the
legacy List[int] type hint with the Python 3.10-native list[int], and add an
explicit return type annotation of -> None to the method definition to align
with the coding guidelines that require all functions to have annotated return
types and prefer built-in generic types over typing module imports.
In `@tests/unittest/_torch/modules/test_moe_load_balancer.py`:
- Around line 272-304: The current test coverage in test_moe_load_balancer.py is
sufficient for this PR, but there is an optional follow-up suggestion to add one
non-mocked Python integration test case to the test_moe_load_balancer.py file
that verifies backend fail-closed exceptions surface end-to-end. This test
should exercise the MoeLoadBalancer with actual backend behavior (not mocked) to
ensure that exceptions from fail-closed scenarios are properly propagated and
not swallowed or incorrectly handled during the MoeLoadBalancer lifecycle
operations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8ea43521-47c2-40dc-97de-03fb29f90a56
📒 Files selected for processing (6)
cpp/tensorrt_llm/nanobind/runtime/moeBindings.cppcpp/tensorrt_llm/runtime/moeLoadBalancer/moeLoadBalancer.cppcpp/tensorrt_llm/runtime/moeLoadBalancer/moeLoadBalancer.hcpp/tests/unit_tests/runtime/moeLoadBalancerTest.cpptensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.pytests/unittest/_torch/modules/test_moe_load_balancer.py
7651902 to
801713d
Compare
|
/bot run |
801713d to
088dd5c
Compare
|
/bot run |
|
PR_Github #55289 [ run ] triggered by Bot. Commit: |
|
PR_Github #55290 [ run ] triggered by Bot. Commit: |
|
PR_Github #55289 [ run ] completed with state |
|
PR_Github #55290 [ run ] completed with state
|
|
/bot run --disable-fail-fast --stage-list "L40S-PyTorch-1,H100_PCIe-CPP-1" |
|
PR_Github #55343 [ run ] triggered by Bot. Commit: |
|
PR_Github #55343 [ run ] completed with state |
|
/bot run --disable-fail-fast |
Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
088dd5c to
5f391a6
Compare
|
/bot run |
|
PR_Github #55581 [ run ] triggered by Bot. Commit: |
|
PR_Github #55581 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #55614 [ run ] triggered by Bot. Commit: |
|
PR_Github #55614 [ run ] completed with state
|
Summary
Scope
main; no in-flight PR dependency.Validation
git diff --check upstream/main..HEADpython -m py_compile tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py tests/unittest/_torch/modules/test_moe_load_balancer.pySummary by CodeRabbit
New Features
Tests