fix: handle empty colocated weight buckets#3
Conversation
📝 WalkthroughWalkthroughThe weight update path now skips empty multi-dtype payloads and pads missing gathered buckets with an empty flattened tensor before remote updates. New tests cover empty-bucket gather and padding behavior, and the CPU unittest workflow includes the new test file. ChangesEmpty colocated bucket handling
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
3309275 to
c30e877
Compare
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_empty_colocated_weight_bucket.py`:
- Around line 153-198: The new tests in test_empty_colocated_weight_bucket.py
are only being defined, so when cpu-unittest runs the file directly with python
they never execute. Add a module entrypoint that invokes pytest for this file
when run as a script, or otherwise make the test functions run under direct
execution, and keep the existing test names like
test_empty_colocated_bucket_still_participates_in_gather and
test_source_rank_pads_empty_colocated_bucket_entries unchanged so CI actually
exercises them.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: ef0a6e64-5602-4e68-b865-ccc31512edd8
📒 Files selected for processing (4)
.github/workflows/pr-test.yml.github/workflows/pr-test.yml.j2slime/backends/megatron_utils/update_weight/update_weight_from_tensor.pytests/test_empty_colocated_weight_bucket.py
c30e877 to
7117b7a
Compare
|
@coderabbitai full review |
1 similar comment
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_empty_colocated_weight_bucket.py (1)
15-18: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winMake the fake bucket mode overridable.
With
supports_multi_dtypespinned toTrue, this file never exercises the multi-bucketmax(len(tensors))padding path. A follow-up test that flips it toFalseand feeds mismatched dtype buckets would cover the later-index padding this PR now relies on.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_empty_colocated_weight_bucket.py` around lines 15 - 18, The fake bucket in _FakeFlattenedTensorBucket currently hardcodes supports_multi_dtypes to True, so the test never covers the single-bucket padding branch. Make this mode overridable on the fake class or via its constructor, then add a test path that sets it to False and uses mismatched dtype buckets so the max(len(tensors)) padding behavior in the colocated weight bucket logic is exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_empty_colocated_weight_bucket.py`:
- Around line 15-18: The fake bucket in _FakeFlattenedTensorBucket currently
hardcodes supports_multi_dtypes to True, so the test never covers the
single-bucket padding branch. Make this mode overridable on the fake class or
via its constructor, then add a test path that sets it to False and uses
mismatched dtype buckets so the max(len(tensors)) padding behavior in the
colocated weight bucket logic is exercised.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 2581ea15-d37e-42b7-b276-e59b74624d4e
📒 Files selected for processing (4)
.github/workflows/pr-test.yml.github/workflows/pr-test.yml.j2slime/backends/megatron_utils/update_weight/update_weight_from_tensor.pytests/test_empty_colocated_weight_bucket.py
Summary
Fixes colocated raw weight sync when a tensor-parallel rank has no HF tensors for a given bucket. The previous path still constructed
FlattenedTensorBucket(named_tensors=[]), which can fail before the rank participates in the gather.The fix keeps the gather collective intact, skips bucket construction for local empty inputs, and has the source rank pad missing per-rank bucket entries with a serialized empty flattened tensor before calling the rollout engine.
Why this matters
This is a small weight-sync correctness bug that becomes visible with PP/EP/MoE or model converters that produce uneven raw HF tensor buckets. It is independent of Gemma4, but the Gemma4 MoE proof exercises this path.
Validation
uv run --with pytest python -m pytest tests/test_empty_colocated_weight_bucket.py -quv run --with ruff ruff check slime/backends/megatron_utils/update_weight/update_weight_from_tensor.py tests/test_empty_colocated_weight_bucket.pyuv run --with black black --check slime/backends/megatron_utils/update_weight/update_weight_from_tensor.py tests/test_empty_colocated_weight_bucket.pypython3 -m py_compile slime/backends/megatron_utils/update_weight/update_weight_from_tensor.py tests/test_empty_colocated_weight_bucket.py && git diff --checkNotes
This PR is prepared in the EazyReal fork for review before any upstream THUDM/slime PR is opened.
Summary by CodeRabbit
Bug Fixes
Tests