feat: Wire WeightSynchronizer into algorithm layer, replacing inline refit logic #2467
Open
saumishr wants to merge 1 commit into
Force-pushed from 8b5be01 to accfb63.
This was referenced May 12, 2026
4 tasks
Force-pushed from 0c6164e to 657abc9.
Contributor (Author): /ok to test 657abc9
Force-pushed from 657abc9 to 616f2b0.
Contributor (Author): /ok to test 616f2b0
Force-pushed from df6b623 to 40f6d46.
Contributor (Author): /ok to test 40f6d46
Collaborator: @kajalj22 the L1 tests are timing out at 6hr. Is something off? I did notice a lot of downloads where I expected the HF cache to be hit: https://github.com/NVIDIA-NeMo/RL/actions/runs/26606449648/job/78418561361?pr=2467
Collaborator: @saumishr given the nature of the change, could you run nightlies from these categories to make sure there's no convergence issue due to the refit change? #2467 (comment)
Force-pushed from 40f6d46 to f303df0.
Contributor (Author): /ok to test f303df0
Force-pushed from f303df0 to 3555948.
Contributor (Author): /ok to test 3555948
…refit logic
Replaces the inline refit/weight-sync logic in the GRPO algorithm layer with the WeightSynchronizer abstraction (IPC / HTTP / collective transports). Also restores policy.prepare_refit_info() for the megatron-framework generation path, and keeps refits gated on weight_sync.is_stale, which is set only after a training step (matching main's semantics).
Signed-off-by: Saurabh Mishra <sauramishra@nvidia.com>
Contributor (Author): /ok to test 196fd7c
29 tasks
What does this PR do?
Wire the WeightSynchronizer (from MR 1 / PR #2466) into the algorithm layer, replacing inline refit_policy_generation() calls and NEED_REFIT / POLICY_GENERATION_STALE flags with WeightSynchronizer method calls (sync_weights, mark_stale, is_stale).
Issues
Part of the modularity/interfaces initiative. Depends on PR #2466 (modularity/weight-sync-abc).
Usage
Before (old pattern):
After (new pattern):
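As a rough illustration of the two patterns, here is a minimal, self-contained Python sketch. It is not the code from this PR. Only the names sync_weights, mark_stale, is_stale, and refit_policy_generation come from the description; the toy class, the two loop functions, the choice to model is_stale as a property, and the assumption that generation weights start out stale are all hypothetical.

```python
from typing import Callable


class CountingSynchronizer:
    """Toy stand-in for a WeightSynchronizer (hypothetical, not the real class).

    It counts sync calls instead of moving real weights between workers.
    """

    def __init__(self) -> None:
        # Assumption: generation workers start without current weights.
        self._stale = True
        self.sync_count = 0

    @property
    def is_stale(self) -> bool:
        # Assumption: is_stale is modeled as a read-only property here.
        return self._stale

    def mark_stale(self) -> None:
        self._stale = True

    def sync_weights(self) -> None:
        self.sync_count += 1
        self._stale = False


def old_style_loop(num_steps: int, refit: Callable[[], None]) -> None:
    """Flag-based pattern: the training loop owns the staleness flag."""
    policy_generation_stale = True
    for _ in range(num_steps):
        if policy_generation_stale:
            refit()  # stands in for an inline refit_policy_generation(...) call
            policy_generation_stale = False
        # ... generate rollouts, then run a training step ...
        policy_generation_stale = True


def new_style_loop(num_steps: int, weight_sync: CountingSynchronizer) -> None:
    """Synchronizer pattern: staleness lives behind the weight_sync object."""
    for _ in range(num_steps):
        if weight_sync.is_stale:
            weight_sync.sync_weights()
        # ... generate rollouts, then run a training step ...
        weight_sync.mark_stale()  # weights changed, so generation is stale again
```

In this sketch both loops refit once per step; the difference is only where the staleness state is kept, which is what lets a transport (IPC, HTTP, or collective) be swapped without touching the loop.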
Tests
Updated test_grpo.py and test_distillation.py to use weight_sync mocks
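A test built around a weight_sync mock might look roughly like the sketch below. It uses only the standard unittest.mock module; the helper maybe_sync and both test functions are hypothetical and are not taken from test_grpo.py or test_distillation.py.

```python
from unittest.mock import MagicMock


def maybe_sync(weight_sync) -> bool:
    """Hypothetical helper: sync only when the synchronizer reports stale weights."""
    if weight_sync.is_stale:
        weight_sync.sync_weights()
        return True
    return False


def make_weight_sync_mock(stale: bool) -> MagicMock:
    """Build a mock exposing the is_stale / sync_weights / mark_stale names."""
    mock = MagicMock(name="weight_sync")
    mock.is_stale = stale  # plain attribute; assumes is_stale is read like a property
    return mock


def test_sync_when_stale() -> None:
    weight_sync = make_weight_sync_mock(stale=True)
    assert maybe_sync(weight_sync) is True
    weight_sync.sync_weights.assert_called_once_with()


def test_skip_when_fresh() -> None:
    weight_sync = make_weight_sync_mock(stale=False)
    assert maybe_sync(weight_sync) is False
    weight_sync.sync_weights.assert_not_called()
```

Mocking the synchronizer this way lets a test assert on when a sync is requested without starting any generation or training workers.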
Before your PR is "Ready for review"
Pre checks:
Additional Information
Stacked on #2466