extend router replay #2703
Open
faresobeid wants to merge 1 commit into
Replayed experts are kept only if the trainer's score for them is above the score of the trainer's weakest top-k expert multiplied by the ratio.
Note
Medium Risk
Changes MoE expert selection on the training forward path when filtering is enabled, which can alter gradients and load balancing; default-off config limits blast radius.
Overview
Adds optional plausibility filtering for MoE router replay during RL training. With `trainer.router_replay_score_threshold_ratio` set above `0`, each inference-replayed expert is kept only if the trainer router's gate score for that expert is at least that fraction of the trainer's weakest top-k score for the token; rejected slots are backfilled from the trainer's own top-k picks. The default `0` leaves behavior unchanged (strict replay of inference routing).

Wiring: new trainer config field, `configure_router_replay_filter` applied at model init when router replay is on, and logic in `TokenChoiceTopKRouter` plus docs for the inference/trainer TOML knobs. `torch.histc` inputs are cast to float where needed.

Reviewed by Cursor Bugbot for commit 7e7f36f.
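
For readers who want the filtering rule from the overview in concrete form, here is a minimal per-token sketch in plain Python. It is not the code in this PR: the function name `filter_replayed_experts` and its arguments are hypothetical, it works on one token's scores as a list instead of batched tensors, and it assumes non-negative gate scores, distinct replayed experts, and a ratio in (0, 1]. It uses the "at least" comparison from the overview and fills rejected slots in place, in the trainer's ranking order; the actual backfill order in `TokenChoiceTopKRouter` may differ.

```python
def filter_replayed_experts(scores, replayed, ratio):
    """Hypothetical helper: filter one token's replayed experts.

    scores:   trainer router gate score per expert (index = expert id)
    replayed: expert ids chosen by the inference router (length = top_k)
    ratio:    threshold ratio; 0 (or less) means strict replay
    """
    if ratio <= 0:
        # Default behavior: keep the inference routing untouched.
        return list(replayed)

    top_k = len(replayed)
    # Trainer's own top-k picks, strongest first.
    trainer_topk = sorted(
        range(len(scores)), key=lambda e: scores[e], reverse=True
    )[:top_k]

    # Threshold is a fraction of the trainer's weakest top-k score.
    threshold = ratio * scores[trainer_topk[-1]]

    # Replayed experts the trainer still finds plausible.
    kept = {e for e in replayed if scores[e] >= threshold}

    # Backfill candidates: trainer picks not already kept, best first.
    candidates = iter(e for e in trainer_topk if e not in kept)

    # Keep plausible slots as-is; replace rejected slots with trainer picks.
    return [e if e in kept else next(candidates) for e in replayed]


scores = [0.05, 0.40, 0.30, 0.02, 0.20, 0.03]
strict = filter_replayed_experts(scores, [1, 3, 0], ratio=0.0)
filtered = filter_replayed_experts(scores, [1, 3, 0], ratio=0.5)
```

In this example the trainer's top-3 experts are 1, 2 and 4, so the weakest top-k score is 0.20 and a ratio of 0.5 gives a threshold of 0.10. Expert 1 passes, while experts 3 and 0 fall below the threshold and their slots are backfilled with experts 2 and 4.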