Add LoRA LR multiplier and detach base logits in EAGLE loss

Detach base_outputs.logits when they are used as soft labels in the EAGLE
loss, so that gradients do not flow back into LoRA through the label path.
Letting them through lets the loss be reduced by moving the targets toward
the draft's predictions, which causes circular collapse. LoRA still receives
EAGLE gradients through the hidden-state path (out_hiddens ->
eagle_input_hiddens).
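A minimal sketch of the detach, assuming a soft-label cross-entropy loss and toy nn.Linear stand-ins for the LoRA-adapted base layers, the frozen LM head and the EAGLE draft model. The names eagle_soft_label_loss, lora, lm_head and draft are hypothetical, and the real EAGLE loss may use a different divergence or extra terms. It checks both claims: the label path carries no gradient, while LoRA is still reached through the hidden states.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def eagle_soft_label_loss(draft_logits: torch.Tensor, base_logits: torch.Tensor) -> torch.Tensor:
    """Soft-label cross-entropy against the base model's distribution (hypothetical helper)."""
    # detach() turns the base logits into a constant target: no gradient flows
    # back through the label path into the LoRA-adapted base model.
    target_probs = F.softmax(base_logits.detach(), dim=-1)
    log_probs = F.log_softmax(draft_logits, dim=-1)
    # Mean over positions of -sum_v p_base(v) * log p_draft(v).
    return -(target_probs * log_probs).sum(dim=-1).mean()


torch.manual_seed(0)
hidden, vocab = 8, 16
lora = nn.Linear(hidden, hidden, bias=False)    # stand-in for the LoRA-adapted base layers
lm_head = nn.Linear(hidden, vocab, bias=False)  # stand-in for the frozen base LM head
lm_head.weight.requires_grad_(False)
draft = nn.Linear(hidden, vocab, bias=False)    # stand-in for the EAGLE draft model

x = torch.randn(4, hidden)
out_hiddens = lora(x)                  # base hidden states (depend on LoRA)
base_logits = lm_head(out_hiddens)     # soft labels (also depend on LoRA)
eagle_input_hiddens = out_hiddens      # hidden-state path into the draft
draft_logits = draft(eagle_input_hiddens)

loss = eagle_soft_label_loss(draft_logits, base_logits)
# The label path is cut, so the loss does not depend on base_logits at all.
(grad_via_labels,) = torch.autograd.grad(
    loss, base_logits, retain_graph=True, allow_unused=True
)
loss.backward()
print(grad_via_labels is None)       # True: no gradient through the labels
print(lora.weight.grad is not None)  # True: LoRA still gets gradient via out_hiddens
```

Without the detach(), grad_via_labels would be a real tensor and LoRA would receive a second gradient that pulls the targets toward the draft.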

Add eagle_base_lora_lr_multiplier (default: 10) to compensate for the
weaker gradient signal LoRA receives through the hidden-state path alone:
LoRA parameters are moved into a separate optimizer param group with
lr = base_lr * multiplier.
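A sketch of the param-group split, assuming LoRA parameters can be recognized by "lora_" in their names (as with the lora_A / lora_B naming used by Hugging Face PEFT). build_param_groups and ToyModel are hypothetical; only the multiplier name and its default of 10 come from this commit.

```python
import torch
import torch.nn as nn


def build_param_groups(model, base_lr, eagle_base_lora_lr_multiplier=10.0):
    """Split trainable params into a base group and a LoRA group with a scaled LR (hypothetical)."""
    base_params, lora_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue  # frozen weights are not handed to the optimizer
        # Assumption: LoRA weights are recognizable by "lora_" in the parameter name.
        if "lora_" in name:
            lora_params.append(param)
        else:
            base_params.append(param)
    groups = []
    if base_params:
        groups.append({"params": base_params, "lr": base_lr})
    if lora_params:
        groups.append({"params": lora_params, "lr": base_lr * eagle_base_lora_lr_multiplier})
    return groups


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.draft = nn.Linear(8, 8)               # draft-model params: base LR
        self.lora_A = nn.Linear(8, 2, bias=False)  # LoRA down-projection: scaled LR
        self.lora_B = nn.Linear(2, 8, bias=False)  # LoRA up-projection: scaled LR


model = ToyModel()
base_lr = 1e-4
optimizer = torch.optim.AdamW(build_param_groups(model, base_lr), lr=base_lr)
for group in optimizer.param_groups:
    print(len(group["params"]), group["lr"])
```

Putting the scale in the group's lr, rather than scaling gradients, means per-group schedulers such as torch.optim.lr_scheduler.LambdaLR multiply each group's initial LR by the same factor, so the 10x ratio should be preserved throughout training.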

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>