Commit fa84494

yeyu-nvidia and claude committed
Revert lora_B init to default zeros (B=0 identity start)
EAGLE gradient through aux_hiddens breaks the B=0 saddle point, so non-zero B init is no longer needed. B=0 means LoRA starts as identity, so EAGLE's initial accuracy is unperturbed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
1 parent b1cef66 commit fa84494

1 file changed: 3 additions & 6 deletions

modelopt/torch/speculative/plugins/transformers.py
@@ -565,15 +565,12 @@ def _inject_base_lora(self):
             bias="none",
         )
         inject_adapter_in_model(lora_config, self._base_model, adapter_name="default")
-        # Unfreeze only the LoRA parameters and initialize lora_B with small random values
-        # instead of the default zeros. B=0 creates a saddle point where the preservation
-        # gradient is zero at init, allowing the EAGLE gradient to dominate unopposed.
-        # A small non-zero B ensures preservation loss is active from step 0.
+        # Unfreeze only the LoRA parameters. B=0 default init means LoRA starts as identity,
+        # so EAGLE's initial accuracy is unperturbed. The EAGLE gradient through aux_hiddens
+        # (not through logits) breaks the B=0 saddle point that affected preservation loss.
         for name, param in self._base_model.named_parameters():
             if "lora_" in name:
                 param.requires_grad = True
-                if "lora_B" in name:
-                    torch.nn.init.normal_(param, std=0.01)
 
     def _set_base_lora_enabled(self, enabled: bool) -> None:
         """Enable or disable LoRA adapters in the base model."""
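
Note on the B=0 argument: the reasoning in the commit message can be checked on a toy layer. The sketch below is plain Python and is not modelopt or PEFT code. The matrices, the squared-error losses, and the helper names (`matvec`, `outer`, `lora_forward`, `grad_wrt_B`) are all assumptions made for illustration, and the "other objective" is only a stand-in for a second loss such as the EAGLE one. It shows that with lora_B = 0 the adapted layer reproduces the base output, that a loss comparing the adapted output to the base output therefore has zero gradient with respect to lora_B, and that a different objective still produces a non-zero gradient on lora_B, provided lora_A is non-zero.

```python
# Toy LoRA layer in plain Python: y = W x + scale * B (A x).
# All shapes and values are invented for illustration only.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def lora_forward(W, A, B, x, scale=1.0):
    """Base output plus the scaled low-rank update."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, low_rank)]

def grad_wrt_B(y, target, A, x, scale=1.0):
    """For L = 0.5 * ||y - target||^2, dL/dB = scale * (y - target) (A x)^T."""
    residual = [yi - ti for yi, ti in zip(y, target)]
    return [[scale * g for g in row] for row in outer(residual, matvec(A, x))]

W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight (2x2)
A = [[0.5, -0.5]]             # lora_A, rank 1 (1x2), assumed non-zero at init
B = [[0.0], [0.0]]            # lora_B, zero init (2x1)
x = [1.0, 3.0]

base_out = matvec(W, x)
lora_out = lora_forward(W, A, B, x)

# B = 0 makes the adapter a no-op: the adapted output equals the base output.
assert lora_out == base_out

# Preservation-style loss: the target is the base model's own output,
# so the residual is zero and the gradient on B vanishes.
preservation_grad = grad_wrt_B(lora_out, base_out, A, x)

# Stand-in for a second objective with a different target: the residual is
# non-zero, so B receives a gradient and moves away from zero.
other_grad = grad_wrt_B(lora_out, [0.0, 0.0], A, x)
print(other_grad)  # [[-7.0], [-15.0]]
```

Once any objective moves lora_B off zero, the adapted output no longer matches the base output, so a preservation-style term starts contributing a gradient as well.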
