fix: use alpha/rank scaling in LoRaLayer (standard LoRA convention)#846
Conversation
LoRaLayer used raw `alpha` as the scaling factor instead of `alpha / rank`. With the default alpha=16, rank=8, this made the LoRA contribution 8x larger than PEFT, the original LoRA paper, and mlx-lm. Before: scale = alpha = 16.0 After: scale = alpha / rank = 2.0 Also fixes replace_lora_with_linear to use the same corrected scale. Added tests verifying: - scale = alpha / rank - Forward pass produces (alpha/rank) * (x @ A @ B) - Default settings give 2x scaling, not 16x Fixes Blaizzy#845
There was a problem hiding this comment.
Pull request overview
Fixes LoRA scaling in LoRaLayer to follow the standard LoRA convention (alpha / rank) so LoRA updates match PEFT / LoRA paper expectations and reduce unintended amplification.
Changes:
- Compute and store
self.scale = alpha / rankinLoRaLayerand use it in the forward pass. - Update LoRA weight-merge logic to use
scalewhen producing the merged delta. - Add unit tests validating scaling math and forward-pass contribution.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
mlx_vlm/trainer/lora.py |
Switches from raw alpha to alpha/rank scaling in forward and merge code. |
mlx_vlm/tests/test_trainer_utils.py |
Adds tests to validate LoRA scaling behavior and expected contribution size. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@kikoncuo LGTM, did you run the test? might have to change due to the missing B=0 test |
Verifies that when B is zeros (default init), the LoRA layer output equals the base linear layer output exactly (no LoRA contribution). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
I ran the existing tests, all 8 passed. |
|
@Goekdeniz-Guelmez need your sign off here before merging |
|
I think it would be better to move the new tests inside test_trainer.py file and not create a new test file. Other then that LGTM |
Move TestLoRaScaling class from test_trainer_utils.py into test_trainer.py as suggested in review, and revert test_trainer_utils.py to its original state. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Test moved! |
|
@Blaizzy I think this can be merged now! |
|
@Goekdeniz-Guelmez @Blaizzy any progress here? do you need anything from me? |
Summary
LoRaLayeruses rawalphaas the scaling factor instead ofalpha / rank. This makes the LoRA contributionranktimes larger than the standard convention used by PEFT, the original LoRA paper, and mlx-lm.The bug
Proof
With
alpha=16, rank=8and deterministic weights:Fix
self.scale = alpha / rankin__init__(instead ofself.alpha = alpha)self.scalein__call__andreplace_lora_with_linearTests added
4 new tests in
test_trainer_utils.py:test_scale_is_alpha_over_rank— verifies scale = alpha/ranktest_scale_with_rank_equals_alpha— verifies alpha=rank gives scale=1test_forward_scaling_matches_peft— verifies forward pass matches (alpha/rank) * deltatest_default_alpha_rank_gives_2x— verifies default settings give 2x, not 16xAll 8 tests pass.
References
self.scaling = lora_alpha / ralpha / rankFixes #845