Skip to content

fix: use alpha/rank scaling in LoRaLayer (standard LoRA convention)#846

Merged
Blaizzy merged 4 commits into
Blaizzy:mainfrom
kikoncuo:fix/lora-alpha-scaling
Apr 18, 2026
Merged

fix: use alpha/rank scaling in LoRaLayer (standard LoRA convention)#846
Blaizzy merged 4 commits into
Blaizzy:mainfrom
kikoncuo:fix/lora-alpha-scaling

Conversation

@kikoncuo
Copy link
Copy Markdown
Contributor

Summary

LoRaLayer uses raw alpha as the scaling factor instead of alpha / rank. This makes the LoRA contribution rank times larger than the standard convention used by PEFT, the original LoRA paper, and mlx-lm.

The bug

# Current (lora.py line 38):
return y + (self.alpha * lora_update)     # scale = 16

# Should be:
return y + (self.scale * lora_update)     # scale = alpha/rank = 2

Proof

With alpha=16, rank=8 and deterministic weights:

from mlx_vlm.trainer.lora import LoRaLayer

linear = nn.Linear(4, 4)
lora = LoRaLayer(linear, rank=8, alpha=16.0)
lora.A = mx.ones((4, 8))
lora.B = mx.ones((8, 4))
x = mx.ones((1, 4))

base = linear(x)
actual = lora(x)
contribution = (actual - base)[0, 0].item()
# Current:  512.0  (alpha * x @ A @ B = 16 * 32)
# Expected:  64.0  (alpha/rank * x @ A @ B = 2 * 32)
# Ratio: 8x too large

Fix

  • Store self.scale = alpha / rank in __init__ (instead of self.alpha = alpha)
  • Use self.scale in __call__ and replace_lora_with_linear

Tests added

4 new tests in test_trainer_utils.py:

  • test_scale_is_alpha_over_rank — verifies scale = alpha/rank
  • test_scale_with_rank_equals_alpha — verifies alpha=rank gives scale=1
  • test_forward_scaling_matches_peft — verifies forward pass matches (alpha/rank) * delta
  • test_default_alpha_rank_gives_2x — verifies default settings give 2x, not 16x

All 8 tests pass.

References

Fixes #845

LoRaLayer used raw `alpha` as the scaling factor instead of `alpha / rank`.
With the default alpha=16, rank=8, this made the LoRA contribution 8x
larger than PEFT, the original LoRA paper, and mlx-lm.

Before: scale = alpha = 16.0
After:  scale = alpha / rank = 2.0

Also fixes replace_lora_with_linear to use the same corrected scale.

Added tests verifying:
- scale = alpha / rank
- Forward pass produces (alpha/rank) * (x @ A @ B)
- Default settings give 2x scaling, not 16x

Fixes Blaizzy#845
Copilot AI review requested due to automatic review settings March 21, 2026 19:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes LoRA scaling in LoRaLayer to follow the standard LoRA convention (alpha / rank) so LoRA updates match PEFT / LoRA paper expectations and reduce unintended amplification.

Changes:

  • Compute and store self.scale = alpha / rank in LoRaLayer and use it in the forward pass.
  • Update LoRA weight-merge logic to use scale when producing the merged delta.
  • Add unit tests validating scaling math and forward-pass contribution.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
mlx_vlm/trainer/lora.py Switches from raw alpha to alpha/rank scaling in forward and merge code.
mlx_vlm/tests/test_trainer_utils.py Adds tests to validate LoRA scaling behavior and expected contribution size.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mlx_vlm/trainer/lora.py
Comment thread mlx_vlm/tests/test_trainer_utils.py Outdated
@Goekdeniz-Guelmez
Copy link
Copy Markdown
Contributor

@kikoncuo LGTM, did you run the test? might have to change due to the missing B=0 test

Verifies that when B is zeros (default init), the LoRA layer output
equals the base linear layer output exactly (no LoRA contribution).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kikoncuo
Copy link
Copy Markdown
Contributor Author

kikoncuo commented Mar 26, 2026

I ran the existing tests, all 8 passed.
I added a B=0 test as well, the LoRA layer just passes through the original linear output unchanged. No LoRA contribution at all cause B starts at zero, so x @ A @ B is always zero regardless of the scaling factor. Confirms the fix doesn't break it.

@Blaizzy
Copy link
Copy Markdown
Owner

Blaizzy commented Mar 27, 2026

@Goekdeniz-Guelmez need your sign off here before merging

@Goekdeniz-Guelmez
Copy link
Copy Markdown
Contributor

I think it would be better to move the new tests inside test_trainer.py file and not create a new test file. Other then that LGTM

Move TestLoRaScaling class from test_trainer_utils.py into
test_trainer.py as suggested in review, and revert
test_trainer_utils.py to its original state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kikoncuo
Copy link
Copy Markdown
Contributor Author

Test moved!

@Goekdeniz-Guelmez
Copy link
Copy Markdown
Contributor

@Blaizzy I think this can be merged now!

@kikoncuo
Copy link
Copy Markdown
Contributor Author

kikoncuo commented Apr 9, 2026

@Goekdeniz-Guelmez @Blaizzy any progress here? do you need anything from me?

Copy link
Copy Markdown
Owner

@Blaizzy Blaizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@Blaizzy Blaizzy merged commit 6b0ad8f into Blaizzy:main Apr 18, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LoRaLayer uses raw alpha instead of alpha/rank — 8x scaling error with default settings

4 participants