
feat(lora): add LoRA+ support via lora_plus_lr_ratio#2247

Open
ramkrishs wants to merge 2 commits into Lightning-AI:main from ramkrishs:feat/lora-plus

@ramkrishs

Summary

Implements LoRA+ (Hayou et al., ICML 2024) by adding a --lora_plus_lr_ratio parameter to the LoRA finetune script.

Closes #963


What is LoRA+?

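LoRA+ applies a higher learning rate to the lora_B parameters than to lora_A. The theoretical motivation comes from a large-width scaling analysis: with a single shared learning rate, LoRA learns features inefficiently, and setting the lora_B learning rate to a fixed multiple (greater than 1) of the lora_A learning rate corrects this. Intuitively, lora_B (which projects back to the output dimension) benefits from faster adaptation, while lora_A (the input projection) should update more conservatively. The paper reports that this asymmetry yields: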

  • 1–2% accuracy gains on downstream tasks
  • Up to 2× faster convergence
  • No additional compute cost

The recommended ratio is 16.0 for standard fine-tuning.
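For reference, the rule behind lora_plus_lr_ratio can be written out as below. The notation (W_0, alpha, r, d, k, eta, lambda) is ours rather than the PR's, plain SGD stands in for the real optimizer only to keep the update readable, and the last line is simple arithmetic for an illustrative base rate, not a LitGPT default.

```latex
% LoRA-adapted weight: frozen W_0 \in \mathbb{R}^{d \times k} plus a rank-r update
W = W_0 + \frac{\alpha}{r} B A, \qquad A \in \mathbb{R}^{r \times k}, \; B \in \mathbb{R}^{d \times r}

% LoRA+ (SGD form): lora_B's step size is lambda times lora_A's
A \leftarrow A - \eta \, \nabla_A \mathcal{L}, \qquad
B \leftarrow B - \lambda \eta \, \nabla_B \mathcal{L}, \qquad \lambda = \texttt{lora\_plus\_lr\_ratio}

% lambda = 1 recovers standard LoRA; e.g. eta = 2 \times 10^{-4}, lambda = 16:
\lambda \eta = 16 \times 2 \times 10^{-4} = 3.2 \times 10^{-3}
```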


Usage

litgpt finetune lora \
    --checkpoint_dir checkpoints/meta-llama/Llama-3.1-8B \
    --lora_plus_lr_ratio 16.0

When lora_plus_lr_ratio is not set (default None), behaviour is identical to standard LoRA.


Implementation

litgpt/utils.py — new create_lora_plus_optimizer():

  • Splits trainable parameters into two groups: lora_B at lr × ratio, everything else at base lr
  • Works with any optimizer string or config dict (same interface as instantiate_torch_optimizer)
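The two-group split described above can be sketched in plain Python. This is a hypothetical helper (lora_plus_param_groups), not the PR's create_lora_plus_optimizer: it assumes LoRA+ targets are found by the substring "lora_B" in parameter names, uses made-up parameter names, and swaps in a tiny FakeParam stand-in so it runs without PyTorch. The returned list of dicts follows the per-parameter-group format that torch.optim optimizers accept.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(eq=False)
class FakeParam:
    """Hypothetical stand-in for torch.nn.Parameter; eq=False keeps identity semantics."""
    requires_grad: bool = True


def lora_plus_param_groups(
    named_parameters: Iterable[tuple[str, Any]],
    lr: float,
    lr_ratio: float,
) -> list[dict]:
    """Hypothetical LoRA+ grouping: lora_B at lr * lr_ratio, other trainable params at lr."""
    lora_b, other = [], []
    for name, param in named_parameters:
        if not param.requires_grad:
            continue  # frozen base weights are never optimized
        (lora_b if "lora_B" in name else other).append(param)

    groups = []
    if other:  # lora_A plus any other trainable params
        groups.append({"params": other, "lr": lr})
    if lora_b:  # skip empty groups, e.g. when no lora_B params exist
        groups.append({"params": lora_b, "lr": lr * lr_ratio})
    return groups


named = [
    ("layer.weight", FakeParam(requires_grad=False)),  # frozen W_0
    ("layer.lora_A", FakeParam()),
    ("layer.lora_B", FakeParam()),
]
groups = lora_plus_param_groups(named, lr=2e-4, lr_ratio=16.0)
for group in groups:
    print(len(group["params"]), group["lr"])
```

With PyTorch, the same helper could be fed model.named_parameters() and its result passed to an optimizer such as torch.optim.AdamW(groups). Dropping empty groups is one way to handle the "no lora_B params" edge case.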

litgpt/finetune/lora.py:

  • setup() and main() accept lora_plus_lr_ratio: float | None = None
  • When set, calls create_lora_plus_optimizer() instead of instantiate_torch_optimizer()
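The branch in main() could look roughly like the sketch below. build_param_groups and resolve_base_lr are hypothetical names, not the PR's code. Two assumptions are baked in: a dict optimizer spec stores its learning rate under init_args, following LitGPT's class_path/init_args config style, and a bare string such as "AdamW" falls back to a default_lr parameter (1e-3 here, matching torch.optim.AdamW's default). A small Param class stands in for torch.nn.Parameter.

```python
from typing import Any, Optional, Union


def resolve_base_lr(optimizer: Union[str, dict], default_lr: float) -> float:
    # Assumed spec layouts: "AdamW" (no lr inside, so default_lr is used) or
    # {"class_path": ..., "init_args": {"lr": ...}}.
    if isinstance(optimizer, dict):
        return float(optimizer.get("init_args", {}).get("lr", default_lr))
    return default_lr


def build_param_groups(
    named_parameters: list[tuple[str, Any]],
    optimizer: Union[str, dict],
    lora_plus_lr_ratio: Optional[float] = None,
    default_lr: float = 1e-3,
) -> list[dict]:
    lr = resolve_base_lr(optimizer, default_lr)
    trainable = [(name, p) for name, p in named_parameters if p.requires_grad]
    if lora_plus_lr_ratio is None:
        # Standard LoRA path: one group, every trainable param at the base lr.
        return [{"params": [p for _, p in trainable], "lr": lr}]
    # LoRA+ path: lora_B params get lr * ratio, everything else keeps lr.
    lora_b = [p for name, p in trainable if "lora_B" in name]
    other = [p for name, p in trainable if "lora_B" not in name]
    groups = [{"params": other, "lr": lr}] if other else []
    if lora_b:
        groups.append({"params": lora_b, "lr": lr * lora_plus_lr_ratio})
    return groups


class Param:  # hypothetical stand-in for torch.nn.Parameter
    def __init__(self, requires_grad: bool = True) -> None:
        self.requires_grad = requires_grad


named = [("qkv.weight", Param(False)), ("qkv.lora_A", Param()), ("qkv.lora_B", Param())]
spec = {"class_path": "torch.optim.AdamW", "init_args": {"lr": 2e-4}}
standard = build_param_groups(named, spec)                        # ratio unset
lora_plus = build_param_groups(named, spec, lora_plus_lr_ratio=16.0)
```

Keeping None as the "off" value means the default code path produces exactly one group at the configured learning rate, which is what makes the unset case behave like standard LoRA.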

tests/test_lora_plus.py — 11 unit tests:

  • Two param groups created with correct LR ratio
  • lora_B in high-LR group, lora_A in base-LR group
  • All trainable params covered, no duplicates
  • lora_B updates faster than lora_A per step (verifies the asymmetry works)
  • Edge cases: no lora_B params, no trainable params, ratio=1.0
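One way to check the "lora_B updates faster than lora_A per step" property without depending on gradient magnitudes is to look at Adam's first step: with bias correction, the update is close to lr times the sign of the gradient. The pure-Python scalar sketch below is illustrative only and is not the PR's test; adam_first_step is a hypothetical helper implementing the standard Adam update at step 1.

```python
import math


def adam_first_step(param: float, grad: float, lr: float,
                    beta1: float = 0.9, beta2: float = 0.999,
                    eps: float = 1e-8) -> float:
    """One bias-corrected Adam step on a scalar, starting from zero moments."""
    m = (1 - beta1) * grad
    v = (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1)  # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


base_lr, ratio = 2e-4, 16.0
# Give lora_A the *larger* gradient on purpose: Adam's first update is about
# lr in size whatever the gradient magnitude, so the LR ratio decides the gap.
a0, b0 = 0.5, 0.0
a1 = adam_first_step(a0, grad=10.0, lr=base_lr)
b1 = adam_first_step(b0, grad=0.01, lr=base_lr * ratio)
delta_a, delta_b = abs(a1 - a0), abs(b1 - b0)
print(delta_b / delta_a)  # close to the ratio
```

In a real LoRA layer lora_B is initialised to zero, so lora_A receives a zero gradient on the very first step; a test of this property therefore has to control the gradients, as above, or compare updates over more than one step.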

Reference

Hayou et al., LoRA+: Efficient Low Rank Adaptation of Large Models, ICML 2024
https://arxiv.org/abs/2402.12354

This PR was developed with AI assistance. All code has been reviewed, tested, and verified to follow LitGPT conventions.

Implements LoRA+ (Hayou et al., 2024, arXiv:2402.12354) which applies a
higher learning rate to lora_B parameters relative to lora_A. The paper
shows this asymmetry improves feature learning efficiency, yielding 1-2%
accuracy gains and up to 2× faster convergence at no extra compute cost.

Closes Lightning-AI#963

Usage:
    litgpt finetune lora \
        --checkpoint_dir checkpoints/meta-llama/Llama-3.1-8B \
        --lora_plus_lr_ratio 16.0

New parameter:
  --lora_plus_lr_ratio FLOAT   Multiplier applied to lora_B learning rate
                                relative to lora_A and other params.
                                Paper recommends 16.0. Default: None (standard LoRA).

Implementation:
  litgpt/utils.py            — create_lora_plus_optimizer() splits trainable
                               params into two groups: lora_B at lr * ratio,
                               everything else at base lr
  litgpt/finetune/lora.py    — setup() and main() accept lora_plus_lr_ratio;
                               optimizer is created via create_lora_plus_optimizer
                               when ratio is set, instantiate_torch_optimizer otherwise
  tests/test_lora_plus.py    — 11 unit tests

Reference: Hayou et al., LoRA+: Efficient Low Rank Adaptation of Large Models,
           ICML 2024, https://arxiv.org/abs/2402.12354

Signed-off-by: Ramakrishnan Sathyavageeswaran <ramkrishs@outlook.com>