feat(lora): add LoRA+ support via lora_plus_lr_ratio by ramkrishs · Pull Request #2247 · Lightning-AI/litgpt

ramkrishs · 2026-05-02T02:31:39Z

Summary

Implements LoRA+ (Hayou et al., ICML 2024) by adding a --lora_plus_lr_ratio parameter to the LoRA finetune script.

Closes #963

What is LoRA+?

LoRA+ applies a higher learning rate to the lora_B parameters than to lora_A. The theoretical motivation: lora_B projects into the output space and benefits from faster adaptation, while lora_A (the input projection) should update more conservatively. The paper reports that this asymmetry yields:

1–2% accuracy gains on downstream tasks
Up to 2× faster convergence
No additional compute cost

The paper recommends a ratio of 16.0 for standard fine-tuning.
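
In symbols, assuming the standard LoRA parameterization (the notation below is illustrative and not taken from this PR), the adapted weight and the LoRA+ learning-rate rule can be written as:

```latex
W = W_0 + \frac{\alpha}{r} B A, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k}

\eta_B = \lambda \, \eta_A, \qquad \lambda > 1
```

Here \lambda plays the role of lora_plus_lr_ratio. With an illustrative base learning rate of 3e-4 and \lambda = 16, lora_B would be trained at 4.8e-3 while lora_A stays at 3e-4.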

Usage

litgpt finetune lora \
    --checkpoint_dir checkpoints/meta-llama/Llama-3.1-8B \
    --lora_plus_lr_ratio 16.0

When lora_plus_lr_ratio is not set (default None), behaviour is identical to standard LoRA.

Implementation

litgpt/utils.py — new create_lora_plus_optimizer():

Splits trainable parameters into two groups: lora_B at lr × ratio, everything else at base lr
Works with any optimizer string or config dict (same interface as instantiate_torch_optimizer)
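
The grouping step described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the helper name split_lora_plus_groups is hypothetical, matching on the substring "lora_B" is an assumption about parameter naming, and dropping empty groups is one possible way to handle the edge cases. Stand-in objects replace real tensors so the snippet runs without PyTorch.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def split_lora_plus_groups(
    named_params: Iterable[tuple[str, Any]],
    lr: float,
    ratio: float,
) -> list[dict[str, Any]]:
    """Build optimizer param groups: lora_B at lr * ratio, the rest at lr."""
    base, boosted = [], []
    for name, param in named_params:
        if not getattr(param, "requires_grad", False):
            continue  # frozen weights are not optimized
        # Assumption: LoRA B matrices are identifiable by "lora_B" in the name.
        (boosted if "lora_B" in name else base).append(param)

    groups = []
    if base:
        groups.append({"params": base, "lr": lr})
    if boosted:
        groups.append({"params": boosted, "lr": lr * ratio})
    return groups


# Stand-in parameters; with PyTorch, pass model.named_parameters() instead.
fake_params = [
    ("attn.lora_A", SimpleNamespace(requires_grad=True)),
    ("attn.lora_B", SimpleNamespace(requires_grad=True)),
    ("attn.weight", SimpleNamespace(requires_grad=False)),
]
groups = split_lora_plus_groups(fake_params, lr=3e-4, ratio=16.0)
```

A list of dicts in this shape is what PyTorch optimizers accept as per-group options, so with real parameters it could be passed to, for example, torch.optim.AdamW(groups, lr=3e-4).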

litgpt/finetune/lora.py:

setup() and main() accept lora_plus_lr_ratio: float | None = None
When set, calls create_lora_plus_optimizer() instead of instantiate_torch_optimizer()
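
The dispatch between the two optimizer paths might look like the sketch below. The function select_optimizer and the two factory callables are hypothetical stand-ins that only record which path was taken; the real signatures of create_lora_plus_optimizer() and instantiate_torch_optimizer() are not reproduced here.

```python
from typing import Any, Callable, Optional


def select_optimizer(
    lora_plus_lr_ratio: Optional[float],
    make_standard: Callable[[], Any],
    make_lora_plus: Callable[[float], Any],
) -> Any:
    """Use the LoRA+ factory only when a ratio was given."""
    # Compare against None so an explicit ratio of 1.0 still takes the LoRA+ path.
    if lora_plus_lr_ratio is None:
        return make_standard()
    return make_lora_plus(lora_plus_lr_ratio)


# Hypothetical stand-ins that return a label instead of a real optimizer.
standard = select_optimizer(None, lambda: "standard", lambda r: f"lora_plus(ratio={r})")
boosted = select_optimizer(16.0, lambda: "standard", lambda r: f"lora_plus(ratio={r})")
```

Checking for None rather than truthiness keeps the default behaviour unchanged while still letting a caller pass any explicit float.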

tests/test_lora_plus.py — 11 unit tests:

Two param groups created with correct LR ratio
lora_B in high-LR group, lora_A in base-LR group
All trainable params covered, no duplicates
lora_B updates faster than lora_A per step (verifies the asymmetry works)
Edge cases: no lora_B params, no trainable params, ratio=1.0
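
As a rough illustration of the "lora_B updates faster" check, the sketch below uses plain SGD on scalar stand-ins instead of real tensors. It is not one of the tests in this PR; the function names and the learning-rate and gradient values are made up for the example.

```python
import math


def sgd_step(value: float, grad: float, lr: float) -> float:
    """One plain SGD update: value <- value - lr * grad."""
    return value - lr * grad


def test_lora_b_moves_faster_than_lora_a() -> None:
    base_lr, ratio = 0.01, 16.0
    grad = 0.5  # same gradient for both, to isolate the learning-rate effect
    lora_a_before = lora_b_before = 0.0

    lora_a_after = sgd_step(lora_a_before, grad, lr=base_lr)
    lora_b_after = sgd_step(lora_b_before, grad, lr=base_lr * ratio)

    step_a = abs(lora_a_after - lora_a_before)
    step_b = abs(lora_b_after - lora_b_before)
    assert step_b > step_a
    # With identical gradients, the step sizes differ by exactly the ratio.
    assert math.isclose(step_b / step_a, ratio, rel_tol=1e-9)


test_lora_b_moves_faster_than_lora_a()
```

With an adaptive optimizer the per-step update is not simply lr × grad, so a real test would compare parameter deltas after an actual optimizer step rather than rely on this arithmetic.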

Reference

Hayou et al., LoRA+: Efficient Low Rank Adaptation of Large Models, ICML 2024
https://arxiv.org/abs/2402.12354

This PR was developed with AI assistance. All code has been reviewed, tested, and verified to follow LitGPT conventions.

Implements LoRA+ (Hayou et al., 2024, arXiv:2402.12354) which applies a higher learning rate to lora_B parameters relative to lora_A. The paper shows this asymmetry improves feature learning efficiency, yielding 1-2% accuracy gains and up to 2× faster convergence at no extra compute cost.

Closes Lightning-AI#963

Usage:

litgpt finetune lora \
    --checkpoint_dir checkpoints/meta-llama/Llama-3.1-8B \
    --lora_plus_lr_ratio 16.0

New parameter:

--lora_plus_lr_ratio FLOAT
    Multiplier applied to lora_B learning rate relative to lora_A and other params. Paper recommends 16.0. Default: None (standard LoRA).

Implementation:

litgpt/utils.py: create_lora_plus_optimizer() splits trainable params into two groups: lora_B at lr * ratio, everything else at base lr
litgpt/finetune/lora.py: setup() and main() accept lora_plus_lr_ratio; optimizer is created via create_lora_plus_optimizer when ratio is set, instantiate_torch_optimizer otherwise
tests/test_lora_plus.py: 11 unit tests

Reference: Hayou et al., LoRA+: Efficient Low Rank Adaptation of Large Models, ICML 2024, https://arxiv.org/abs/2402.12354

Signed-off-by: Ramakrishnan Sathyavageeswaran <ramkrishs@outlook.com>

ramkrishs requested review from andyland, k223kim, lianakoleva and t-vi as code owners May 2, 2026 02:31

[pre-commit.ci] auto fixes from pre-commit.com hooks

ed75353

for more information, see https://pre-commit.ci

ramkrishs wants to merge 2 commits into Lightning-AI:main from ramkrishs:feat/lora-plus
