feat(lora): add LoRA+ support via lora_plus_lr_ratio #2247
Open
ramkrishs wants to merge 2 commits into
Implements LoRA+ (Hayou et al., 2024, arXiv:2402.12354) which applies a higher learning rate to lora_B parameters relative to lora_A. The paper shows this asymmetry improves feature learning efficiency, yielding 1-2% accuracy gains and up to 2× faster convergence at no extra compute cost.

Closes Lightning-AI#963

Usage:
    litgpt finetune lora \
      --checkpoint_dir checkpoints/meta-llama/Llama-3.1-8B \
      --lora_plus_lr_ratio 16.0

New parameter:
    --lora_plus_lr_ratio FLOAT
        Multiplier applied to lora_B learning rate relative to lora_A and other params. Paper recommends 16.0. Default: None (standard LoRA).

Implementation:
- litgpt/utils.py: create_lora_plus_optimizer() splits trainable params into two groups: lora_B at lr * ratio, everything else at base lr
- litgpt/finetune/lora.py: setup() and main() accept lora_plus_lr_ratio; optimizer is created via create_lora_plus_optimizer when ratio is set, instantiate_torch_optimizer otherwise
- tests/test_lora_plus.py: 11 unit tests

Reference: Hayou et al., LoRA+: Efficient Low Rank Adaptation of Large Models, ICML 2024, https://arxiv.org/abs/2402.12354

Signed-off-by: Ramakrishnan Sathyavageeswaran <ramkrishs@outlook.com>
for more information, see https://pre-commit.ci
Summary
Implements LoRA+ (Hayou et al., ICML 2024) by adding a --lora_plus_lr_ratio parameter to the LoRA finetune script.

Closes #963
What is LoRA+?
LoRA+ applies a higher learning rate to lora_B parameters than to lora_A. The theoretical motivation: lora_B lies in the output space and benefits from faster adaptation, while lora_A (the input projection) should update more conservatively. According to the paper, this asymmetry yields:
- 1-2% accuracy gains
- up to 2× faster convergence
- no extra compute cost
The recommended ratio is 16.0 for standard fine-tuning.
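As a rough illustration of what the ratio means in practice, the sketch below computes the two learning rates from a base value. The function name and the base learning rate of 3e-4 are assumptions chosen for the example, not values taken from this PR.

```python
def lora_plus_learning_rates(base_lr, ratio=None):
    """Return (lr for lora_A and other params, lr for lora_B).

    Hypothetical helper for illustration only. ratio=None mirrors the
    described default: no asymmetry, i.e. standard LoRA.
    """
    if ratio is None:
        return base_lr, base_lr
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return base_lr, base_lr * ratio


# Assumed base learning rate of 3e-4 with the recommended ratio of 16.0.
lr_a, lr_b = lora_plus_learning_rates(3e-4, ratio=16.0)
print(f"lora_A / others: {lr_a:g}, lora_B: {lr_b:g}")
```

With these assumed numbers, lora_B trains at 4.8e-3 while everything else stays at 3e-4.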
Usage
    litgpt finetune lora \
      --checkpoint_dir checkpoints/meta-llama/Llama-3.1-8B \
      --lora_plus_lr_ratio 16.0
When lora_plus_lr_ratio is not set (default None), behaviour is identical to standard LoRA.
Implementation
- litgpt/utils.py: new create_lora_plus_optimizer()
  - splits trainable params into two groups: lora_B at lr × ratio, everything else at base lr
  - … instantiate_torch_optimizer)
- litgpt/finetune/lora.py
  - setup() and main() accept lora_plus_lr_ratio: float | None = None
  - when the ratio is set, uses create_lora_plus_optimizer() instead of instantiate_torch_optimizer()
- tests/test_lora_plus.py: 11 unit tests
Reference
Hayou et al., LoRA+: Efficient Low Rank Adaptation of Large Models, ICML 2024
https://arxiv.org/abs/2402.12354
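The parameter split described under Implementation could look roughly like the sketch below. This is not the code from this PR: the function name, the substring match on "lora_B", the stand-in parameter class, and the example parameter names are all assumptions made for illustration. The sketch uses only the standard library so it runs without PyTorch installed.

```python
def split_lora_plus_param_groups(named_params, base_lr, ratio):
    """Partition (name, param) pairs into per-group learning rates.

    Hypothetical helper: lora_B params get base_lr * ratio, every other
    trainable param gets base_lr. Frozen params are skipped.
    """
    lora_b, others = [], []
    for name, param in named_params:
        # Objects without a requires_grad attribute are treated as trainable.
        if not getattr(param, "requires_grad", True):
            continue
        # Assumption: lora_B parameters are identifiable by name.
        (lora_b if "lora_B" in name else others).append(param)

    groups = []
    if others:
        groups.append({"params": others, "lr": base_lr})
    if lora_b:
        groups.append({"params": lora_b, "lr": base_lr * ratio})
    return groups


class FakeParam:
    """Stand-in for a tensor parameter, used only for this example."""

    def __init__(self, requires_grad=True):
        self.requires_grad = requires_grad


# Illustrative parameter names; real model names may differ.
named = [
    ("attn.qkv.linear.weight", FakeParam(requires_grad=False)),  # frozen base weight
    ("attn.qkv.lora_A", FakeParam()),
    ("attn.qkv.lora_B", FakeParam()),
]
groups = split_lora_plus_param_groups(named, base_lr=3e-4, ratio=16.0)
for group in groups:
    print(len(group["params"]), group["lr"])
```

A list of dictionaries in this shape, each with a "params" entry and a per-group "lr", is the format PyTorch optimizers accept for per-parameter options, so with real parameters it could be passed directly to an optimizer constructor.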