feat(adalora): add SVDConv2d to support torch.nn.Conv2d target modules#3196
feat(adalora): add SVDConv2d to support torch.nn.Conv2d target modules#3196BenjaminBossan merged 12 commits intohuggingface:mainfrom
Conversation
|
Thanks for the PR. I haven't checked the details yet, but could you please extend the test matrix in |
Extends the parametrized test matrix with AdaLoRA + MLP and AdaLoRA + Conv2d entries, exercising SVDConv2d support added in this PR. Also adds basic MLP entries so AdaLoRA appears in ALL_PEFT_CONFIG_CLASSES and participates in the full shared test suite. Requested by @BenjaminBossan.
|
@BenjaminBossan Updated! Here's what I've added: Test matrix extension ( ###########
# AdaLoRA #
###########
("Vanilla MLP 1 AdaLoRA", "MLP", AdaLoraConfig,
{"target_modules": ["lin0"], "init_lora_weights": False, "inference_mode": True, "total_step": 1}),
("Vanilla MLP 2 AdaLoRA", "MLP", AdaLoraConfig,
{"target_modules": ["lin0", "lin1"], "init_lora_weights": False, "inference_mode": True, "total_step": 1}),
("Conv2d 1 AdaLoRA", "Conv2d", AdaLoraConfig,
{"target_modules": ["conv2d"], "init_lora_weights": False, "inference_mode": True, "total_step": 1}),
("Conv2d 2 AdaLoRA", "Conv2d", AdaLoraConfig,
{"target_modules": ["conv2d", "lin0"], "init_lora_weights": False, "inference_mode": True, "total_step": 1}),
Regarding |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Generally, the PR looks good, thanks! But the ruff check is failing. Please ensure that make style passes.
|
Pushed style fix — added trailing newlines to the three adalora files (W292). |
|
@Anai-Guo ruff still complains, could you please check again? Possibly, you need to match the ruff version with the one from CI, v0.12.12. |
|
Thanks for fixing quality check. The CI shows a few tests failing, could you please check? Some of the failures ( |
… drop inference_mode from tests - SVDConv2d.forward now uses in-place add (`result +=`) so result keeps its base layer dtype, mirroring SVDLinear; previously the out-of-place add upcast result to lora_A.dtype, breaking subsequent fp16/bf16 ops in test_forward_float16/bfloat16. - ranknum is deterministic (always float(r)) and is not saved in the adapter state_dict, so it must not be placed on meta under low_cpu_mem_usage=True. Wrap its init in _skip_init_on_device() to keep it on the real device, fixing test_load_model_low_cpu_mem_usage. - Drop inference_mode=True from the new AdaLora TEST_CASES entries: the shared test matrix exercises training/inference_safetensors/only_params which all call loss.backward(); inference_mode disabled gradients, causing the runtime errors in CI. total_step=1 still prevents schedule pruning during the single forward step.
|
@BenjaminBossan Pushed three fixes: 1. 2. 3. The |
|
@Anai-Guo Thanks for the updates of the tests, your changes look reasonable. Unfortunately, some AdaLoRA tests are still failing. It's possible that those failures are not directly related to your PR and only caused by adding AdaLoRA to the test matrix (it was missing before). Still, it would be great if you could take a look and see if the failures can be fixed. I could tell some failures are because AdaLoRA only supports one adapter at a time. Feel free to add a skip to those tests that require multiple adapters. Also, you can run the tests locally with: |
…est matrix with single-adapter restriction * extend _skip_init_on_device to wrap _move_adapter_to_device_of_base_layer in AdaLoraLayer / SVDConv2d update_layer; otherwise re-registering ranknum via Module.__setattr__ inside init_empty_weights sends it back to meta and the saved state_dict can never restore it. * tests: zero lora_E so AdaLoRA is identity at init in test_disable_adapters and test_disable_adapters_with_merging (mirroring the VBLoRA pattern); use a smaller lr to avoid divergence on multi-target SGD; skip the single-adapter AdaLoRA helper in tests that require multiple trainable adapters or that build configs without total_step.
|
@BenjaminBossan Pushed two more fixes addressing the remaining AdaLoRA test failures: 1. 2. Test matrix alignment (test_custom_models.py):
Verified locally: all 152 AdaLoRA test cases now pass (40 skipped for the cases above), and the |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Thanks a lot for your continued work on making the AdaLoRA tests pass. Regarding one of your fixes, I have a different suggestion, please check my comment.
| pytest.skip("AdaLoRA only supports a single trainable adapter") | ||
|
|
||
|
|
||
| def _skip_if_adalora_without_total_step(config_cls): |
There was a problem hiding this comment.
Let's not skip all those tests just because a single argument is missing. Instead, let's add the missing argument. Check e.g. here, where we add an argument for IA³, the same idea should work for AdaLoRA:
peft/tests/test_custom_models.py
Lines 5882 to 5883 in 9a20e07
…a_kwargs[total_step]=1 Per BenjaminBossan's review on PR huggingface#3196: rather than skipping AdaLoRA entirely when a test parametrizes over ALL_PEFT_CONFIG_CLASSES, follow the IA3 pattern and pass the missing kwarg in the test (extra_kwargs/config_kwargs).
|
@BenjaminBossan Good point — switched to your suggested approach. Pushed ebaffb3:
Net diff: -199 bytes. AdaLoRA now actually runs through these tests instead of being skipped. |
|
Fantastic, thanks. Could you please merge with/rebase on the latest main, that should allow CI to be green again. |
|
Done — merged latest main (3e8a7af). CI should now run on the updated branch. Thanks @BenjaminBossan! |
AdaLoraModel allows only a single trainable adapter, so 6 multi-adapter tests cannot run for it; previously they only added total_step=1 and then hit ValueError. Skip them with a clear reason instead. For test_loading_model_*_requires_grad_set_correctly with is_trainable=True, AdaLoRA's ranknum parameter is keyed by adapter name (.default) and intentionally has requires_grad=False, which breaks the assert that all .default params are trainable. Skip those parametrizations as well.
|
Pushed 55f9fbb to address the test failures from the previous run. Root cause: my earlier patch added Fix: skip those 6 tests for
Plus 2 separate failures in 🤖 Generated with Claude Code |
|
Thanks for the updates, your explanations make sense. However, for the |
Per reviewer suggestion: instead of pytest.skip-ing the whole AdaLoRA parametrization in test_loading_model_requires_grad_set_correctly and test_loading_model_with_modules_to_save_requires_grad_set_correctly, filter out 'ranknum' parameters when iterating named_parameters(). This keeps the rest of AdaLoRA's params under test (the requires_grad assertions still run) while accommodating the intentional ranknum.default requires_grad=False.
|
Pushed 1a81613 to address the review feedback. Replaced the Thanks @BenjaminBossan! |
|
Pushed d4cd553 to fix the remaining 2 AdaLoRA test failures. Root cause (post-1a81613): Fix: For |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Thanks for adding support for Conv2d to AdaLoRA, PR LGTM.
Problem
AdaLoRA currently raises a
ValueErrorwhen targetingtorch.nn.Conv2dmodules:This blocks users who want to fine-tune CNNs (e.g. ResNet) with AdaLoRA and compare it against standard LoRA on the same convolutional layers.
Closes #3193
Solution
Add a new
SVDConv2dclass that applies SVD-based rank adaptation to Conv2d layers, following the same pattern asSVDLinear.Design:
(C_out, C_in, kH, kW)is treated as a 2D matrix of shape(C_out, C_in·kH·kW)for the SVD factorisation — consistent with how LoRA handles Conv2dlora_A: right singular vectors(r, C_in·kH·kW)lora_E: singular values(r, 1)lora_B: left singular vectors(C_out, r)delta_wand applies it viann.functional.conv2d, preserving the original layer'sstride,padding,dilation, andgroupsdelta_wback to(C_out, C_in, kH, kW)and adds/subtracts frombase_layer.weightRankAllocatorworks without modification:lora_A.size(0) == rfor both Linear and Conv2dChanges:
layer.py: addSVDConv2d(overridesupdate_layerto accommodateC_in·kH·kWfeature dimension)model.py: routetorch.nn.Conv2dtargets toSVDConv2din_create_new_module__init__.py: exportSVDConv2dQuick test
🤖 Generated with Claude Code