Fix OFT Conv2d weight reshaping for non-square kernels#3150
Fix OFT Conv2d weight reshaping for non-square kernels#3150Chessing234 wants to merge 2 commits intohuggingface:mainfrom
Conversation
All weight reshape operations in the OFT Conv2d layer use kernel_size[0] * kernel_size[0], squaring the height dimension instead of computing height * width. This gives wrong filter dimensions for non-square kernels (e.g. 3x1, 1x7), causing shape mismatches in forward/merge/unmerge. It works by accident for square kernels. Note: the same pattern exists in boft/layer.py and hra/layer.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@Chessing234 Thanks for the PR. Could you please extend the unit tests to cover this case? I.e. add a test case that fails with the current main branch but is fixed with your branch? It should be fine to add a one-off test for this in
This would be really appreciated, better to have that in one PR than multiple smaller ones. The unit test can be parametrized to cover these PEFT methods too. |
|
Added a standalone test |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Thanks for adding the test, but it's in the wrong place. Please check the comment. You can ensure that the test passes by running:
pytest tests/test_custom_models.py -k test_oft_conv2d_non_square_kernel
Also, as mentioned above, it would be much better to have the BOFT and HRA fixes in the same PR, so please add those as well.
| "base_model.model.lin0.oft_R.adapter1.weight", | ||
| ) | ||
|
|
||
| def test_oft_conv2d_non_square_kernel(self): |
There was a problem hiding this comment.
This test is placed in the wrong place. Please place it at the end of TestPeftCustomModel.
| peft_model = get_peft_model(model, config) | ||
|
|
||
| X = torch.arange(5 * 5 * 3 * 5, dtype=torch.float, device=self.torch_device).reshape(5, 5, 3, 5) | ||
| output = peft_model(X) |
There was a problem hiding this comment.
Let's add a comment: "# Ensure that the forward pass does not raise". This is what we actually want to test. Equality of non-merged vs merged is nice to test too, but not the critical issue.
Summary
The OFT
Conv2dlayer computes the filter dimension and reshapes weights usingkernel_size[0] * kernel_size[0], squaring the kernel height instead of computing height × width (kernel_size[0] * kernel_size[1]). Similarly, the 4D reshape useskernel_size[0], kernel_size[0]instead ofkernel_size[0], kernel_size[1].This works by accident for square kernels (3×3, 5×5, etc.) but produces wrong dimensions for any asymmetric kernel (e.g., 3×1, 1×7, 3×5), causing
RuntimeError: shape mismatchduring forward/merge/unmerge.Fix: Replace all 5 occurrences of the second
kernel_size[0]withkernel_size[1]inoft/layer.py.Note: The same pattern exists in
boft/layer.pyandhra/layer.py. Happy to extend this fix to those files if desired.Test plan
nn.Conv2d(3, 16, (3, 1))) should now work🤖 Generated with Claude Code