MiSS update #3194
Conversation
I use this model 'unsloth/Llama-3.2-3B'; maybe this is wrong.
So IIUC, this change should not influence the results in any way; it's just a change for better readability. Therefore, we should expect the results to be identical. To test this, don't change the base model: we always want to use the same one, or else the results are not comparable. Instead, run one of the existing experiments and then re-run the same experiment with your changes applied on top:
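For example, using the default MiSS experiment that is also used for the benchmark below, run the identical experiment once per branch:

```sh
# on the main branch:
python run.py -v experiments/miss/llama-3.2-3B-default/
# then again with this PR's changes applied on top:
python run.py -v experiments/miss/llama-3.2-3B-default/
```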
Sorry, I don't have permission/license for Llama-3.2-3B.
Ouch, I thought you pretty much get auto permission if you request it. I can't check right now, but I'll check next week and let you know if I see any difference. LMK if there is any setting in particular that I should test. Meanwhile, please revert the change to the default training params. If you want to test a different model, you can always create a new experiment, e.g.:
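For instance (the new directory name here is hypothetical; the point is to copy an existing experiment as a starting point rather than edit the default one):

```sh
# hypothetical new experiment directory, copied from the default MiSS config:
cp -r experiments/miss/llama-3.2-3B-default experiments/miss/llama-3.2-3B-unsloth
# then edit the copied experiment's config to point at the other base model
```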
I've added the code for converting MiSS to LoRA. Please take a look.
BenjaminBossan
left a comment
Thanks for the update. I ran the MetaMathQA benchmark on my machine with the main branch and with your changes, using `python run.py -v experiments/miss/llama-3.2-3B-default/` (i.e. the default MiSS setting). The train loss is basically identical:
Max memory for both was identical, and train times were 833 vs 856 sec, which is reasonably close. So overall, I think this shows that the results stay the same. LMK if I should test something else.
As for the conversion, thanks a lot for adding the MiSS-specific path. I converted the trained MiSS model from the benchmark to LoRA using a relatively small rank of 32 and it got a test accuracy of 50.3%, so basically the same as the MiSS adapter. That's quite a nice result.
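For context, the generic recipe behind such a conversion is truncated SVD on the adapter's delta weight: keep the top-r singular components and split them into LoRA's B @ A form. A minimal plain-torch sketch of that idea (not the PR's actual code in src/peft/tuners/lora/conversion.py):

```python
import torch

def delta_to_lora(delta_weight: torch.Tensor, r: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Factor a (out_features, in_features) delta weight into LoRA A/B via truncated SVD."""
    U, S, Vh = torch.linalg.svd(delta_weight, full_matrices=False)
    sqrt_s = torch.sqrt(S[:r])
    lora_B = U[:, :r] * sqrt_s             # (out_features, r)
    lora_A = sqrt_s.unsqueeze(1) * Vh[:r]  # (r, in_features)
    return lora_A, lora_B

# If the delta's true rank is <= r, the factorization is exact up to float error:
delta = torch.randn(64, 16) @ torch.randn(16, 128)  # rank <= 16
lora_A, lora_B = delta_to_lora(delta, r=32)
assert torch.allclose(lora_B @ lora_A, delta, atol=1e-3)
```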
Since there is this special MiSS conversion path now, we should add a unit test for it. The easiest way would be to take this LoKr test, copy it, and replace the lokr_model with a MiSS model.
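Until then, here is a rough, self-contained sketch of the property such a test pins down (plain torch only, so it runs as-is; the actual test should start from the LoKr test above and exercise the real MiSS model and conversion entry point):

```python
# Standalone sketch of the test's shape: a rank-r delta converted to LoRA
# factors must leave the layer's forward output unchanged. The real unit
# test would build a MiSS model and call the PR's conversion path instead.
import torch

def test_conversion_preserves_forward():
    torch.manual_seed(0)
    out_f, in_f, r = 16, 32, 4
    base = torch.randn(out_f, in_f)
    delta = torch.randn(out_f, r) @ torch.randn(r, in_f)  # rank-r delta

    # exact rank-r factors via truncated SVD
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    lora_B, lora_A = U[:, :r] * S[:r], Vh[:r]

    x = torch.randn(8, in_f)
    y_before = x @ (base + delta).T
    y_after = x @ (base + lora_B @ lora_A).T
    assert torch.allclose(y_before, y_after, atol=1e-4)
```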
done
@Joluck Thanks for adding the tests. Ruff is complaining about the use of the variable name
What name do you think would be suitable?
Doesn't really matter to me, perhaps
done
@Joluck Could you please run `make style`?
Ruff is still complaining; the required changes are:

```diff
modified   src/peft/tuners/lora/conversion.py
@@ -51,8 +51,8 @@ def _convert_miss_module_to_lora(
 ) -> tuple[torch.Tensor, torch.Tensor, int]:
     """Convert a single MiSS layer to LoRA A and B matrices.
 
-    For standard and mini modes, the MiSS forward pass (reshape+sum @ miss) is already a rank-r
-    factorization, so the exact factors are returned directly without SVD.
+    For standard and mini modes, the MiSS forward pass (reshape+sum @ miss) is already a rank-r factorization, so the
+    exact factors are returned directly without SVD.
 
     For bat mode, the delta weight depends on the base weight, so SVD is used.
     """
```

```diff
modified   src/peft/tuners/miss/layer.py
@@ -313,8 +313,12 @@ class MissLinear(nn.Module, MissLayer):
             aligned_size = n_blocks * r
             W_aligned = orig_weight[:, :aligned_size].reshape(-1, n_blocks, r).permute(1, 2, 0)
-            orig_weight[:, :aligned_size] = (W_aligned + sign * miss_B).permute(2, 0, 1).reshape(*orig_weight[:, :aligned_size].shape)
-            orig_weight[:, aligned_size:] = orig_weight[:, aligned_size:] + sign * miss_B.transpose(0, 1)[:, :remainder]
+            orig_weight[:, :aligned_size] = (
+                (W_aligned + sign * miss_B).permute(2, 0, 1).reshape(*orig_weight[:, :aligned_size].shape)
+            )
+            orig_weight[:, aligned_size:] = (
+                orig_weight[:, aligned_size:] + sign * miss_B.transpose(0, 1)[:, :remainder]
+            )
 
             output_tensor = orig_weight
         else:
             W_blocks = orig_weight.reshape(-1, orig_weight.size(1) // r, r).permute(1, 2, 0)
```
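As a side note on the second hunk: the `permute(2, 0, 1).reshape(...)` is the exact inverse of the `reshape(-1, n_blocks, r).permute(1, 2, 0)` used to build W_aligned, so the blocked update maps back to the original layout without mixing elements. A quick standalone check (shapes chosen arbitrarily):

```python
import torch

out_features, n_blocks, r = 8, 3, 4
aligned_size = n_blocks * r
W = torch.randn(out_features, aligned_size)

# forward mapping from the diff: (out, n_blocks * r) -> (n_blocks, r, out)
W_aligned = W.reshape(-1, n_blocks, r).permute(1, 2, 0)
# inverse mapping from the diff: back to (out, n_blocks * r)
W_back = W_aligned.permute(2, 0, 1).reshape(out_features, aligned_size)

assert torch.equal(W_back, W)  # exact round trip
```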
I don't know why there are so many changes when I run `make style`. Does it require a specific version?
BenjaminBossan
left a comment
Thanks for updating MiSS and adding the LoRA conversion code, LGTM.

The optimized version improves readability, but when I tested it with method_comparison, the results were incorrect. Could you help me test it?