
MiSS update #3194

Merged
BenjaminBossan merged 12 commits into huggingface:main from Joluck:main
May 4, 2026

Conversation

@Joluck (Contributor) commented Apr 24, 2026

The optimized writing improves readability, but when I tested it using method_comparison, the results were incorrect. Could you help me test it?

@Joluck (Contributor, Author) commented Apr 24, 2026

I used the model 'unsloth/Llama-3.2-3B'; maybe that's the problem.

@BenjaminBossan (Member)

So IIUC, this change should not influence the results in any way, it's just a change for better readability. Therefore, we should expect the results to be identical. To test this, don't change the base model: We always want to use the same one or else results are not comparable. Instead, run one of the existing experiments and then re-run the same experiment but with your changes applied on top:

python run.py -v experiments/miss/llama-3.2-3B-default
# checkout your branch
python run.py -v experiments/miss/llama-3.2-3B-default

@Joluck (Contributor, Author) commented Apr 24, 2026

So IIUC, this change should not influence the results in any way, it's just a change for better readability. Therefore, we should expect the results to be identical. To test this, don't change the base model: We always want to use the same one or else results are not comparable. Instead, run one of the existing experiments and then re-run the same experiment but with your changes applied on top:

python run.py -v experiments/miss/llama-3.2-3B-default
# checkout your branch
python run.py -v experiments/miss/llama-3.2-3B-default

Sorry, I don't have permission/license for Llama-3.2-3B.

@BenjaminBossan (Member)

Sorry, I don't have permission/license for Llama-3.2-3B.

Ouch, I thought you pretty much get auto permission if you request. I can't check right now, but I'll check next week and let you know if I see any difference. LMK if there is any setting in particular that I should test.

Meanwhile, please revert the change to the default training params. If you want to test a different model, you can always create a new experiment, e.g. method_comparison/MetaMathQA/experiments/miss/unsloth-llama-3.2-3B-default/. Put the adapter_config.json in there and then create a training_params.json. This allows you to override the defaults, so you can just add {"model_id": "unsloth/Llama-3.2-3B"} there.
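As a sketch, the override file for such an experiment directory could be as minimal as the following (whether any further fields are needed is an assumption; the description above only requires the model override):

```json
{
    "model_id": "unsloth/Llama-3.2-3B"
}
```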

@Joluck (Contributor, Author) commented Apr 27, 2026

I've added the code for converting MiSS to LoRA. Please take a look.

@BenjaminBossan (Member) left a comment

Thanks for the update. I ran the MetaMathQA benchmark on my machine with the main branch and with your changes using python run.py -v experiments/miss/llama-3.2-3B-default/ (i.e. default MiSS setting). The train loss is basically identical:

(image: train loss curves for both runs)

Max memory for both runs was identical, and train times were 833 vs 856 sec, which is reasonably close. So I think overall, this shows that results stay the same. LMK if I should test something else.

As for the conversion, thanks a lot for adding the MiSS-specific path. I converted the trained MiSS model from the benchmark to LoRA using a relatively small rank of 32 and it got a test accuracy of 50.3%, so basically the same as the MiSS adapter. That's quite a nice result.

Since there is this special MiSS conversion path now, we should add a unit test for that. The easiest way should be to take this LoKr test, copy it, and replace the lokr_model with a MiSS model.
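For intuition, the bat-mode branch of this conversion boils down to a truncated SVD of the delta weight. A minimal sketch, not PEFT's actual conversion code (the helper name and shapes are made up for illustration):

```python
import torch

def svd_delta_to_lora(delta_w: torch.Tensor, r: int):
    # Hypothetical helper: factor a delta weight (out_features x in_features)
    # into LoRA factors B (out x r) and A (r x in) via truncated SVD.
    U, S, Vh = torch.linalg.svd(delta_w, full_matrices=False)
    lora_B = U[:, :r] * S[:r]  # absorb singular values into B
    lora_A = Vh[:r, :]
    return lora_A, lora_B

# If the delta weight truly has rank <= r, the factorization is exact
# up to float error, which is consistent with the converted adapter
# matching the MiSS adapter's accuracy so closely.
torch.manual_seed(0)
delta = torch.randn(16, 4) @ torch.randn(4, 8)  # rank-4 delta weight
A, B = svd_delta_to_lora(delta, r=4)
print((B @ A - delta).abs().max().item() < 1e-4)  # True
```

For the standard and mini modes no SVD is needed, since (per the docstring in the diff below) the MiSS forward pass is already a rank-r factorization and the exact factors can be returned directly.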

@Joluck (Contributor, Author) commented Apr 28, 2026


Since there is this special MiSS conversion path now, we should add a unit test for that. The easiest way should be to take this LoKr test, copy it, and replace the lokr_model with a MiSS model.

done

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member)

@Joluck Thanks for adding the tests. Ruff is complaining about the use of the variable name I. It's a bit annoying, but let's just use a different name.
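For context, Ruff's E741 rule rejects ambiguous single-character names (l, O, I) because they are easy to confuse with 1 and 0. An assumed illustration of the kind of rename involved, not the PR's actual test code:

```python
import torch

# `eye` instead of `I`: Ruff's E741 rule flags ambiguous single-letter
# names (l, O, I), so a descriptive name is used for the identity matrix.
n = 4
eye = torch.eye(n)
print(eye.diagonal().sum().item())  # 4.0
```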

@Joluck (Contributor, Author) commented Apr 28, 2026

@Joluck Thanks for adding the tests. Ruff is complaining about the use of the variable name I. It's a bit annoying, but let's just use a different name.

What name do you think would be suitable?

@BenjaminBossan (Member)

What name do you think would be suitable?

Doesn't really matter to me, perhaps eye :-]

@Joluck (Contributor, Author) commented Apr 29, 2026

What name do you think would be suitable?

Doesn't really matter to me, perhaps eye :-]

done

@BenjaminBossan (Member)

@Joluck Could you please run make style?

@BenjaminBossan (Member)

Ruff is still complaining; the required changes are:

modified   src/peft/tuners/lora/conversion.py
@@ -51,8 +51,8 @@ def _convert_miss_module_to_lora(
 ) -> tuple[torch.Tensor, torch.Tensor, int]:
     """Convert a single MiSS layer to LoRA A and B matrices.
 
-    For standard and mini modes, the MiSS forward pass (reshape+sum @ miss) is already a rank-r
-    factorization, so the exact factors are returned directly without SVD.
+    For standard and mini modes, the MiSS forward pass (reshape+sum @ miss) is already a rank-r factorization, so the
+    exact factors are returned directly without SVD.
 
     For bat mode, the delta weight depends on the base weight, so SVD is used.
     """
modified   src/peft/tuners/miss/layer.py
@@ -313,8 +313,12 @@ class MissLinear(nn.Module, MissLayer):
             aligned_size = n_blocks * r
 
             W_aligned = orig_weight[:, :aligned_size].reshape(-1, n_blocks, r).permute(1, 2, 0)
-            orig_weight[:, :aligned_size] = (W_aligned + sign * miss_B).permute(2, 0, 1).reshape(*orig_weight[:, :aligned_size].shape)
-            orig_weight[:, aligned_size:] = orig_weight[:, aligned_size:] + sign * miss_B.transpose(0, 1)[:, :remainder]
+            orig_weight[:, :aligned_size] = (
+                (W_aligned + sign * miss_B).permute(2, 0, 1).reshape(*orig_weight[:, :aligned_size].shape)
+            )
+            orig_weight[:, aligned_size:] = (
+                orig_weight[:, aligned_size:] + sign * miss_B.transpose(0, 1)[:, :remainder]
+            )
             output_tensor = orig_weight
         else:
             W_blocks = orig_weight.reshape(-1, orig_weight.size(1) // r, r).permute(1, 2, 0)

@Joluck (Contributor, Author) commented May 1, 2026

I don't know why make style produces so many changes for me. Does it require a specific version?

@BenjaminBossan (Member) left a comment

Thanks for updating MiSS and adding the LoRA conversion code, LGTM.

@BenjaminBossan merged commit 4050ef5 into huggingface:main May 4, 2026
10 checks passed