feat: Enable LoRA saving only for non-MoE linear layers when training with kernels. #530
Conversation
```python
model = self.trainer.model
if hasattr(model, "module"):
    model = model.module
```
Is this needed for the non-LoRA use case? After this step it would go directly to the block below:
```python
else:
    model.config.save_pretrained(hf_converted_output_dir)
```
It's needed to determine whether there is a PEFT config for the following if statement, but in the full fine-tuning case that if statement should be false.
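For reference, a minimal sketch of the flow under discussion (wrapping it in a helper function and the `trainer` / `hf_converted_output_dir` argument names are for illustration only):

```python
def postprocess_hf_converted_dir(trainer, hf_converted_output_dir):
    # Unwrap a distributed wrapper (e.g. DistributedDataParallel) to reach the underlying model
    model = trainer.model
    if hasattr(model, "module"):
        model = model.module

    if hasattr(model, "peft_config"):
        # LoRA case: persist the adapter config alongside the converted checkpoint
        model.peft_config["default"].save_pretrained(hf_converted_output_dir)
    else:
        # Full fine-tuning case: only the base model config needs to be saved
        model.config.save_pretrained(hf_converted_output_dir)
```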
@willmj thanks for diligently taking my inputs. After addressing my comments, please run all tests related to MoE kernels on your local setup and paste screenshots here for the record. Thanks, appreciate this PR!
```python
    model, train_args, modifiable_args=(peft_config,)
)
# For LoRA ScatterMoE, if expert layers are included, disable grad
if peft_config is not None:
```
I think it's better to just do an instance check against ScatterMoE and then freeze everything inside it.
Maybe you should add a comment that this is a workaround; in the future, when ScatterMoE LoRAs can be tuned, this code should be removed.
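A minimal sketch of the suggested workaround, assuming a `ScatterMoE` module class exposed by the MoE kernels plugin (the exact import path below is an assumption, not taken from this PR):

```python
def freeze_scattermoe_parameters(model):
    # Workaround: ScatterMoE expert/router weights cannot be LoRA-tuned yet,
    # so freeze every parameter inside ScatterMoE modules. Remove this once
    # ScatterMoE LoRA tuning is supported.
    from fms_acceleration_moe.utils.scattermoe import ScatterMoE  # assumed import path

    for module in model.modules():
        if isinstance(module, ScatterMoE):
            for param in module.parameters():
                param.requires_grad = False
```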
… kernels. (#530)

* save peft
* post process hf converted dir
* fix: convert hf converted checkpoint
* lora config
* save adapter config
* fix: add input linear and output linear to target modules
* fix: extend instead of append
* fix: if hasattr peft config
* fix: remove unneeded target modules
* test: lora for scattermoe
* explicitly don't support router layer
* docs: update documentation
* fix: simplify accelerate launch post processing
* tests: more target modules + ep_degree
* fix: only restrict all-linear, raise warning for other modules
* fix: augmentation test
* fix: raise error
* turn off requires grad if using scattermoe with lora
* fix: freeze scattermoe params
* fix: safer freezing

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Description of the change
A more limited version of #523 that enables LoRA when training with MoE kernels but explicitly blocks the router linear and expert layers from being LoRA targets.
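For illustration, usage then looks roughly like the following LoRA config, which targets only attention projection layers and leaves the MoE router and expert layers frozen (the module names below are typical for a Mixtral-style model and are an assumption, not taken from this PR):

```python
from peft import LoraConfig

# Hypothetical config: keep LoRA off the MoE router ("gate") and expert layers
# so the ScatterMoE kernels remain usable; only attention projections are adapted.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```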
Related issue number
How to verify the PR
Was the PR tested