Tasks
Feature request
It would be good to bump transformers to at least 4.51 to support recent SOTA models.
Motivation
I was planning to test my methods on some recent models like gemma3 or qwen3, but transformers 4.45 doesn't support them. In the long term, the problem of pinning transformers to this old version will only grow.
Implementation
One potential obstacle I see is this monkey patch.
We could either update the patched prediction_step to match the version in newer transformers, or make a more future-proof fix. For example, instead of modifying the whole prediction_step, in UnlearnTrainer we could do:
    def compute_loss(self, model, inputs, **kwargs):
        if model.training:
            return self.compute_unlearn_loss(model, inputs, **kwargs)
        else:
            return super().compute_loss(model, inputs, **kwargs)
And have the unlearning trainers define compute_unlearn_loss instead.
(Also note the change to the compute_loss signature, #155. Should compute_loss accept the new argument num_items_in_batch and just ignore it?)
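To make the second option and the signature question concrete, here is a minimal sketch that runs without transformers installed. BaseTrainer is a hypothetical stand-in for transformers' Trainer, and GradAscentTrainer and the placeholder return strings are made up for illustration. It assumes compute_loss accepts num_items_in_batch and simply doesn't forward it to compute_unlearn_loss, which is one possible answer to the question above, not a settled design.

```python
from types import SimpleNamespace


class BaseTrainer:
    """Hypothetical stand-in for transformers.Trainer so the sketch runs alone."""

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        return "default_loss"  # placeholder for the stock loss computation


class UnlearnTrainer(BaseTrainer):
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        if model.training:
            # Training step: use the unlearning objective. num_items_in_batch
            # is accepted but deliberately not forwarded (an assumption).
            return self.compute_unlearn_loss(model, inputs, return_outputs=return_outputs)
        # Evaluation (e.g. when called from prediction_step): defer to the base loss.
        return super().compute_loss(
            model, inputs, return_outputs=return_outputs,
            num_items_in_batch=num_items_in_batch,
        )

    def compute_unlearn_loss(self, model, inputs, return_outputs=False):
        # Each unlearning trainer overrides this instead of compute_loss.
        raise NotImplementedError


class GradAscentTrainer(UnlearnTrainer):
    """Illustrative subclass; the name and the returned value are placeholders."""

    def compute_unlearn_loss(self, model, inputs, return_outputs=False):
        return "unlearn_loss"


trainer = GradAscentTrainer()
# Objects with a `training` flag mimic a model in train() and eval() mode.
train_result = trainer.compute_loss(SimpleNamespace(training=True), {}, num_items_in_batch=8)
eval_result = trainer.compute_loss(SimpleNamespace(training=False), {})
print(train_result, eval_result)
```

With this shape, prediction_step would not need to be patched at all, since evaluation goes through the unmodified base compute_loss.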
Let me know whether you'd prefer the first or the second solution. (Personally, I think the future-proof one is better.) You can then assign me to this issue and I'll make a PR.