
fix: also patch model.generate() to inject cached pixel_values #254

Merged
abrichr merged 1 commit into main from fix/patch-generate-too
Mar 29, 2026
abrichr commented Mar 29, 2026

Patches both forward() and generate() on the model instance. TRL calls generate(input_ids=...) without pixel_values, so the generate() patch injects them from the cache.

The forward() patch handles logprob recomputation during training, but TRL also
calls model.generate(input_ids=...) without pixel_values. HF's generate() uses
prepare_inputs_for_generation(), which builds a fresh kwargs dict, so
pixel_values cached inside the forward() patch are not enough: generate() needs
them among its own top-level kwargs to pass them through.

This change patches both forward() and generate() on the model instance.
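
For readers who want to see the shape of the approach, here is a minimal sketch of instance-level patching. It is not the code from this PR: the helper name `patch_with_cached_pixel_values`, the dict-based `cache`, and the `DummyVLM` stand-in are all assumptions made for illustration, and the real patch may differ in how it stores and looks up the cached tensors.

```python
import functools


def patch_with_cached_pixel_values(model, cache):
    """Wrap forward() and generate() on this one instance (hypothetical helper).

    `cache` is assumed to be a dict that the caller fills with
    {"pixel_values": ...} before the trainer runs.
    """
    def inject(method):
        @functools.wraps(method)
        def wrapper(*args, **kwargs):
            # Only fill in pixel_values when the caller left them out.
            if kwargs.get("pixel_values") is None and "pixel_values" in cache:
                kwargs["pixel_values"] = cache["pixel_values"]
            return method(*args, **kwargs)
        return wrapper

    # Assigning on the instance shadows the class methods for this object only.
    model.forward = inject(model.forward)
    model.generate = inject(model.generate)
    return model


class DummyVLM:
    """Stand-in for a vision-language model; it just reports what it received."""

    def forward(self, input_ids=None, pixel_values=None):
        return {"call": "forward", "pixel_values": pixel_values}

    def generate(self, input_ids=None, pixel_values=None):
        return {"call": "generate", "pixel_values": pixel_values}


cache = {"pixel_values": [[0.1, 0.2], [0.3, 0.4]]}
model = patch_with_cached_pixel_values(DummyVLM(), cache)

# A TRL-style call that omits pixel_values still reaches the model with them.
gen_out = model.generate(input_ids=[1, 2, 3])
fwd_out = model.forward(input_ids=[1, 2, 3])

# Explicitly passed pixel_values are left untouched.
explicit_out = model.generate(input_ids=[1], pixel_values="explicit")
```

Patching the instance rather than the class keeps the change local to the one model handed to the trainer, and the wrapper leaves explicitly supplied pixel_values alone so callers that already pass them are unaffected.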

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abrichr merged commit 9612019 into main Mar 29, 2026
1 check passed