Skip to content

fix: handle internvl_hf video-only inputs and enable frame sampling#1279

Merged
kcz358 merged 2 commits intoEvolvingLMMs-Lab:mainfrom
akawincent:fix/1241_1242_internvl_hf
Apr 9, 2026
Merged

fix: handle internvl_hf video-only inputs and enable frame sampling#1279
kcz358 merged 2 commits intoEvolvingLMMs-Lab:mainfrom
akawincent:fix/1241_1242_internvl_hf

Conversation

@akawincent
Copy link
Copy Markdown
Contributor

Summary

This PR fixes two internvl_hf video handling bugs in lmms_eval/models/chat/internvl_hf.py:

  • explicitly enables frame sampling when num_frames or fps is configured
  • normalizes empty image inputs to None for video-only examples
  • avoids building image_sizes from an empty image list

Why

This PR is motivated by two related internvl_hf bugs reported upstream:

Together, these two issues make internvl_hf unreliable on video tasks:
video-only samples can raise an IndexError, and sampled-video runs can still process all frames and overflow the model context length.

Closes #1241
Closes #1242

Validation

  • pre-commit run --files lmms_eval/models/chat/internvl_hf.py
  • python -m py_compile lmms_eval/models/chat/internvl_hf.py

@akawincent akawincent force-pushed the fix/1241_1242_internvl_hf branch from 1835a30 to 24b7ea9 Compare March 28, 2026 06:49
@akawincent
Copy link
Copy Markdown
Contributor Author

Hi @Luodian!

Need for review and tests.
InternVL is a widely used model, so I think this issue should be resolved as soon as possible.

@mwxely mwxely requested review from kcz358 and mwxely April 9, 2026 07:25
Copy link
Copy Markdown
Collaborator

@mwxely mwxely left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean and well-scoped fix. Two minor points to consider:

  1. Implicit behavioral change: Since num_frames defaults to 32 in __init__, setting do_sample_frames=True means frame sampling will now actually take effect by default. Previously num_frames=32 was silently ignored (since do_sample_frames was never set). This is arguably the correct behavior, but worth noting in the PR description as it changes the default video processing pipeline for all internvl_hf users on video tasks.

  2. Validation question: Did you run an end-to-end video eval (e.g., a small subset of VideoMME or similar) to confirm the fix works beyond py_compile and pre-commit? A quick --limit 4 smoke test would be reassuring.

Otherwise LGTM — the visuals = None guard and image_sizes fix are correct and consistent with existing patterns in the file. cc @kcz358

@kcz358 kcz358 merged commit 037f12b into EvolvingLMMs-Lab:main Apr 9, 2026
3 checks passed
@akawincent
Copy link
Copy Markdown
Contributor Author

  1. Validation question: Did you run an end-to-end video eval (e.g., a small subset of VideoMME or similar) to confirm the fix works beyond py_compile and pre-commit? A quick --limit 4 smoke test would be reassuring.

@mwxely Hello,

Yes, I did. Actually, I ran it with --limit to test whether the code works, but I didn't run a full benchmark. To be rigorous, a better testing would be to change the num_frames and observe changes in the final metrics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Internvl_hf: num_frames not applied to videos [Bug] internvl_hf: IndexError when visuals is empty list (video-only inputs)

3 participants