add HyperClovaX Vision #44314
Conversation
When I ran the tests in my environment, the hcx tests ran normally without any failures.
Can the work proceed in this manner?
Hmm... In my environment, …
…iteration and using cache
…orConditionalGeneration
…d processor classes
…template application
…ing in HyperClovaXProcessor
Yep, don't worry, we'll fix this one on main!
zucchini-nlp left a comment
Hey @jp1924, thanks a lot for the PR!
I think we need to move all the code to a modular file first, since there is a lot of code copied from other models. I left comments below about where exactly you can copy from. Ping me for another review when the modular file is ready.
…deo processor classes.
… to HCXVisionConfig and remove unnecessary code.
BTW, I didn't forget about this PR, coming to it right after the hyperclovax LM is merged :)
@zucchini-nlp @bigshanedogg |
The text backbone was merged just today, so we can resume this PR now. We'd need to use its own LM backbone, and the vision backbone stays as discussed earlier. Whenever you have time @jp1924, feel free to rebase, no rush :)
[For maintainers] Suggested jobs to run (before merge)

```
run-slow: auto, hyperclovax_vision_v2, qwen2_5_vl
```
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44314&sha=23a8be |
@zucchini-nlp @bigshanedogg Could you please take a look and do a code review? I'm getting an …
zucchini-nlp left a comment
Niiice, giving an early approval.
Some nits to fix, especially since we added Exaone4.5, which seems to be the best candidate to copy from in modular. And we can iterate with a core maintainer on the rest.
```md
# HyperCLOVAX Vision V2

HyperCLOVAX Vision V2 is a multimodal vision-language model developed by NAVER. It combines the HyperCLOVAX language model backbone — based on the [Granite](./granite) architecture with optional post-norm (Peri-LN) layers for MuP scaling — with a [Qwen2.5-VL](./qwen2_5_vl) vision encoder. The model supports text, image, and video inputs and is capable of chain-of-thought reasoning via built-in thinking tokens (`<think>...</think>`).
```
imo we can say that the LM is HyperClovaX now
```md
You can find the original HyperCLOVAX-SEED-Think-32B checkpoint on the [naver-hyperclovax/HyperCLOVAX-SEED-Think-32B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B) page.

> [!TIP]
> The `model_type` in the released checkpoint's `config.json` is `"vlm"`, while the Transformers implementation registers this model as `"hyperclovax_vision_v2"`. Due to this mismatch, loading via `AutoModel` or `AutoModelForCausalLM` is not supported. Use the model class directly as shown in the examples below.
```
let's see if we can do anything about this with the Naver team before merging
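For reference, a minimal loading sketch along the lines of what the TIP describes. The class names (`HCXVisionV2ForConditionalGeneration`, `HCXVisionV2Processor`) come from this PR's diff and the checkpoint id from the doc page above; the image URL and message layout are illustrative assumptions, not the PR's final doc example:

```py
# A sketch, not the final doc snippet: load the classes directly because
# `AutoModel` can't resolve the checkpoint's `model_type: "vlm"` (see the
# TIP above). The image URL below is a placeholder.
from transformers import HCXVisionV2ForConditionalGeneration, HCXVisionV2Processor

model_id = "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B"
processor = HCXVisionV2Processor.from_pretrained(model_id)
model = HCXVisionV2ForConditionalGeneration.from_pretrained(model_id, device_map="auto")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```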
```py
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa",
```
nit: delete `torch_dtype` and `attn_implementation`, they are loaded from the config
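For illustration, the trimmed call after applying the nit might look like this (a sketch under the same assumed class name as above):

```py
# dtype and attn implementation come from the checkpoint config, so only
# device placement is passed explicitly.
model = HCXVisionV2ForConditionalGeneration.from_pretrained(
    "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B",
    device_map="auto",
)
```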
| "deepseek_ocr2", | ||
| "fuyu", | ||
| "h2ovl_chat", | ||
| "hyperclovax_vlm", |
shouldn't we keep the entry with 'vlm'?
```py
model_type = self.vision_config.get("model_type", "qwen2_5_vl_vision")
model_type = "qwen2_5_vl_vision" if model_type == "qwen2_5_vl" else model_type
```
a tiny comment noting that this is for BC (or whatever the actual reason is) would be nice
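One way to address this, a sketch with an assumed rationale (the real reason should come from the authors):

```py
# Assumed rationale, to be confirmed: early checkpoints store the vision
# tower's model_type as "qwen2_5_vl" (the full model's type), so remap it
# to the standalone vision config for backward compatibility.
model_type = self.vision_config.get("model_type", "qwen2_5_vl_vision")
if model_type == "qwen2_5_vl":  # BC: remap old checkpoints
    model_type = "qwen2_5_vl_vision"
```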
```py
@auto_docstring
class HCXVisionV2ForConditionalGeneration(HCXVisionV2PreTrainedModel, GenerationMixin):
    accepts_loss_kwargs = False
    _tied_weights_keys = {"lm_head.weight": "model.language_model.embed_tokens.weight"}
```
same here, copy from Exaone, and I believe we can delete identical or similar methods
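A rough sketch of what the modular version could look like under that suggestion; `Exaone4_5ForConditionalGeneration` is an assumed parent class name inferred from the review comment, not a confirmed one:

```py
# Sketch only: in the modular file, inherit the Exaone-based implementation
# (class name assumed) and keep just the attributes that actually differ.
class HCXVisionV2ForConditionalGeneration(Exaone4_5ForConditionalGeneration):
    accepts_loss_kwargs = False
    _tied_weights_keys = {"lm_head.weight": "model.language_model.embed_tokens.weight"}
```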
```py
WeightRenaming(r"^model.mm_projector", "model.projector"),
WeightRenaming(r"^model.language_model.model.layers", "model.language_model.layers"),
WeightRenaming(r"^model.language_model.model.embed_tokens", "model.language_model.embed_tokens"),
WeightRenaming(r"^model.language_model.model.norm", "model.language_model.norm"),
```
or `model\.language_model\.model` → `model.language_model`?
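That is, the three per-submodule renames could collapse into one prefix rename, roughly:

```py
# Sketch of the suggested consolidation: a single rename on the nested
# "model.language_model.model" prefix covers layers, embed_tokens, and norm.
WeightRenaming(r"^model.mm_projector", "model.projector"),
WeightRenaming(r"^model\.language_model\.model", "model.language_model"),
```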
```py
        self.assertIsNotNone(outputs)

    def test_reverse_loading_mapping(self, check_keys_were_modified=True):
        # Conversion happens only for the `ConditionalGeneration` model, not the base model
```
I think this was fixed last Friday, and we allow adding conversion per class now; rebasing should help.
```py
    @unittest.skip("Loading nested configs with overwritten `kwargs` isn't supported yet, FIXME @raushan.")
    def test_load_with_mismatched_shapes(self):
        pass
```
Maybe fixed as well, I merged something else last Friday 😆
```py
@require_vision
@require_torch
@require_torchvision
class HCXVisionV2ProcessorTest(ProcessorTesterMixin, unittest.TestCase):
    processor_class = HCXVisionV2Processor
```
Not needed, the whole file; we will use the same processor as Exaone.
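In the modular file that could reduce to a simple subclass. The Exaone processor class name and import path below are assumptions based on the review comment, not confirmed against main:

```py
# modular_hyperclovax_vision_v2.py: sketch only; parent class name and
# import path are assumed, not verified.
from ..exaone4_5.processing_exaone4_5 import Exaone4_5Processor


class HCXVisionV2Processor(Exaone4_5Processor):
    pass
```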
What does this PR do?
Hello, Transformers team!
I submitted a PR to add naver-hyperclovax/HyperCLOVAX-SEED-Think-32B (hereafter HCX), developed by the Korean IT company Naver as part of the government's national AI model project.
The HCX code was originally written against Transformers 4.52.4, which led to several issues.
Moving to Transformers 5.0.0 significantly improved the readability and development convenience of the modeling code; we aim to leverage this to add the HCX model to transformers.
TODO list
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@zucchini-nlp @yonigozlan @molbap