add HyperClovaX Vision #44314
Conversation
When I ran the tests in my environment, the hcx tests ran normally without any failures.
Can the work proceed in this manner?
Hmm... In my environment, …
…iteration and using cache
…orConditionalGeneration
…d processor classes
…template application
…ing in HyperClovaXProcessor
Yep, don't worry, we'll fix this one on main!
zucchini-nlp left a comment
Hey @jp1924, thanks a lot for the PR!
I think we need to move all the code to a modular file first, since there is a lot of code copied from other models. I left comments below about where exactly you can copy from. Ping me for another review when the modular file is ready.
…deo processor classes.
… to HCXVisionConfig and remove unnecessary code.
BTW, I didn't forget about this PR, coming to it right after the hyperclovax LM is merged :)
@zucchini-nlp @bigshanedogg |
The text backbone was merged just today, so we can resume this PR now. We'd need to use its own LM backbone, and the vision backbone stays as discussed earlier. Whenever you have time @jp1924, feel free to rebase, no rush :)
[For maintainers] Suggested jobs to run (before merge)

```
run-slow: auto, hyperclovax_vision_v2, qwen2_5_vl
```
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44314&sha=23a8be |
@zucchini-nlp @bigshanedogg Could you please take a look and do a code review? I'm getting an …
zucchini-nlp left a comment
Niiice, giving an early approval.
Some nits to fix, especially since we added Exaone4.5, which seems to be the best candidate to copy from in modular. And we can iterate with a core maintainer on the rest.
```md
# HyperCLOVAX Vision V2

HyperCLOVAX Vision V2 is a multimodal vision-language model developed by NAVER. It combines the HyperCLOVAX language model backbone — based on the [Granite](./granite) architecture with optional post-norm (Peri-LN) layers for MuP scaling — with a [Qwen2.5-VL](./qwen2_5_vl) vision encoder. The model supports text, image, and video inputs and is capable of chain-of-thought reasoning via built-in thinking tokens (`<think>...</think>`).
```
imo we can say that the LM is HyperClovaX now
```md
You can find the original HyperCLOVAX-SEED-Think-32B checkpoint on the [naver-hyperclovax/HyperCLOVAX-SEED-Think-32B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B) page.

> [!TIP]
> The `model_type` in the released checkpoint's `config.json` is `"vlm"`, while the Transformers implementation registers this model as `"hyperclovax_vision_v2"`. Due to this mismatch, loading via `AutoModel` or `AutoModelForCausalLM` is not supported. Use the model class directly as shown in the examples below.
```
let's see if we can do anything about this with the Naver team before merging
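For reference, a minimal loading sketch along the lines of what the TIP describes. The class names (`HCXVisionV2ForConditionalGeneration`, `HCXVisionV2Processor`) come from this PR's diff and the checkpoint id from the doc page above; the image URL and message layout are illustrative assumptions, not the PR's final doc example:

```py
# A sketch, not the final doc snippet: load the classes directly because
# `AutoModel` can't resolve the checkpoint's `model_type: "vlm"` (see the
# TIP above). The image URL below is a placeholder.
from transformers import HCXVisionV2ForConditionalGeneration, HCXVisionV2Processor

model_id = "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B"
processor = HCXVisionV2Processor.from_pretrained(model_id)
model = HCXVisionV2ForConditionalGeneration.from_pretrained(model_id, device_map="auto")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```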
```py
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa",
```
nit: delete `torch_dtype` and `attn_implementation`, they are loaded from the config
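For illustration, the trimmed call after applying the nit might look like this (a sketch under the same assumed class name as above):

```py
# dtype and attn implementation come from the checkpoint config, so only
# device placement is passed explicitly.
model = HCXVisionV2ForConditionalGeneration.from_pretrained(
    "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B",
    device_map="auto",
)
```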
| "deepseek_ocr2", | ||
| "fuyu", | ||
| "h2ovl_chat", | ||
| "hyperclovax_vlm", |
shouldn't we keep the entry with 'vlm'?
```py
model_type = self.vision_config.get("model_type", "qwen2_5_vl_vision")
model_type = "qwen2_5_vl_vision" if model_type == "qwen2_5_vl" else model_type
```
a tiny comment noting that this is for BC (or whatever the actual reason is) would be nice
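One way to address this, a sketch with an assumed rationale (the real reason should come from the authors):

```py
# Assumed rationale, to be confirmed: early checkpoints store the vision
# tower's model_type as "qwen2_5_vl" (the full model's type), so remap it
# to the standalone vision config for backward compatibility.
model_type = self.vision_config.get("model_type", "qwen2_5_vl_vision")
if model_type == "qwen2_5_vl":  # BC: remap old checkpoints
    model_type = "qwen2_5_vl_vision"
```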
```py
@auto_docstring
class HCXVisionV2ForConditionalGeneration(HCXVisionV2PreTrainedModel, GenerationMixin):
    accepts_loss_kwargs = False
    _tied_weights_keys = {"lm_head.weight": "model.language_model.embed_tokens.weight"}
```
same here, copy from Exaone, and I believe we can delete identical or similar methods
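A rough sketch of what the modular version could look like under that suggestion; `Exaone4_5ForConditionalGeneration` is an assumed parent class name inferred from the review comment, not a confirmed one:

```py
# Sketch only: in the modular file, inherit the Exaone-based implementation
# (class name assumed) and keep just the attributes that actually differ.
class HCXVisionV2ForConditionalGeneration(Exaone4_5ForConditionalGeneration):
    accepts_loss_kwargs = False
    _tied_weights_keys = {"lm_head.weight": "model.language_model.embed_tokens.weight"}
```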
```py
WeightRenaming(r"^model.mm_projector", "model.projector"),
WeightRenaming(r"^model.language_model.model.layers", "model.language_model.layers"),
WeightRenaming(r"^model.language_model.model.embed_tokens", "model.language_model.embed_tokens"),
WeightRenaming(r"^model.language_model.model.norm", "model.language_model.norm"),
```
or `model\.language_model\.model` → `model.language_model`?
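That is, the three per-submodule renames could collapse into one prefix rename, roughly:

```py
# Sketch of the suggested consolidation: a single rename on the nested
# "model.language_model.model" prefix covers layers, embed_tokens, and norm.
WeightRenaming(r"^model.mm_projector", "model.projector"),
WeightRenaming(r"^model\.language_model\.model", "model.language_model"),
```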
```py
        self.assertIsNotNone(outputs)

    def test_reverse_loading_mapping(self, check_keys_were_modified=True):
        # Conversion happens only for the `ConditionalGeneration` model, not the base model
```
I think this was fixed last Friday, and we allow adding conversion per class now; rebasing should help.
```py
    @unittest.skip("Loading nested configs with overwritten `kwargs` isn't supported yet, FIXME @raushan.")
    def test_load_with_mismatched_shapes(self):
        pass
```
Maybe fixed as well, I merged something else last Friday 😆
```py
@require_vision
@require_torch
@require_torchvision
class HCXVisionV2ProcessorTest(ProcessorTesterMixin, unittest.TestCase):
    processor_class = HCXVisionV2Processor
```
Not needed, the whole file; we will use the same processor as Exaone.
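In the modular file that could reduce to a simple subclass. The Exaone processor class name and import path below are assumptions based on the review comment, not confirmed against main:

```py
# modular_hyperclovax_vision_v2.py: sketch only; parent class name and
# import path are assumed, not verified.
from ..exaone4_5.processing_exaone4_5 import Exaone4_5Processor


class HCXVisionV2Processor(Exaone4_5Processor):
    pass
```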
What does this PR do?
Hello, Transformers team!
I submitted a PR to add naver-hyperclovax/HyperCLOVAX-SEED-Think-32B (hereafter HCX), developed by the Korean IT company Naver as part of the government's national AI model project.
The HCX code was originally written against Transformers 4.52.4, which led to several issues.
Moving to Transformers 5.0.0 significantly improved the readability and development convenience of the modeling code; we aim to leverage this to add the HCX model to transformers.
TODO list
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@zucchini-nlp @yonigozlan @molbap