
add HyperClovaX Vision #44314

Open
jp1924 wants to merge 110 commits into huggingface:main from jp1924:feat/hcx-seed-32b

Conversation

Contributor

@jp1924 jp1924 commented Feb 27, 2026

What does this PR do?

Hello, Transformers team!

This PR adds naver-hyperclovax/HyperCLOVAX-SEED-Think-32B (hereafter HCX), developed by the Korean IT company NAVER as part of the Korean government's national AI model project.

The original HCX code was written against Transformers 4.52.4, which leads to the following issues:

  1. Being pinned to an outdated Transformers version prevents applying the latest training optimizations supported by Transformers 5.0.0 (e.g., sequence parallelism).
  2. Remaining uses of deprecated code and features may cause unexpected bugs on the latest Transformers version.
  3. The modeling code was overly complex, hurting debugging and development convenience; experimental code left over from model development also remained in place.

Porting to Transformers 5.0.0 significantly improved the readability and maintainability of the modeling code. We aim to leverage this to add the HCX model to transformers.

TODO list

  • Add docstrings

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@zucchini-nlp @yonigozlan @molbap

@jp1924 jp1924 changed the title add HyperCLOVAX Vision add HyperClovaX Vision Feb 27, 2026
Contributor Author

jp1924 commented Feb 27, 2026

When I ran the tests in my environment, the HCX tests all passed without failures.
The process for adding a model will likely proceed like this:

  • First, fix all the comments you guys added in the modeling part.
  • (Optional) Add modular code.
  • Once the modeling code work is nearly complete, write the docstrings.
  • Then apply the style.

Does that match how you expect the work to proceed?

Contributor Author

jp1924 commented Feb 27, 2026

Hmm... in my environment, torch_compilable_check passes normally, but it fails in CI. I'll need to look into this issue further.

@ArthurZucker
Collaborator

Yep don't worry we'll fix this one on main!

Member

@zucchini-nlp zucchini-nlp left a comment


Hey @jp1924, thanks a lot for the PR!

I think we need to move all the code to a modular file first, since there is a lot of copied stuff from other models. I left comments below about exactly where you can copy from. Ping me for another review when the modular file is ready.

Resolved (outdated) comment threads:
  • src/transformers/models/hyperclovax_vision/configuration_hyperclovax_vision.py (5 threads)
  • src/transformers/models/hyperclovax_vision/modeling_hyperclovax_vision.py (2 threads)
  • tests/models/hyperclovax_vision/test_modeling_hyperclovax_vision.py (2 threads)
  • tests/models/hyperclovax_vision/test_processing_hyperclovax_vision.py (1 thread)
@zucchini-nlp
Member

BTW, I didn't forget about this PR, coming to it right after the hyperclovax lm is merged :)

Contributor Author

jp1924 commented Apr 10, 2026

@zucchini-nlp @bigshanedogg
Sounds good — doc-strings and tests are all done on my end. I was just putting together a summary of what changed to make the review easier. Let me know when it's merged!

@zucchini-nlp
Member

The text backbone was merged just today, so we can resume this PR now. We'd need to use its own LM backbone, and the vision backbone stays as discussed earlier.

Whenever you have time @jp1924 feel free to rebase, no rush :)

@bigshanedogg bigshanedogg mentioned this pull request May 7, 2026
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, hyperclovax_vision_v2, qwen2_5_vl

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44314&sha=23a8be

Contributor Author

jp1924 commented May 11, 2026

@zucchini-nlp @bigshanedogg Could you please take a look and review the code? I'm getting an `AssertionError: tensor(False) is not true : Found non-zero attention weights for padding token at batch 1, sequence position 0` in paligemma, but it doesn't seem to be related to this PR.

@jp1924 jp1924 requested a review from zucchini-nlp May 11, 2026 08:24
Member

@zucchini-nlp zucchini-nlp left a comment


Niiice, giving an early approval!

Some nits to fix, especially since we added Exaone4.5, which seems to be the best candidate to copy from in modular. And we can iterate with a core maintainer on the rest.


# HyperCLOVAX Vision V2

HyperCLOVAX Vision V2 is a multimodal vision-language model developed by NAVER. It combines the HyperCLOVAX language model backbone — based on the [Granite](./granite) architecture with optional post-norm (Peri-LN) layers for MuP scaling — with a [Qwen2.5-VL](./qwen2_5_vl) vision encoder. The model supports text, image, and video inputs and is capable of chain-of-thought reasoning via built-in thinking tokens (`<think>...</think>`).
Member


imo we can say that the lm is HyperClovaX now

You can find the original HyperCLOVAX-SEED-Think-32B checkpoint on the [naver-hyperclovax/HyperCLOVAX-SEED-Think-32B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B) page.

> [!TIP]
> The `model_type` in the released checkpoint's `config.json` is `"vlm"`, while the Transformers implementation registers this model as `"hyperclovax_vision_v2"`. Due to this mismatch, loading via `AutoModel` or `AutoModelForCausalLM` is not supported. Use the model class directly as shown in the examples below.
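The dispatch failure described in the tip can be illustrated with a minimal, hypothetical sketch of how Auto-class lookup works. The mapping and helper below are illustrative only, not the actual transformers internals; only the model-type strings and the class name `HCXVisionV2ForConditionalGeneration` come from this PR.

```python
# Hypothetical sketch: Auto classes dispatch on config.model_type.
# The released checkpoint declares model_type "vlm", but the library
# registers this model under "hyperclovax_vision_v2", so the lookup misses.
MODEL_TYPE_TO_CLASS = {
    "hyperclovax_vision_v2": "HCXVisionV2ForConditionalGeneration",
}


def resolve_model_class(model_type: str) -> str:
    """Return the registered class name for a model_type, mimicking Auto dispatch."""
    try:
        return MODEL_TYPE_TO_CLASS[model_type]
    except KeyError:
        raise KeyError(
            f"Unrecognized model_type {model_type!r}; load the model class directly."
        ) from None
```

Under this sketch, `resolve_model_class("vlm")` raises a `KeyError`, which is why the examples below instantiate the model class directly instead of going through `AutoModel`.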
Member


lets see if we can do anything to this with Naver team before merging

Comment on lines +46 to +48
torch_dtype=torch.bfloat16,
device_map="auto",
attn_implementation="sdpa",
Member


nit: delete `torch_dtype` and `attn_implementation`; they are loaded from the config

"deepseek_ocr2",
"fuyu",
"h2ovl_chat",
"hyperclovax_vlm",
Member


shouldn't we keep the entry with 'vlm'?

Comment on lines +85 to +86
model_type = self.vision_config.get("model_type", "qwen2_5_vl_vision")
model_type = "qwen2_5_vl_vision" if model_type == "qwen2_5_vl" else model_type
Member


a tiny comment noting that this is for BC (or whatever the actual reason is) would be nice
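For context, the backward-compatibility intent of the quoted lines can be captured as a small standalone helper. This is a sketch extracted from the two quoted lines, not the PR's actual method; the surrounding config class and attribute access are simplified into a plain dict.

```python
def resolve_vision_model_type(vision_config: dict) -> str:
    # BC remap (sketch): older checkpoints store the vision backbone's
    # model_type as "qwen2_5_vl", while the standalone vision config class
    # is registered as "qwen2_5_vl_vision".
    model_type = vision_config.get("model_type", "qwen2_5_vl_vision")
    return "qwen2_5_vl_vision" if model_type == "qwen2_5_vl" else model_type
```

So an empty or legacy config resolves to `"qwen2_5_vl_vision"`, while any other explicit vision model_type passes through unchanged.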

Comment on lines +266 to +269
@auto_docstring
class HCXVisionV2ForConditionalGeneration(HCXVisionV2PreTrainedModel, GenerationMixin):
accepts_loss_kwargs = False
_tied_weights_keys = {"lm_head.weight": "model.language_model.embed_tokens.weight"}
Member


same here, copy from exaone and I believe we can delete identical or similar methods

Comment on lines +803 to +806
WeightRenaming(r"^model.mm_projector", "model.projector"),
WeightRenaming(r"^model.language_model.model.layers", "model.language_model.layers"),
WeightRenaming(r"^model.language_model.model.embed_tokens", "model.language_model.embed_tokens"),
WeightRenaming(r"^model.language_model.model.norm", "model.language_model.norm"),
Member


or model\.language_model\.model: model.language_model ?
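The suggestion above collapses the three per-submodule `WeightRenaming` entries into one pattern. A runnable sketch of that consolidated rename, using plain `re` rather than the actual `WeightRenaming` machinery (the key names come from the quoted entries):

```python
import re

# Consolidated rename table (sketch): one pattern strips the extra ".model"
# level for all language_model submodules (layers, embed_tokens, norm) at once.
RENAMES = [
    (r"^model\.mm_projector", "model.projector"),
    (r"^model\.language_model\.model", "model.language_model"),
]


def rename_key(key: str) -> str:
    """Apply each rename pattern in order to a state-dict key."""
    for pattern, repl in RENAMES:
        key = re.sub(pattern, repl, key)
    return key
```

Keys outside the two prefixes (e.g. `lm_head.weight`) pass through untouched, matching the behavior of the original four-entry list.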

self.assertIsNotNone(outputs)

def test_reverse_loading_mapping(self, check_keys_were_modified=True):
# Conversion happens only for the `ConditionalGeneration` model, not the base model
Member


I think this was fixed last Friday and we allow adding conversion per class now; rebasing should help

Comment on lines +259 to +262
@unittest.skip("Loading nested configs with overwritten `kwargs` isn't supported yet, FIXME @raushan.")
def test_load_with_mismatched_shapes(self):
pass

Member


Fixed maybe as well, I merged smth else last Friday 😆

Comment on lines +40 to +44
@require_vision
@require_torch
@require_torchvision
class HCXVisionV2ProcessorTest(ProcessorTesterMixin, unittest.TestCase):
processor_class = HCXVisionV2Processor
Member


The whole file isn't needed; we will use the same processor as exaone.
