Add conversion overrides for Llama, Qwen3, and Gemma 4 models so they roundtrip properly by finbarrtimbers · Pull Request #677 · allenai/OLMo-core

finbarrtimbers · 2026-05-13T16:05:23Z

Previously, we got errors.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1b69d796a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T17:19:37Z

+    hf_config = AutoConfig.from_pretrained(model_id)
+    hf_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)


Avoid unconditional Hub downloads in unit tests

These helpers are called by the new Qwen3/Gemma tests without any HF_TOKEN/network skip, so a normal pytest src/test/nn/hf/convert_test.py now tries to download Qwen/Qwen3-0.6B and google/gemma-3-270m during unit tests. In offline CI or developer environments this fails before testing the converter, and even with network it pulls hundreds of MB; the existing golden tests gate the same Hub-backed coverage behind HF_TOKEN, so these should use a local/tiny config or be similarly skipped/marked.
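One way to read the suggestion above is as a skip guard around any test that reaches the Hub. The following is a minimal sketch, not the PR's actual fix: it uses the standard library's `unittest` rather than the repository's pytest setup, and `hub_access_enabled` and `HubRoundtripTest` are hypothetical names introduced only for illustration. The only detail taken from the review is that the existing golden tests key off `HF_TOKEN`.

```python
import os
import unittest


def hub_access_enabled(env=None):
    """Return True only when a non-empty HF_TOKEN is present.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real environment variables.
    """
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


class HubRoundtripTest(unittest.TestCase):
    # Hypothetical test class: skipped entirely unless a token is configured,
    # so offline runs never attempt a download.
    @unittest.skipUnless(hub_access_enabled(), "HF_TOKEN not set; skipping Hub download")
    def test_roundtrip_against_hub_checkpoint(self):
        # Placeholder body: a real test would load the checkpoint and
        # compare converted weights here.
        self.assertTrue(hub_access_enabled())
```

In a pytest suite the same predicate could presumably feed a `skipif` marker. The other option the review mentions, a local or tiny config, avoids the network altogether and is not shown here.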

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7c58c38c6


chatgpt-codex-connector · 2026-05-29T20:48:48Z

  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  BEAKER_TOKEN: ${{ secrets.BEAKER_TOKEN }}
  GOOGLE_CREDENTIALS: ${{ secrets.GOOGLE_CREDENTIALS }}
+  HF_TOKEN: ${{ secrets.HF_TOKEN }}


Keep Hub token out of the default test job

When the Test matrix runs on same-repo PRs or pushes where secrets are available, setting HF_TOKEN globally makes src/test/nn/hf/golden_tests.py's @skipif(not HF_TOKEN) false, so the ordinary 15-minute CPU test job now downloads and runs the Qwen3-0.6B and Gemma3-270m generation golden tests. That couples normal unit CI to Hub availability and large-model runtime; scope this token to a dedicated/gated job or keep those tests skipped unless explicitly requested.
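The "skipped unless explicitly requested" option can be sketched as a two-part condition: the token must be present and the caller must opt in. This is an illustrative sketch only. The `RUN_HF_GOLDEN_TESTS` variable and the `golden_tests_requested` helper are assumptions made for this example and do not appear in the PR; only `HF_TOKEN` comes from the source.

```python
import os

# Values treated as an explicit opt-in (an assumption for this sketch).
_TRUTHY = {"1", "true", "yes"}


def golden_tests_requested(env=None):
    """Return True only if HF_TOKEN is set AND the opt-in flag is truthy.

    With this gate, a globally exported HF_TOKEN alone no longer turns on
    the Hub-backed golden tests in the default test job.
    """
    env = os.environ if env is None else env
    has_token = bool(env.get("HF_TOKEN", "").strip())
    opted_in = env.get("RUN_HF_GOLDEN_TESTS", "").strip().lower() in _TRUTHY
    return has_token and opted_in
```

A dedicated CI job would then set both variables, while the ordinary test matrix would set neither or only the token.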

Useful? React with 👍 / 👎.


finbarrtimbers added 2 commits May 13, 2026 10:02

conversion works properly

5892f5a

conversion works properly

7b7d90e

finbarrtimbers changed the title ~~Fix conversion so it roundtrips properly~~ Add conversion overrides for Llama, Qwen3, and Gemma 4 models so they roundtrip properly May 13, 2026

finbarrtimbers requested a review from AkshitaB May 13, 2026 16:07

Update CHANGELOG for HF conversion fix

f1b69d7

finbarrtimbers marked this pull request as ready for review May 13, 2026 17:17

finbarrtimbers enabled auto-merge (squash) May 13, 2026 17:17

chatgpt-codex-connector Bot reviewed May 13, 2026


AkshitaB and others added 2 commits May 15, 2026 13:12

Merge branch 'main' into finbarr/fix-conversion

2aa3def

now we use hf_token

eb81a6e

finbarrtimbers mentioned this pull request May 26, 2026

Implement tied LM head & word embeddings for Qwen3 #686

Merged

Merge branch 'main' into finbarr/fix-conversion

b7c58c3

chatgpt-codex-connector Bot reviewed May 29, 2026


Implement tied LM head & word embeddings for Qwen3 (#686)

0021abd

Implements tied LM head & word embeddings for Qwen3. The three sizes that Qwen ships tied (0.6B, 1.7B, 4B) now default to tying; 8B/14B/32B stay untied. The HF import path is tie-aware.
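As a rough illustration of what a tie-aware import could look like, the sketch below uses plain dictionaries in place of real tensors. The per-size defaults mirror the sentence above; everything else is an assumption: the helper name `resolve_lm_head` is hypothetical, and the key names `model.embed_tokens.weight` and `lm_head.weight` follow the usual Hugging Face naming rather than anything confirmed in this PR.

```python
# Per-size tying defaults as described in the commit message above.
QWEN3_TIE_DEFAULTS = {
    "0.6B": True,
    "1.7B": True,
    "4B": True,
    "8B": False,
    "14B": False,
    "32B": False,
}


def resolve_lm_head(state_dict, tie_word_embeddings):
    """Pick the LM head weight from an HF-style state dict.

    When embeddings are tied, the head reuses the embedding weight (the same
    object, not a copy), so a checkpoint without an `lm_head.weight` entry
    still loads. When untied, the separate head weight is required.
    """
    embed = state_dict["model.embed_tokens.weight"]
    if tie_word_embeddings:
        return embed
    return state_dict["lm_head.weight"]


# Usage: a tied checkpoint carrying only the embedding weight.
tied_ckpt = {"model.embed_tokens.weight": [[0.1, 0.2], [0.3, 0.4]]}
head = resolve_lm_head(tied_ckpt, QWEN3_TIE_DEFAULTS["0.6B"])
```

Returning the same object rather than a copy is the point of tying: updates to the embedding are automatically seen by the head.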

finbarrtimbers wants to merge 7 commits into main from finbarr/fix-conversion
