Add conversion overrides for Llama, Qwen3, and Gemma 4 models so they roundtrip properly#677
Add conversion overrides for Llama, Qwen3, and Gemma 4 models so they roundtrip properly#677finbarrtimbers wants to merge 7 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f1b69d796a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| hf_config = AutoConfig.from_pretrained(model_id) | ||
| hf_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32) |
There was a problem hiding this comment.
Avoid unconditional Hub downloads in unit tests
These helpers are called by the new Qwen3/Gemma tests without any HF_TOKEN/network skip, so a normal pytest src/test/nn/hf/convert_test.py now tries to download Qwen/Qwen3-0.6B and google/gemma-3-270m during unit tests. In offline CI or developer environments this fails before testing the converter, and even with network it pulls hundreds of MB; the existing golden tests gate the same Hub-backed coverage behind HF_TOKEN, so these should use a local/tiny config or be similarly skipped/marked.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b7c58c38c6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} | ||
| BEAKER_TOKEN: ${{ secrets.BEAKER_TOKEN }} | ||
| GOOGLE_CREDENTIALS: ${{ secrets.GOOGLE_CREDENTIALS }} | ||
| HF_TOKEN: ${{ secrets.HF_TOKEN }} |
There was a problem hiding this comment.
Keep Hub token out of the default test job
When the Test matrix runs on same-repo PRs or pushes where secrets are available, setting HF_TOKEN globally makes src/test/nn/hf/golden_tests.py's @skipif(not HF_TOKEN) false, so the ordinary 15-minute CPU test job now downloads and runs the Qwen3-0.6B and Gemma3-270m generation golden tests. That couples normal unit CI to Hub availability and large-model runtime; scope this token to a dedicated/gated job or keep those tests skipped unless explicitly requested.
Useful? React with 👍 / 👎.
Implements tied LM head & word embeddings for Qwen3. The three sizes that Qwen ships tied (0.6B, 1.7B, 4B) now default to tying; 8B/14B/32B stay untied. The HF import path is tie-aware.
Previously, we got errors.