
Add Gemma 4 E2B/E4B support (text-only) #18695

Open
Phineas1500 wants to merge 1 commit into pytorch:main from Phineas1500:codex/gemma4-e2b-e4b-support

Conversation


@Phineas1500 (Contributor) commented Apr 3, 2026

Summary

Add native text-only Gemma 4 support for google/gemma-4-E2B and google/gemma-4-E4B in the ExecuTorch LLM export path.

Why

Gemma 4 E2B/E4B do not fit the existing Llama/Qwen config-only path. Supporting them required new model/runtime behavior plus a checkpoint conversion path, not just new repo IDs and JSON configs.

What Changed

  • Register gemma4_e2b and gemma4_e4b as first-class export targets.
  • Add a new examples/models/gemma4 package with configs, converter, BUCK target, and README.
  • Extend the native text runtime for Gemma 4-specific behavior, including:
    • layer-type-aware sliding/full attention
    • dual RoPE behavior
    • shared-KV reuse
    • per-layer input embeddings / scaling
    • GELU-tanh MLP support
    • post-attention and post-FFN norms
    • layer scaling and final logit softcapping
  • Carry Gemma 4 attention scaling through the custom-SDPA export path.
  • Add focused regression coverage for Gemma 4 support.
  • Add two small supporting fixes discovered during validation:
    • source-tree import cleanup in examples/models/model_factory.py
    • source-tree flatbuffer schema fallback in exir/_serialize/_flatbuffer.py

Validation

Ran:

```
conda activate et_pt211_clean
export PYTHONNOUSERSITE=1
export PYTHONPATH=..
python -m unittest \
  executorch.examples.models.test.test_model_factory \
  executorch.exir._serialize.test.test_flatbuffer \
  executorch.examples.models.llama.tests.test_gemma4_support \
  executorch.examples.models.qwen3_5.tests.test_convert_weights
```

Result: Ran 31 tests ... OK

Also validated with real HF checkpoint conversion/export/runtime smoke tests for both google/gemma-4-E2B and google/gemma-4-E4B, including broad greedy-decoding parity checks against HF.

Prompt benchmark summary:

  • E4B: exact match on 11/12 prompts, first-token match on 12/12 prompts
  • E2B: exact match on 8/12 prompts, first-token match on 10/12 prompts

The remaining E2B drift was concentrated in open-ended near-tie generations rather than structural export failures.
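A greedy-decoding parity harness of the kind described above can be sketched as follows. `hf_next_token` and `et_next_token` are hypothetical callables standing in for the HF reference model and the exported ExecuTorch runtime; this is an illustration of the methodology, not the PR's actual test code.

```python
# Hedged sketch of a greedy-decoding parity harness (not the PR's code).
# `hf_next_token` / `et_next_token` are hypothetical callables mapping a
# token-id context to the argmax next token for the HF reference model and
# the exported ExecuTorch runtime respectively.
def greedy_parity(prompts, hf_next_token, et_next_token, max_new_tokens=32):
    exact = first = 0
    for prompt in prompts:
        hf_ctx, et_ctx = list(prompt), list(prompt)
        hf_toks, et_toks = [], []
        for _ in range(max_new_tokens):
            hf_toks.append(hf_next_token(hf_ctx)); hf_ctx.append(hf_toks[-1])
            et_toks.append(et_next_token(et_ctx)); et_ctx.append(et_toks[-1])
        first += hf_toks[0] == et_toks[0]
        exact += hf_toks == et_toks
    return exact, first
```

An "exact match" counts a prompt whose whole greedy continuation agrees; a "first-token match" only requires the first generated token to agree, which is why the E2B first-token number can exceed its exact-match number.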

Not Included In This PR

  • Gemma 4 multimodal support
  • Qualcomm/QNN or other backend-specific bring-up
  • A dedicated Gemma 4 runner / example app beyond the native text export path
  • CI end-to-end export coverage with real HF weights
  • Performance or memory tuning work beyond correctness bring-up

pytorch-bot bot commented Apr 3, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18695

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 11 Awaiting Approval

As of commit 8fa1faa with merge base 28f3cf3:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 3, 2026
@Phineas1500 (Contributor, Author) commented:

@pytorchbot label "release notes: examples"

@pytorch-bot added the "release notes: examples" label (changes to any of our example LLM integrations, such as Llama3 and Llava) Apr 3, 2026
@Phineas1500 Phineas1500 changed the title [codex] Add Gemma 4 E2B/E4B support Add Gemma 4 E2B/E4B support (text-only) Apr 4, 2026
@Phineas1500 Phineas1500 marked this pull request as ready for review April 4, 2026 03:27
Copilot AI review requested due to automatic review settings April 4, 2026 03:27
Copilot AI left a comment

Pull request overview

Adds native text-only export/runtime support for Gemma 4 E2B/E4B models to ExecuTorch’s LLM (Llama-style) export path, including checkpoint conversion, runtime behavior updates, and regression tests.

Changes:

  • Registers gemma4_e2b / gemma4_e4b as first-class export targets and wires them into the Llama export loader/converter selection.
  • Extends the Llama native text runtime to support Gemma 4 specifics (new attention impl, per-layer embeddings/scaling, dual RoPE tables, additional norms, logit softcapping).
  • Adds a new examples/models/gemma4 package (configs, converter, docs) plus targeted unit/regression tests and a small flatbuffer schema fallback fix for source-tree usage.
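The layer-type-aware sliding/full attention mentioned above can be illustrated with a minimal mask builder. This is a hedged plain-Python sketch, not the runtime's actual implementation: a full-attention layer passes `window=None` and gets an ordinary causal mask, while a sliding-window layer additionally restricts each query to the most recent `window` key positions.

```python
from typing import List, Optional

# Hedged sketch of layer-type-aware attention masking (not the runtime's
# actual code). mask[q][k] is True when query position q may attend to
# key position k.
def causal_mask(seq_len: int, window: Optional[int]) -> List[List[bool]]:
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: never attend to future positions
            if window is not None:
                visible = visible and (q - k < window)  # sliding window
            row.append(visible)
        mask.append(row)
    return mask
```

In an exported model the per-layer choice between the two masks would be driven by the layer-type list in the config, which is why it has to be threaded through the export path rather than hard-coded.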

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| extension/llm/export/config/llm_config.py | Adds Gemma4 model types to the export config enum. |
| exir/_serialize/_flatbuffer.py | Adds a fallback to load flatbuffer schemas from the repo schema/ dir when package resources are missing (editable/source-tree). |
| exir/_serialize/test/test_flatbuffer.py | Adds coverage verifying the schema fallback behavior. |
| examples/models/model_factory.py | Simplifies source-tree vs package-root model imports using `__package__` instead of cwd heuristics. |
| examples/models/test/test_model_factory.py | Adds unit tests validating EagerModelFactory.create_model import behavior and toy model load. |
| examples/models/test/BUCK | Adds Buck unittest target for test_model_factory. |
| examples/models/llama/tests/test_gemma4_support.py | Adds focused Gemma4 runtime/export/convert regression tests. |
| examples/models/llama/tests/BUCK | Registers Buck unittest target for Gemma4 support tests. |
| examples/models/llama/source_transformation/sdpa.py | Threads SDPA scale through the custom/quantized custom SDPA export path. |
| examples/models/llama/rope.py | Adds proportional RoPE precompute plus layer-type-specific RoPE tables and dtype-preserving HF RoPE application. |
| examples/models/llama/norm.py | Extends RMSNorm to support a "no scale parameter" mode (`with_scale=False`) used by Gemma4. |
| examples/models/llama/model_args.py | Adds Gemma4-related args (layer-type rope params, per-layer embedding dims, global head dims, etc.). |
| examples/models/llama/llama_transformer.py | Adds Gemma4 features in transformer blocks (post norms, per-layer inputs, layer scaling, KV donor selection, logit softcapping). |
| examples/models/llama/feed_forward.py | Allows a configurable activation function in the FFN (needed for the GELU-tanh variant). |
| examples/models/llama/export_llama_lib.py | Registers Gemma4 model IDs, routes converter import, and loads Gemma4Model when appropriate. |
| examples/models/llama/attention.py | Adds SDPA scaling support and introduces AttentionGemma4MHA with Gemma4 KV sharing plus sliding/full attention behavior. |
| examples/models/gemma4/README.md | Documents Gemma4 export usage and supported models. |
| examples/models/gemma4/convert_weights.py | Adds Gemma4 checkpoint loading (pt/safetensors) and key mapping into the ExecuTorch meta format. |
| examples/models/gemma4/config/e4b_config.json | Adds Gemma4 E4B export config (layer types, rope params, scaling constants). |
| examples/models/gemma4/config/e2b_config.json | Adds Gemma4 E2B export config (layer types, rope params, scaling constants). |
| examples/models/gemma4/BUCK | Adds Buck library target for the Gemma4 package (incl. resources and safetensors dep). |
| examples/models/gemma4/`__init__.py` | Adds Gemma4 model entrypoint (lazy Gemma4Model wrapper) and exports the converter. |
| examples/models/BUCK | Adds the Gemma4 package to the aggregated models target list. |
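The norm.py change in the table (RMSNorm with `with_scale=False`) can be sketched in plain Python, under the assumption that the no-scale mode simply skips the learned elementwise weight; this is an illustration, not the actual module.

```python
import math
from typing import List, Optional

# Plain-Python sketch of RMSNorm with an optional scale parameter,
# mirroring the described with_scale=False mode (not the actual norm.py).
def rms_norm(x: List[float], weight: Optional[List[float]] = None,
             eps: float = 1e-6) -> List[float]:
    # Normalize by the root-mean-square of the vector.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    y = [v / rms for v in x]
    if weight is not None:  # with_scale=True: apply learned elementwise gain
        y = [v * w for v, w in zip(y, weight)]
    return y
```

Passing `weight=None` corresponds to `with_scale=False`: the output is the normalized input with no learned gain applied.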


Comment on lines 158 to 160:

```python
model_architecture: Optional[str] = (
    None  # Architecture of model. For HF models, please refer to the HF model.config.architectures. This is used in QNN backend only for now.
)
```
Copilot AI commented Apr 4, 2026:

ModelArgs defines model_architecture twice (earlier as a required str with default, and again here as Optional[str]). In a dataclass the later field overwrites the earlier one, which silently changes the default from e.g. "LlamaForCausalLM" to None and can break any code relying on the original default. Remove the duplicate and keep a single model_architecture definition (or rename the new field if it’s meant to be different metadata).

Suggested change (delete the duplicate definition):

```diff
- model_architecture: Optional[str] = (
-     None  # Architecture of model. For HF models, please refer to the HF model.config.architectures. This is used in QNN backend only for now.
- )
```
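The pitfall Copilot describes is easy to reproduce: in a Python dataclass, a repeated field annotation in the class body overwrites the earlier one, silently replacing its default. A minimal hypothetical repro (the field name mirrors the review comment, not the real ModelArgs):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical minimal repro of the duplicate-field pitfall. The second
# annotation overwrites the first entry in __annotations__, so the
# dataclass ends up with a single field whose default is None.
@dataclass
class ModelArgs:
    model_architecture: str = "LlamaForCausalLM"  # original default
    # ... many unrelated fields in the real class ...
    model_architecture: Optional[str] = None  # duplicate silently wins
```

`ModelArgs().model_architecture` evaluates to `None`, not `"LlamaForCausalLM"`, which is exactly the silent default change the comment warns about.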


3 participants