
Add Gemma 4 E2B/E4B support (text-only) #18695

Open
Phineas1500 wants to merge 1 commit into pytorch:main from Phineas1500:codex/gemma4-e2b-e4b-support

Conversation


@Phineas1500 (Contributor) commented Apr 3, 2026

Summary

Add native text-only Gemma 4 support for google/gemma-4-E2B and google/gemma-4-E4B in the ExecuTorch LLM export path.

Why

Gemma 4 E2B/E4B do not fit the existing Llama/Qwen config-only path. Supporting them required new model/runtime behavior plus a checkpoint conversion path, not just new repo IDs and JSON configs.

What Changed

  • Register gemma4_e2b and gemma4_e4b as first-class export targets.
  • Add a new examples/models/gemma4 package with configs, converter, BUCK target, and README.
  • Extend the native text runtime for Gemma 4-specific behavior, including:
    • layer-type-aware sliding/full attention
    • dual RoPE behavior
    • shared-KV reuse
    • per-layer input embeddings / scaling
    • GELU-tanh MLP support
    • post-attention and post-FFN norms
    • layer scaling and final logit softcapping
  • Carry Gemma 4 attention scaling through the custom-SDPA export path.
  • Add focused regression coverage for Gemma 4 support.
  • Add two small supporting fixes discovered during validation:
    • source-tree import cleanup in examples/models/model_factory.py
    • source-tree flatbuffer schema fallback in exir/_serialize/_flatbuffer.py

Validation

Ran:

```
conda activate et_pt211_clean
export PYTHONNOUSERSITE=1
export PYTHONPATH=..
python -m unittest \
  executorch.examples.models.test.test_model_factory \
  executorch.exir._serialize.test.test_flatbuffer \
  executorch.examples.models.llama.tests.test_gemma4_support \
  executorch.examples.models.qwen3_5.tests.test_convert_weights
```

Result: Ran 31 tests ... OK

Also validated with real HF checkpoint conversion/export/runtime smoke tests for both google/gemma-4-E2B and google/gemma-4-E4B, including broad greedy-decoding parity checks against HF.

Prompt benchmark summary:

  • E4B: exact match on 11/12 prompts, first-token match on 12/12 prompts
  • E2B: exact match on 8/12 prompts, first-token match on 10/12 prompts

The remaining E2B drift was concentrated in open-ended near-tie generations rather than structural export failures.
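A greedy-decoding parity harness of the kind described above can be sketched as follows. `hf_next_token` and `et_next_token` are hypothetical callables standing in for the HF reference model and the exported ExecuTorch runtime; this is an illustration of the methodology, not the PR's actual test code.

```python
# Hedged sketch of a greedy-decoding parity harness (not the PR's code).
# `hf_next_token` / `et_next_token` are hypothetical callables mapping a
# token-id context to the argmax next token for the HF reference model and
# the exported ExecuTorch runtime respectively.
def greedy_parity(prompts, hf_next_token, et_next_token, max_new_tokens=32):
    exact = first = 0
    for prompt in prompts:
        hf_ctx, et_ctx = list(prompt), list(prompt)
        hf_toks, et_toks = [], []
        for _ in range(max_new_tokens):
            hf_toks.append(hf_next_token(hf_ctx)); hf_ctx.append(hf_toks[-1])
            et_toks.append(et_next_token(et_ctx)); et_ctx.append(et_toks[-1])
        first += hf_toks[0] == et_toks[0]
        exact += hf_toks == et_toks
    return exact, first
```

An "exact match" counts a prompt whose whole greedy continuation agrees; a "first-token match" only requires the first generated token to agree, which is why the E2B first-token number can exceed its exact-match number.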

Not Included In This PR

  • Gemma 4 multimodal support
  • Qualcomm/QNN or other backend-specific bring-up
  • A dedicated Gemma 4 runner / example app beyond the native text export path
  • CI end-to-end export coverage with real HF weights
  • Performance or memory tuning work beyond correctness bring-up

pytorch-bot bot commented Apr 3, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18695

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 11 Awaiting Approval

As of commit 8fa1faa with merge base 28f3cf3:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 3, 2026
@Phineas1500 (Contributor, Author) commented:

@pytorchbot label "release notes: examples"

@pytorch-bot added the "release notes: examples" label (changes to any of our example LLM integrations, such as Llama3 and Llava) Apr 3, 2026
@Phineas1500 Phineas1500 changed the title [codex] Add Gemma 4 E2B/E4B support Add Gemma 4 E2B/E4B support (text-only) Apr 4, 2026
@Phineas1500 Phineas1500 marked this pull request as ready for review April 4, 2026 03:27
Copilot AI review requested due to automatic review settings April 4, 2026 03:27
Copilot AI left a comment

Pull request overview

Adds native text-only export/runtime support for Gemma 4 E2B/E4B models to ExecuTorch’s LLM (Llama-style) export path, including checkpoint conversion, runtime behavior updates, and regression tests.

Changes:

  • Registers gemma4_e2b / gemma4_e4b as first-class export targets and wires them into the Llama export loader/converter selection.
  • Extends the Llama native text runtime to support Gemma 4 specifics (new attention impl, per-layer embeddings/scaling, dual RoPE tables, additional norms, logit softcapping).
  • Adds a new examples/models/gemma4 package (configs, converter, docs) plus targeted unit/regression tests and a small flatbuffer schema fallback fix for source-tree usage.
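The layer-type-aware sliding/full attention mentioned above can be illustrated with a minimal mask builder. This is a hedged plain-Python sketch, not the runtime's actual implementation: a full-attention layer passes `window=None` and gets an ordinary causal mask, while a sliding-window layer additionally restricts each query to the most recent `window` key positions.

```python
from typing import List, Optional

# Hedged sketch of layer-type-aware attention masking (not the runtime's
# actual code). mask[q][k] is True when query position q may attend to
# key position k.
def causal_mask(seq_len: int, window: Optional[int]) -> List[List[bool]]:
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: never attend to future positions
            if window is not None:
                visible = visible and (q - k < window)  # sliding window
            row.append(visible)
        mask.append(row)
    return mask
```

In an exported model the per-layer choice between the two masks would be driven by the layer-type list in the config, which is why it has to be threaded through the export path rather than hard-coded.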

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| extension/llm/export/config/llm_config.py | Adds Gemma4 model types to the export config enum. |
| exir/_serialize/_flatbuffer.py | Adds a fallback to load flatbuffer schemas from the repo schema/ dir when package resources are missing (editable/source-tree). |
| exir/_serialize/test/test_flatbuffer.py | Adds coverage verifying the schema fallback behavior. |
| examples/models/model_factory.py | Simplifies source-tree vs package-root model imports using `__package__` instead of cwd heuristics. |
| examples/models/test/test_model_factory.py | Adds unit tests validating EagerModelFactory.create_model import behavior and toy model load. |
| examples/models/test/BUCK | Adds Buck unittest target for test_model_factory. |
| examples/models/llama/tests/test_gemma4_support.py | Adds focused Gemma4 runtime/export/convert regression tests. |
| examples/models/llama/tests/BUCK | Registers Buck unittest target for Gemma4 support tests. |
| examples/models/llama/source_transformation/sdpa.py | Threads SDPA scale through the custom/quantized custom SDPA export path. |
| examples/models/llama/rope.py | Adds proportional RoPE precompute plus layer-type-specific RoPE tables and dtype-preserving HF RoPE application. |
| examples/models/llama/norm.py | Extends RMSNorm to support a "no scale parameter" mode (`with_scale=False`) used by Gemma4. |
| examples/models/llama/model_args.py | Adds Gemma4-related args (layer-type rope params, per-layer embedding dims, global head dims, etc.). |
| examples/models/llama/llama_transformer.py | Adds Gemma4 features in transformer blocks (post norms, per-layer inputs, layer scaling, KV donor selection, logit softcapping). |
| examples/models/llama/feed_forward.py | Allows a configurable activation function in the FFN (needed for the GELU-tanh variant). |
| examples/models/llama/export_llama_lib.py | Registers Gemma4 model IDs, routes converter import, and loads Gemma4Model when appropriate. |
| examples/models/llama/attention.py | Adds SDPA scaling support and introduces AttentionGemma4MHA with Gemma4 KV sharing plus sliding/full attention behavior. |
| examples/models/gemma4/README.md | Documents Gemma4 export usage and supported models. |
| examples/models/gemma4/convert_weights.py | Adds Gemma4 checkpoint loading (pt/safetensors) and key mapping into the ExecuTorch meta format. |
| examples/models/gemma4/config/e4b_config.json | Adds Gemma4 E4B export config (layer types, rope params, scaling constants). |
| examples/models/gemma4/config/e2b_config.json | Adds Gemma4 E2B export config (layer types, rope params, scaling constants). |
| examples/models/gemma4/BUCK | Adds Buck library target for the Gemma4 package (incl. resources and safetensors dep). |
| examples/models/gemma4/`__init__.py` | Adds Gemma4 model entrypoint (lazy Gemma4Model wrapper) and exports the converter. |
| examples/models/BUCK | Adds the Gemma4 package to the aggregated models target list. |
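The norm.py change in the table (RMSNorm with `with_scale=False`) can be sketched in plain Python, under the assumption that the no-scale mode simply skips the learned elementwise weight; this is an illustration, not the actual module.

```python
import math
from typing import List, Optional

# Plain-Python sketch of RMSNorm with an optional scale parameter,
# mirroring the described with_scale=False mode (not the actual norm.py).
def rms_norm(x: List[float], weight: Optional[List[float]] = None,
             eps: float = 1e-6) -> List[float]:
    # Normalize by the root-mean-square of the vector.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    y = [v / rms for v in x]
    if weight is not None:  # with_scale=True: apply learned elementwise gain
        y = [v * w for v, w in zip(y, weight)]
    return y
```

Passing `weight=None` corresponds to `with_scale=False`: the output is the normalized input with no learned gain applied.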


Comment on lines 158 to 160:

```python
model_architecture: Optional[str] = (
    None  # Architecture of model. For HF models, please refer to the HF model.config.architectures. This is used in QNN backend only for now.
)
```
Copilot AI commented Apr 4, 2026:

ModelArgs defines model_architecture twice (earlier as a required str with default, and again here as Optional[str]). In a dataclass the later field overwrites the earlier one, which silently changes the default from e.g. "LlamaForCausalLM" to None and can break any code relying on the original default. Remove the duplicate and keep a single model_architecture definition (or rename the new field if it’s meant to be different metadata).

Suggested change (delete the duplicate definition):

```diff
- model_architecture: Optional[str] = (
-     None  # Architecture of model. For HF models, please refer to the HF model.config.architectures. This is used in QNN backend only for now.
- )
```
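The pitfall Copilot describes is easy to reproduce: in a Python dataclass, a repeated field annotation in the class body overwrites the earlier one, silently replacing its default. A minimal hypothetical repro (the field name mirrors the review comment, not the real ModelArgs):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical minimal repro of the duplicate-field pitfall. The second
# annotation overwrites the first entry in __annotations__, so the
# dataclass ends up with a single field whose default is None.
@dataclass
class ModelArgs:
    model_architecture: str = "LlamaForCausalLM"  # original default
    # ... many unrelated fields in the real class ...
    model_architecture: Optional[str] = None  # duplicate silently wins
```

`ModelArgs().model_architecture` evaluates to `None`, not `"LlamaForCausalLM"`, which is exactly the silent default change the comment warns about.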


3 participants