Add Gemma 4 E2B/E4B support (text-only) #18695
Phineas1500 wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18695
Note: Links to docs will display an error until the docs builds have been completed.
@pytorchbot label "release notes: examples"
Pull request overview
Adds native text-only export/runtime support for Gemma 4 E2B/E4B models to ExecuTorch’s LLM (Llama-style) export path, including checkpoint conversion, runtime behavior updates, and regression tests.
Changes:
- Registers `gemma4_e2b`/`gemma4_e4b` as first-class export targets and wires them into the Llama export loader/converter selection.
- Extends the Llama native text runtime to support Gemma 4 specifics (new attention impl, per-layer embeddings/scaling, dual RoPE tables, additional norms, logit softcapping).
- Adds a new `examples/models/gemma4` package (configs, converter, docs) plus targeted unit/regression tests and a small flatbuffer schema fallback fix for source-tree usage.
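The mix of sliding-window and full attention mentioned above can be pictured with a toy causal-mask builder. This is a pure-Python sketch, not the PR's actual implementation; the function name and the `window` convention (a key is visible when `q - k < window`) are assumptions:

```python
def attention_mask(seq_len, window=None):
    """Build a boolean causal mask.

    window=None  -> full causal attention (every key up to the query).
    window=W     -> sliding-window attention: only the last W keys are visible,
                    as in layers that alternate local/global attention.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q and (window is None or q - k < window)
            row.append(visible)
        mask.append(row)
    return mask
```

With `window=2`, each query past the first attends to at most its two most recent positions, while `window=None` recovers the usual lower-triangular causal mask.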
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| extension/llm/export/config/llm_config.py | Adds Gemma4 model types to the export config enum. |
| exir/_serialize/_flatbuffer.py | Adds a fallback to load flatbuffer schemas from the repo schema/ dir when package resources are missing (editable/source-tree). |
| exir/_serialize/test/test_flatbuffer.py | Adds coverage verifying the schema fallback behavior. |
| examples/models/model_factory.py | Simplifies source-tree vs package-root model imports using __package__ instead of cwd heuristics. |
| examples/models/test/test_model_factory.py | Adds unit tests validating EagerModelFactory.create_model import behavior and toy model load. |
| examples/models/test/BUCK | Adds Buck unittest target for test_model_factory. |
| examples/models/llama/tests/test_gemma4_support.py | Adds focused Gemma4 runtime/export/convert regression tests. |
| examples/models/llama/tests/BUCK | Registers Buck unittest target for Gemma4 support tests. |
| examples/models/llama/source_transformation/sdpa.py | Threads SDPA scale through custom/quantized custom SDPA export path. |
| examples/models/llama/rope.py | Adds proportional RoPE precompute + layer-type-specific RoPE tables and dtype-preserving HF RoPE application. |
| examples/models/llama/norm.py | Extends RMSNorm to support “no scale parameter” mode (with_scale=False) used by Gemma4. |
| examples/models/llama/model_args.py | Adds Gemma4-related args (layer-type rope params, per-layer embedding dims, global head dims, etc.). |
| examples/models/llama/llama_transformer.py | Adds Gemma4 features in transformer blocks (post norms, per-layer inputs, layer scaling, KV donor selection, logit softcapping). |
| examples/models/llama/feed_forward.py | Allows configurable activation function in FFN (needed for GELU-tanh variant). |
| examples/models/llama/export_llama_lib.py | Registers Gemma4 model IDs, routes converter import, and loads Gemma4Model when appropriate. |
| examples/models/llama/attention.py | Adds SDPA scaling support and introduces AttentionGemma4MHA with Gemma4 KV sharing + sliding/full attention behavior. |
| examples/models/gemma4/README.md | Documents Gemma4 export usage and supported models. |
| examples/models/gemma4/convert_weights.py | Adds Gemma4 checkpoint loading (pt/safetensors) + key mapping into ExecuTorch meta format. |
| examples/models/gemma4/config/e4b_config.json | Adds Gemma4 E4B export config (layer types, rope params, scaling constants). |
| examples/models/gemma4/config/e2b_config.json | Adds Gemma4 E2B export config (layer types, rope params, scaling constants). |
| examples/models/gemma4/BUCK | Adds Buck library target for Gemma4 package (incl. resources + safetensors dep). |
| examples/models/gemma4/__init__.py | Adds Gemma4 model entrypoint (lazy Gemma4Model wrapper) and exports converter. |
| examples/models/BUCK | Adds Gemma4 package to the aggregated models target list. |
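The flatbuffer schema fallback listed for `exir/_serialize/_flatbuffer.py` can be sketched as a package-resource lookup that falls back to a source-tree directory. This is an illustrative sketch only; the function name, parameters, and default package path are hypothetical, not the PR's actual code:

```python
import importlib.resources
from pathlib import Path
from typing import Optional

def load_schema(name: str, package: str = "executorch.schema",
                fallback_dir: Optional[str] = None) -> bytes:
    """Load a flatbuffer schema, preferring installed package resources.

    In an editable/source-tree checkout the package resources may be missing,
    so fall back to reading the file from a repo directory (e.g. schema/).
    """
    try:
        return importlib.resources.files(package).joinpath(name).read_bytes()
    except (ModuleNotFoundError, FileNotFoundError):
        if fallback_dir is None:
            raise
        # Source-tree fallback: read the schema straight off disk.
        return (Path(fallback_dir) / name).read_bytes()
```

The key design point is that the fallback only triggers when the resource lookup fails, so installed-package behavior is unchanged.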
model_architecture: Optional[str] = (
    None  # Architecture of model. For HF models, please refer to the HF model.config.architectures. This is used in QNN backend only for now.
)
ModelArgs defines model_architecture twice (earlier as a required str with default, and again here as Optional[str]). In a dataclass the later field overwrites the earlier one, which silently changes the default from e.g. "LlamaForCausalLM" to None and can break any code relying on the original default. Remove the duplicate and keep a single model_architecture definition (or rename the new field if it’s meant to be different metadata).
Summary
Add native text-only Gemma 4 support for `google/gemma-4-E2B` and `google/gemma-4-E4B` in the ExecuTorch LLM export path.

Why
Gemma 4 E2B/E4B do not fit the existing Llama/Qwen config-only path. Supporting them required new model/runtime behavior plus a checkpoint conversion path, not just new repo IDs and JSON configs.
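One of the new runtime behaviors called out above is logit softcapping, which squashes raw logits into a bounded range before sampling. A minimal pure-Python sketch; the cap value of 30.0 is illustrative, not taken from the PR:

```python
import math

def softcap(logits, cap=30.0):
    """Logit softcapping: map each raw logit x to cap * tanh(x / cap).

    Values stay within (-cap, cap) (up to float rounding), small logits pass
    through nearly unchanged, and extreme logits are smoothly compressed.
    """
    return [cap * math.tanh(x / cap) for x in logits]
```

For small inputs `tanh(x / cap) ≈ x / cap`, so the transform is close to the identity near zero and only bites on extreme logits.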
What Changed
- Registers `gemma4_e2b` and `gemma4_e4b` as first-class export targets.
- Adds an `examples/models/gemma4` package with configs, converter, BUCK target, and README.
- Simplifies source-tree vs package-root model imports in `examples/models/model_factory.py`.
- Adds a package-resource fallback for flatbuffer schemas in `exir/_serialize/_flatbuffer.py`.

Validation
Ran:
Result:
`Ran 31 tests ... OK`

Also validated with real HF checkpoint conversion/export/runtime smoke tests for both `google/gemma-4-E2B` and `google/gemma-4-E4B`, including broad greedy-decoding parity checks against HF.

Prompt benchmark summary:
- E4B: exact match on 11/12 prompts, first-token match on 12/12 prompts
- E2B: exact match on 8/12 prompts, first-token match on 10/12 prompts

The remaining
E2B drift was concentrated in open-ended near-tie generations rather than structural export failures.

Not Included In This PR