[Gemma4] Add docstrings for Per-Layer Embeddings (PLE) pipeline#45207
Open
w4nderlust wants to merge 1 commit into huggingface:main from
Conversation
The PLE system is complex and underdocumented, which makes it hard for third-party implementations (llama.cpp, candle, mlx, etc.) to get right. This adds:

- a config docstring for `hidden_size_per_layer_input` explaining that the actual embedding dim is `num_hidden_layers * hidden_size_per_layer_input`, that the embedding is scaled by `sqrt(hidden_size_per_layer_input)`, and describing the full two-component pipeline
- a docstring for `get_per_layer_inputs()` explaining the token-identity component and the packed-to-4D reshape
- a docstring for `project_per_layer_inputs()` explaining the context-aware projection, normalization, and combination with scale factors
- a comment on the PLE init block pointing to the pipeline methods

Fixes huggingface#45206
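The token-identity component described above can be sketched in a few lines of pure Python. This is an illustrative toy, not the transformers implementation: the dimensions are shrunk (the description above states the real model uses `hidden_size_per_layer_input=256` and a packed `[262144, num_hidden_layers * 256]` table), the lookup works on a single token rather than a batched 4D tensor, and the function name merely mirrors the one being documented.

```python
import math

# Toy dimensions standing in for the real config values (illustration only).
vocab_size = 10
num_hidden_layers = 3
hidden_size_per_layer_input = 4

# Packed per-layer embedding table: one row per token, with every layer's
# slice concatenated along the last dimension.
packed_dim = num_hidden_layers * hidden_size_per_layer_input
table = [[float(t)] * packed_dim for t in range(vocab_size)]

def get_per_layer_inputs(token_id):
    """Token-identity component: look up the packed row, apply the
    sqrt(hidden_size_per_layer_input) scale (sqrt(256)=16 in the real
    model), then split it into one chunk per layer -- the per-token
    analogue of the packed-to-4D reshape."""
    scale = math.sqrt(hidden_size_per_layer_input)
    row = [x * scale for x in table[token_id]]
    return [
        row[l * hidden_size_per_layer_input:(l + 1) * hidden_size_per_layer_input]
        for l in range(num_hidden_layers)
    ]

chunks = get_per_layer_inputs(2)
print(len(chunks), len(chunks[0]))  # 3 4
print(chunks[0][0])                 # 2.0 * sqrt(4) = 4.0
```

The point of the sketch is the easy-to-miss part: the scale is baked into the embedding lookup itself, so a port that copies the weights but skips the multiply will be off by a factor of 16.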
Contributor
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma4
Fixes #45206
What does this PR do?
Adds documentation for the Gemma4 Per-Layer Embeddings (PLE) system, which is currently pretty hard to reverse-engineer from the code alone.
I ran into this while implementing Gemma4 inference from scratch in Rust. The PLE system has several non-obvious aspects that aren't documented anywhere:
- `hidden_size_per_layer_input` (256) is the per-layer dimension, but the actual embedding weight is `[vocab, num_layers * 256]` = `[262144, 8960]` because all layers are packed into a single table
- the lookup goes through `Gemma4TextScaledWordEmbedding`, which silently multiplies by `sqrt(256) = 16` - this took me a while to track down
- there is a second, context-aware component (`per_layer_model_projection` + scale + RMSNorm) that combines with the token lookup before being passed to the layers, with specific scale factors (`1/sqrt(hidden_size)` and `1/sqrt(2)`)

This PR adds:
- a config docstring for `hidden_size_per_layer_input` explaining the packed layout, scaling, and full pipeline
- docstrings for `get_per_layer_inputs()` and `project_per_layer_inputs()`

Hopefully this saves some pain for others implementing Gemma4 outside of transformers.
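For the second component, a minimal sketch of how the scale factors combine, again per token and with toy dimensions. The projection and RMSNorm are stubbed out as placeholders (a real port would use the checkpoint's `per_layer_model_projection` weight and norm); only the `1/sqrt(hidden_size)` and `1/sqrt(2)` factors from the description above are exercised here.

```python
import math

# Toy sizes; names mirror the pipeline described above, not the exact code.
hidden_size = 8
hidden_size_per_layer_input = 4

def project_per_layer_inputs(hidden_state, per_layer_input):
    """Context-aware component: project the hidden state down to the
    per-layer width, scale by 1/sqrt(hidden_size), then combine with the
    token-identity chunk and rescale by 1/sqrt(2)."""
    # Stand-in for per_layer_model_projection: truncate to per-layer width.
    projected = hidden_state[:hidden_size_per_layer_input]
    projected = [x / math.sqrt(hidden_size) for x in projected]
    # RMSNorm stub omitted; the 1/sqrt(2) keeps the variance of the sum stable.
    return [(p + t) / math.sqrt(2) for p, t in zip(projected, per_layer_input)]

out = project_per_layer_inputs([math.sqrt(8)] * hidden_size,
                               [0.0] * hidden_size_per_layer_input)
print(out[0])  # 1.0 / sqrt(2), approx 0.7071
```

Getting either factor wrong produces outputs that are plausibly scaled but subtly off, which is exactly the failure mode that motivated documenting them.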