The Embedder module in gemma/gm/nn/_modules.py currently implements an encode_vision method which acts as the critical "bridge" for multimodal inference. This method projects visual features (e.g., from SigLiP) into the Transformer's unified embedding space using RMSNorm and an Einsum projection.
Currently, there are no dedicated unit tests for this path, as noted by the TODO at line 74 of gemma/gm/nn/_modules_test.py.
Goal:
- Implement a robust test suite for encode_vision.
- Verify that initializing the Embedder with vision_proj_dim correctly creates the mm_input_projection and mm_soft_embedding_norm parameters.
- Ensure that visual tokens are correctly projected to the model's embed_dim.
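To make the shape expectations concrete, the projection path can be sketched in plain NumPy: an RMSNorm over the vision feature axis followed by an einsum projection from vision_proj_dim into embed_dim. The function names, dimension sizes, and einsum layout below are illustrative assumptions for the test design, not the gemma API itself.

```python
import numpy as np

def rms_norm(x, scale, eps=1e-6):
    # Normalize by the root-mean-square over the last axis, then rescale.
    var = np.mean(np.square(x), axis=-1, keepdims=True)
    return x * scale / np.sqrt(var + eps)

def encode_vision_sketch(soft_tokens, norm_scale, proj):
    # soft_tokens: [batch, num_tokens, vision_proj_dim]
    # norm_scale:  [vision_proj_dim]  (mm_soft_embedding_norm analogue)
    # proj:        [vision_proj_dim, embed_dim]  (mm_input_projection analogue)
    normed = rms_norm(soft_tokens, norm_scale)
    return np.einsum("...td,de->...te", normed, proj)

rng = np.random.default_rng(0)
vision_proj_dim, embed_dim = 8, 16
tokens = rng.normal(size=(2, 4, vision_proj_dim))
scale = np.ones(vision_proj_dim)
proj = rng.normal(size=(vision_proj_dim, embed_dim))

out = encode_vision_sketch(tokens, scale, proj)
# The core assertion the real test should make: visual tokens land in embed_dim.
assert out.shape == (2, 4, embed_dim)
```

The actual test in gemma/gm/nn/_modules_test.py would initialize the Embedder with vision_proj_dim set, check that mm_input_projection and mm_soft_embedding_norm appear in the parameter tree with the shapes above, and assert that encode_vision output has the trailing embed_dim axis.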