Commit d77ea77

Add Gemma 3n E4B audio encoder (Conformer) support
12-layer USM-style Conformer audio encoder for Gemma 3n E4B multimodal models. Enables on-device audio encoding: mel spectrogram → Conformer → embedder pipeline.

New files:
- Gemma3nAudio.swift: Full Conformer port (chunked local attention, depthwise conv1d, cumulative group norm, temporal reduction, sub-sample conv projection)
- Gemma3nAudioConfig.swift: 28 audio encoder configuration parameters
- Gemma3nVLM.swift: Top-level VLM wrapper with audio embedding injection
- Gemma3nAudioTests.swift: Configuration decoding tests

Architecture:
mel [1,T,128] → SubSampleConv (4x) → 12 Conformer blocks → temporal reduction (4x) → AudioEmbedder (1536→2048) → LM token stream

Tested on iPhone: 15.8s audio → 99 tokens in 0.48s
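The reported token count can be sanity-checked against the two 4x stages in the architecture line. The sketch below is not code from this commit: `ceilDiv` and `estimatedAudioTokens` are hypothetical helpers, and both the 10 ms mel hop and the round-up behavior at each stage are assumptions made for illustration. Under those assumptions, 15.8 s of audio gives 1580 mel frames, 395 frames after the sub-sample conv, and 99 tokens after temporal reduction.

```swift
// Hypothetical helpers for a back-of-the-envelope check; not part of the commit.

/// Integer division rounding up (assumes each stage pads a partial group).
func ceilDiv(_ a: Int, _ b: Int) -> Int {
    return (a + b - 1) / b
}

/// Estimates how many audio tokens reach the LM for a clip of `seconds`.
/// - hopMs: assumed mel hop length in milliseconds (10 ms is an assumption).
/// - subsample: SubSampleConv factor from the commit message (4x).
/// - reduction: temporal reduction factor from the commit message (4x).
func estimatedAudioTokens(seconds: Double,
                          hopMs: Double = 10.0,
                          subsample: Int = 4,
                          reduction: Int = 4) -> Int {
    let melFrames = Int((seconds * 1000.0 / hopMs).rounded())
    let afterConv = ceilDiv(melFrames, subsample)
    return ceilDiv(afterConv, reduction)
}

let tokens = estimatedAudioTokens(seconds: 15.8)
print(tokens) // 99, matching the figure in the commit message

// Real-time factor implied by the reported timing (0.48 s for 15.8 s of audio).
let realTimeFactor = 0.48 / 15.8
print(realTimeFactor < 0.05) // true: encoding ran at well under 5% of clip duration
```

The combined 16x reduction corresponds to roughly one token per 160 ms of audio if the 10 ms hop assumption holds; the actual frame count in `Gemma3nAudio.swift` may differ by a frame or two depending on windowing and padding.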
1 parent 8c9dd63 commit d77ea77

5 files changed

Lines changed: 2025 additions & 0 deletions
