Commit 4acec7f
feat(gemma): optional maxInferenceLen on GemmaNetworkLoader.load() (#178)
The eager network sizes its KV cache and RoPE tables for maxInferenceLen
(= min(contextLength, 4096) by default). On the SL2610, which has 1.9 GB of
RAM, the resulting ~0.4 GB KV cache (allocated at the first forward pass)
still OOMs the board, even after the packed Q8_0 lm_head reduced the resident
weight footprint to ~1.06 GB.
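Why the cap matters: in a conventional KV cache layout, each layer stores one key and one value vector per KV head per position, and precomputed RoPE tables hold one cos/sin pair per position and dimension pair, so both scale linearly with maxInferenceLen. The sketch below assumes that layout and fp32 entries; `kvCacheBytes`, `ropeTableBytes` and their parameters are hypothetical helpers, not fields of the actual loader.

```kotlin
// Hypothetical helper (assumed layout): per layer, K and V tensors of shape
// [maxLen, kvHeads, headDim], so the total is linear in maxLen.
fun kvCacheBytes(
    layers: Int,
    kvHeads: Int,
    headDim: Int,
    maxLen: Int,
    bytesPerElem: Int = 4, // assumption: fp32 cache entries
): Long = 2L * layers * kvHeads * headDim * maxLen * bytesPerElem

// Hypothetical helper (assumed layout): cos and sin tables with one entry
// per (position, dimension pair), also linear in maxLen.
fun ropeTableBytes(headDim: Int, maxLen: Int, bytesPerElem: Int = 4): Long =
    2L * maxLen * (headDim / 2) * bytesPerElem
```

Whatever the real model dimensions are, dropping the cap from the default 4096 to 32 divides both sizes by 4096 / 32 = 128, which is consistent with the "~100x" figure below (assuming the model's contextLength is at least 4096).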
Thread an optional `maxInferenceLen: Int? = null` through
load() -> applyWeightsToNetwork -> applyWeightsToNetworkNonReified ->
gemmaNetwork so a constrained-device consumer can cap the context (e.g. 32
for a short tool-call prompt), shrinking the KV cache ~100x. Default null
preserves the existing min(contextLength, 4096) behaviour.
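The threading pattern can be sketched as follows: every layer of the call chain takes `maxInferenceLen: Int? = null` and forwards it unchanged, and only the innermost builder resolves it, so `null` keeps meaning "use the default". The function names mirror the call chain in the message, but their signatures are simplified and hypothetical (the real `load()` is a member of `GemmaNetworkLoader` and takes other arguments); `NetworkConfig` and `resolveMaxInferenceLen` are illustrative stand-ins. Only the `Int? = null` parameter and the `min(contextLength, 4096)` fallback come from the commit.

```kotlin
// Hypothetical stand-in for what the real network builder receives.
data class NetworkConfig(val contextLength: Int, val maxInferenceLen: Int)

// Fallback rule from the commit: null preserves min(contextLength, 4096).
fun resolveMaxInferenceLen(contextLength: Int, maxInferenceLen: Int?): Int =
    maxInferenceLen ?: minOf(contextLength, 4096)

// Innermost step (simplified, hypothetical signature): the only place the
// optional value is resolved, so the 4096 default lives in one spot.
fun gemmaNetwork(contextLength: Int, maxInferenceLen: Int? = null): NetworkConfig =
    NetworkConfig(contextLength, resolveMaxInferenceLen(contextLength, maxInferenceLen))

// Intermediate steps forward the value untouched.
fun applyWeightsToNetworkNonReified(contextLength: Int, maxInferenceLen: Int? = null) =
    gemmaNetwork(contextLength, maxInferenceLen)

fun applyWeightsToNetwork(contextLength: Int, maxInferenceLen: Int? = null) =
    applyWeightsToNetworkNonReified(contextLength, maxInferenceLen)

// Public entry point: existing callers that omit the argument see no change.
fun load(contextLength: Int, maxInferenceLen: Int? = null): NetworkConfig =
    applyWeightsToNetwork(contextLength, maxInferenceLen)
```

A constrained-device consumer would then pass the cap by name, e.g. `load(contextLength, maxInferenceLen = 32)` for a short tool-call prompt, while `load(contextLength)` keeps the old sizing. Resolving the default in a single innermost function avoids duplicating the 4096 constant at every layer of the chain.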
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

1 parent f94ce6c
1 file changed
Lines changed: 9 additions & 5 deletions
| Hunk | Old lines | New lines | Deleted (old line) | Added (new line) |
|---|---|---|---|---|
| `@@ -120,7 +120,8 @@` | 120-126 | 120-127 | 123 | 123, 124 |
| `@@ -160,14 +161,15 @@` | 160-173 | 161-175 | 163, 169, 170 | 164, 170, 171, 172 |
| `@@ -177,7 +179,8 @@` | 177-183 | 179-186 | 180 | 182, 183 |
| `@@ -197,6 +200,7 @@` | 197-202 | 200-206 | none | 203 |