Commit 1e49140
fix(memory): heap-wrap remaining hot-path tensor allocs — stop direct-memory leak
Same root cause as 319c394 (sliceView): ctx.fromFloatArray copies the
input FloatArray into a fresh MemorySegment from Arena.ofAuto(). Direct
memory doesn't pressure the GC, so per-forward auto-arenas accumulate
until -XX:MaxDirectMemorySize is exhausted. Empirically: smoke test
went from a 45 GB direct-memory OOM mid-prefill to a 271 MB net
direct-memory growth across the full 27 min forward, with the resident
JVM staying inside the 32 GB cap.
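The difference between the two allocation styles can be sketched with the plain java.lang.foreign API. This is only an illustration under assumptions: `copyToAutoArena` and `wrapOnHeap` are hypothetical helpers, not the project's `ctx.fromFloatArray`, and the code assumes a JDK where the foreign-memory API is final (22 or later).

```kotlin
import java.lang.foreign.Arena
import java.lang.foreign.MemorySegment
import java.lang.foreign.ValueLayout

// Hypothetical helper: copies the array into a fresh native segment owned by
// an automatic arena. The native memory is released only after the arena
// becomes unreachable and the GC gets around to collecting it.
fun copyToAutoArena(values: FloatArray): MemorySegment {
    val byteSize = values.size.toLong() * Float.SIZE_BYTES
    val segment = Arena.ofAuto().allocate(byteSize, Float.SIZE_BYTES.toLong())
    MemorySegment.copy(values, 0, segment, ValueLayout.JAVA_FLOAT, 0L, values.size)
    return segment
}

// Hypothetical helper: a heap segment that is a view over the existing array.
// No native memory is allocated, so the storage is ordinary GC-managed heap.
fun wrapOnHeap(values: FloatArray): MemorySegment = MemorySegment.ofArray(values)

fun main() {
    val data = floatArrayOf(1f, 2f, 3f)
    println(copyToAutoArena(data).isNative) // true
    println(wrapOnHeap(data).isNative)      // false
}
```

Calling the first helper once per layer per token produces many small native allocations whose lifetime depends on GC timing, which is the pattern the paragraph above describes.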
Fixed sites (all on the per-token / per-layer path):
- RoPE.applyRoPESplitHalf: cos/sin tables (sliding layers, partial=1.0)
- RoPE.applyRoPESplitHalfFull: cos/sin tables (full layers, partial=0.25)
- MultiHeadAttention.buildSlidingCausalMask: mask tensor (every block
  using the sliding path, every forward)
- GemmaModel softcap: scale + inv scalar tensors (every forward)
- PaddedSharedPositionalKVCache.padHeadDim: padded V (Gemma 4
  value-head padding when src/target head_dim differ)
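To make the mask site concrete, here is a minimal sketch of a sliding causal mask built as a plain FloatArray, the kind of per-forward buffer that previously got copied off-heap. The conventions are assumptions, not taken from the project: the mask is additive (0 for allowed, negative infinity for blocked), row-major with shape [seqLen, seqLen], and `window` counts the query position itself. The real `buildSlidingCausalMask` may differ.

```kotlin
// Hypothetical sketch: query position q may attend to key position k
// when k <= q (causal) and q - k < window (sliding).
fun slidingCausalMask(seqLen: Int, window: Int): FloatArray {
    require(seqLen > 0 && window > 0) { "seqLen and window must be positive" }
    // Start fully blocked, then open the allowed band on each row.
    val mask = FloatArray(seqLen * seqLen) { Float.NEGATIVE_INFINITY }
    for (q in 0 until seqLen) {
        val first = maxOf(0, q - window + 1)
        for (k in first..q) {
            mask[q * seqLen + k] = 0f
        }
    }
    return mask
}

fun main() {
    val seqLen = 4
    val mask = slidingCausalMask(seqLen, window = 2)
    // Print one row per query position; "x" marks blocked keys.
    for (q in 0 until seqLen) {
        println((0 until seqLen).joinToString(" ") { k ->
            if (mask[q * seqLen + k] == 0f) "0" else "x"
        })
    }
}
```

The buffer grows quadratically with sequence length, so rebuilding it off-heap on every forward pass is one of the larger contributors among the listed sites under these assumptions.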
Each site now wraps the FloatArray as DenseFloatArrayTensorData and
goes through ctx.fromData, which keeps the storage on the heap and lets
the GC reclaim it normally.
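The shape of that change can be sketched as follows. Every type here (`SketchTensorData`, `HeapFloatData`, `SketchTensor`, `SketchContext`) is a hypothetical stand-in written for this illustration; the project's `DenseFloatArrayTensorData` and `ctx.fromData` signatures are not shown in this commit text and may look different.

```kotlin
// Hypothetical stand-in for a tensor storage abstraction.
interface SketchTensorData {
    val size: Int
    operator fun get(index: Int): Float
}

// Hypothetical heap-backed storage: holds a reference to the array, no copy.
class HeapFloatData(private val values: FloatArray) : SketchTensorData {
    override val size: Int get() = values.size
    override fun get(index: Int): Float = values[index]
}

class SketchTensor(val data: SketchTensorData, val shape: IntArray)

// Hypothetical context whose fromData accepts storage as-is instead of
// copying it into native memory.
class SketchContext {
    fun fromData(data: SketchTensorData, shape: IntArray): SketchTensor {
        val expected = shape.fold(1) { acc, dim -> acc * dim }
        require(expected == data.size) { "shape does not match data size" }
        return SketchTensor(data, shape)
    }
}

// Example call site: a scalar tensor such as the softcap scale.
fun scalarTensor(ctx: SketchContext, value: Float): SketchTensor =
    ctx.fromData(HeapFloatData(floatArrayOf(value)), intArrayOf(1))

fun main() {
    val t = scalarTensor(SketchContext(), 30f)
    println(t.data[0]) // 30.0
}
```

Because the tensor only references a heap array, it becomes collectable as soon as the forward pass drops it, with no separate native lifetime to track.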
Tool-call format regression on the smoke test prompt is tracked
separately; this commit only fixes the runnability OOM.
1 parent 5741d7b, commit 1e49140
4 files changed
Lines changed: 54 additions & 13 deletions
File tree
- llm-core/src/commonMain/kotlin/sk/ainet/lang/nn/transformer
- llm-inference/gemma/src/commonMain/kotlin/sk/ainet/models/gemma
Lines changed: 9 additions & 4 deletions
Hunk at original lines 473-482: lines 476-479 removed, new lines 476-484 added (diff body not preserved).
Lines changed: 9 additions & 1 deletion
Hunk at original lines 322-328: line 325 removed, new lines 325-333 added (diff body not preserved).
Lines changed: 25 additions & 4 deletions
Hunk at original lines 191-198: lines 194-195 removed, new lines 194-208 added (diff body not preserved).
Hunk at original lines 219-226: lines 222-223 removed, new lines 235-244 added (diff body not preserved).
Lines changed: 11 additions & 4 deletions
Hunk at original lines 128-138: lines 131-132 removed, new lines 131-138 added; lines 134-135 removed, new lines 140-142 added (diff body not preserved).