Commit c794695
perf(gpt2): transpose weights at load time for SIMD-contiguous matmul
Weight matrices are pre-transposed from [in_dim, out_dim] to [out_dim, in_dim]
during safetensors loading, so matmul_vec_simd now reads each output row
contiguously via F32x16::from_slice + mul_add, giving full SIMD lane
utilization (768D = 48 × F32x16, no scalar tail).
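This is a minimal stable-Rust sketch of the idea, assuming row-major f32 storage. The diff contents are not shown, so `transpose` and `matmul_vec_blocked` are hypothetical names. A `[f32; 16]` accumulator array stands in for the commit's `F32x16` type, whose actual API is not reproduced here.

```rust
/// Lane count of one F32x16-sized block (assumed stand-in, not the real type).
const LANES: usize = 16;

/// Transpose a row-major [in_dim, out_dim] matrix into row-major [out_dim, in_dim].
/// Done once at load time so the weights feeding each output become one contiguous row.
fn transpose(w: &[f32], in_dim: usize, out_dim: usize) -> Vec<f32> {
    assert_eq!(w.len(), in_dim * out_dim);
    let mut t = vec![0.0f32; in_dim * out_dim];
    for i in 0..in_dim {
        for o in 0..out_dim {
            // element (i, o) of the original moves to (o, i)
            t[o * in_dim + i] = w[i * out_dim + o];
        }
    }
    t
}

/// y[o] = sum_i W[i][o] * x[i], where `wt` is already transposed to [out_dim, in_dim].
/// Each output reads one contiguous row of `wt`, processed 16 elements per step.
fn matmul_vec_blocked(wt: &[f32], x: &[f32], out_dim: usize) -> Vec<f32> {
    let in_dim = x.len();
    assert!(in_dim > 0);
    assert_eq!(wt.len(), in_dim * out_dim);
    // largest multiple of LANES that fits; the rest is a scalar tail
    let full = in_dim - in_dim % LANES;
    let mut y = vec![0.0f32; out_dim];
    for (o, row) in wt.chunks_exact(in_dim).enumerate() {
        // 16 independent accumulators, playing the role of one F32x16 register
        let mut acc = [0.0f32; LANES];
        for (wb, xb) in row[..full]
            .chunks_exact(LANES)
            .zip(x[..full].chunks_exact(LANES))
        {
            for l in 0..LANES {
                acc[l] = wb[l].mul_add(xb[l], acc[l]);
            }
        }
        // horizontal sum of the lanes, then the tail (empty when in_dim % 16 == 0)
        let mut sum: f32 = acc.iter().sum();
        for (&w, &xv) in row[full..].iter().zip(&x[full..]) {
            sum = w.mul_add(xv, sum);
        }
        y[o] = sum;
    }
    y
}
```

The design trade-off: transposing once at load costs a single O(in_dim × out_dim) copy, while every subsequent matrix-vector product gets unit-stride reads instead of striding by out_dim. For GPT-2's 768-wide hidden state, `full` equals `in_dim`, so the tail loop never runs.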
https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o71
parent 929b143 commit c794695
2 files changed: 46 additions, 7 deletions
Diff 1: @@ -364,13 +364,27 @@ (+19 / -5; line contents not captured)
Diff 2: @@ -111,10 +111,35 @@ (+27 / -2; line contents not captured)