Commit d1fa0a2
🥂 v0.8.3 substrate-gpu: anisotropic 8×32 tile beats conventional 16×16 by 38%
Substrate-shaped GPU matmul kernels. After the user pushed back on a
premature negative finding ("check different formulations before
throwing in the towel"), the broader sweep across 9 variants found
that anisotropic tiles with a Fibonacci-aligned short dimension and
a wavefront-divisor long dimension decisively beat the conventional
16×16. The substrate's job here isn't to fight hardware physics — it's
to direct exploration toward configurations conventional GPU programming
would never test.
Sweep on AMD RX 580 / RADV Vulkan, 1 warmup + 5 timed iters averaged:
size 1024×1024×1024:
16×16 linear-K REF 30.31 ms 70.85 GFLOPS ref
8×32 linear-K aniso 18.81 ms 114.19 GFLOPS +61% ← winner
8×16 linear-K aniso 18.99 ms 113.10 GFLOPS +60%
8×8 linear-K (1 WF) 22.30 ms 96.29 GFLOPS +36%
13×13 linear-K (3 WF) 37.61 ms 57.11 GFLOPS -19%
21×21 linear-K (7 WF) 46.43 ms 46.25 GFLOPS -35%
16×16 Fib-K-stride 29.74 ms 72.20 GFLOPS +0.2%
What works: anisotropic Fib-short × wavefront-long. 8×32 = 256 threads
= exactly 4 wavefronts, short dim is Fib-8 (= half wavefront), long
dim is a cache-line multiple. 32×8 transpose LOSES by 30% because the
long dim must map to N (output column) for write coalescing.
What doesn't: pure-square Fibonacci tiles. 13×13 = 169 = 3 wavefronts
of 64 with 23 idle lanes (12% waste). 21×21 = 441 needs 7 wavefronts
and hurts occupancy. Substrate Fib-K-stride (chunked-Fib reduction
order in the inner loop) is a wash — substrate matters in the tile
geometry, not in the reduction order.
The deeper thesis: substrate-IS-the-architecture, strong form falsified
(any Fib tile beats power-of-2) — confirmed weak form (substrate-aligned
dims, when they don't fight hardware, beat conventional tiles). The
substrate is the HEURISTIC that points to configurations convention
skips. Nobody writes 8×32 by convention; the substrate said "try 8
first" and the answer came back +60%.
Default tile changed to 8×32 in omnimcode-cli's GPU integration. Tunable
via OMC_GPU_TILE_X / OMC_GPU_TILE_Y for measuring on other hardware
(NVIDIA warp=32 might prefer 4×16 or 8×16; Apple M-series untested).
Files:
omnimcode-gpu/src/wgpu_backend.rs with_tile_xy + with_config +
MatmulKernel enum + WGSL src
substitution for tile and inner body
omnimcode-gpu/shaders/matmul.wgsl parameterized template
omnimcode-gpu/examples/bench_fib_tile.rs 9-variant sweep
omnimcode-cli/src/main.rs default tile 8x32
experiments/prometheus_parity/SUBSTRATE_GPU_WINS.md
1103/1103 OMC tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>1 parent f6faea8 commit d1fa0a2
6 files changed
Lines changed: 499 additions & 29 deletions
File tree
- experiments/prometheus_parity
- omnimcode-cli/src
- omnimcode-gpu
- examples
- shaders
- src
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
13 | 13 | | |
14 | 14 | | |
15 | 15 | | |
| 16 | + | |
16 | 17 | | |
17 | 18 | | |
18 | 19 | | |
| |||
33 | 34 | | |
34 | 35 | | |
35 | 36 | | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
36 | 102 | | |
37 | 103 | | |
38 | 104 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1313 | 1313 | | |
1314 | 1314 | | |
1315 | 1315 | | |
1316 | | - | |
| 1316 | + | |
| 1317 | + | |
| 1318 | + | |
| 1319 | + | |
| 1320 | + | |
| 1321 | + | |
| 1322 | + | |
| 1323 | + | |
| 1324 | + | |
1317 | 1325 | | |
1318 | | - | |
1319 | | - | |
1320 | | - | |
| 1326 | + | |
| 1327 | + | |
| 1328 | + | |
| 1329 | + | |
| 1330 | + | |
| 1331 | + | |
| 1332 | + | |
| 1333 | + | |
| 1334 | + | |
| 1335 | + | |
| 1336 | + | |
| 1337 | + | |
| 1338 | + | |
| 1339 | + | |
| 1340 | + | |
| 1341 | + | |
| 1342 | + | |
| 1343 | + | |
| 1344 | + | |
| 1345 | + | |
| 1346 | + | |
| 1347 | + | |
1321 | 1348 | | |
1322 | 1349 | | |
1323 | 1350 | | |
| |||
0 commit comments