Commit ee22828
perf(kernel): cache-block PanamaVectorMatmulKernel (8x8x128 tiles)
Ports the (m, n, k)-tile blocking pattern from
JvmVectorKernels.matmulFloatBlocked into the SPI kernel: 8x8 output
tiles, 128-wide K-stripes. Output is zeroed once up front and the
K-tile loop accumulates via `+=`, which keeps the contract "fully
overwrite the m x n block" intact and avoids the gnarly "init only on
first tile" gating in the original blocked kernel.
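The blocking scheme described above can be sketched as follows. This is a minimal scalar Kotlin sketch of the tiling pattern only, not the actual Panama-vectorized kernel; the function name `matmulBlocked` and the row-major layout are illustrative assumptions:

```kotlin
// Scalar sketch of 8x8 output tiles with 128-wide K-stripes.
// a is m x k, b is k x n, c is m x n, all row-major.
fun matmulBlocked(a: FloatArray, b: FloatArray, c: FloatArray, m: Int, n: Int, k: Int) {
    val mTile = 8
    val nTile = 8
    val kTile = 128
    c.fill(0f) // zero once up front; the K-tile loop below only accumulates
    var i0 = 0
    while (i0 < m) {
        val iMax = minOf(i0 + mTile, m) // partial tiles at the edges
        var j0 = 0
        while (j0 < n) {
            val jMax = minOf(j0 + nTile, n)
            var p0 = 0
            while (p0 < k) {
                val pMax = minOf(p0 + kTile, k)
                // accumulate this K-stripe's contribution into the m x n tile
                for (i in i0 until iMax) {
                    for (p in p0 until pMax) {
                        val aip = a[i * k + p]
                        for (j in j0 until jMax) {
                            c[i * n + j] += aip * b[p * n + j]
                        }
                    }
                }
                p0 += kTile
            }
            j0 += nTile
        }
        i0 += mTile
    }
}
```

Because the output is zeroed once before any tile runs, every K-stripe can use plain `+=` and no first-tile special case is needed.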
Closes the perf gap flagged in #558 between the SPI kernel and the
existing production blocked path. After this change the SPI kernel
matches or beats the production path within JMH noise, so routing
DefaultCpuOpsJvm.matmul through KernelRegistry no longer shows a
regression.
KernelMatmulBench (JDK 21.0.10, M-series macOS):
size    scalar      panama    speedup   prior panama (simple)
 256      9.77ms     1.13ms    8.61x      1.36ms (-16%)
 512     81.55ms     9.47ms    8.62x     13.62ms (-30%)
1024    865.54ms    79.88ms   10.83x    118.24ms (-32%)
vs production MatmulBench (vector=true, blas=false) same run:
size    SPI tiled   production blocked   delta
 256     1.13ms      1.24ms              SPI 8.5% faster
 512     9.47ms     10.38ms              SPI 8.8% faster
1024    79.88ms     78.32ms              SPI 2% slower (within noise)
Existing parity tests (PanamaVectorMatmulKernelTest, including the
31x17x23 randomized case that exercises partial tiles in all three
dims) pass unchanged within the 1e-5*k tolerance.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent b3b0c05 · commit ee22828
1 file changed: 59 additions & 33 deletions
skainet-backends/skainet-backend-cpu/src/jvmMain/kotlin/sk/ainet/exec/kernel