feat(matmul): route DefaultCpuOpsJvm FP32 matmul through KernelRegistry by michalharakal · Pull Request #561 · SKaiNET-developers/SKaiNET

michalharakal · 2026-04-28T19:42:13Z

Summary

Replaces the direct JvmVectorKernels.matmulFloat / matmulFloatBlocked calls in DefaultCpuOpsJvm.chooseMatmul with a single fp32MatmulKernel.matmul(...) call. The kernel is resolved via KernelRegistry.bestAvailable(), lazily auto-installing ServiceLoader-discovered providers on first use.

After PR #560 (cache-blocked Panama), the SPI's PanamaVectorMatmulKernel is on par with the existing matmulFloatBlocked path, so this is purely a plumbing change with no measured production regression.
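The routing described above can be pictured with a minimal, self-contained sketch. It is not the SKaiNET implementation: the interface shape, the `priority()` ordering, the scalar fallback, and the `CpuOps` class are all assumptions made for illustration. Only the general idea comes from the PR text: a registry that lazily installs `ServiceLoader`-discovered providers on first use and hands back the best available FP32 kernel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical SPI: the real kernel interface in SKaiNET may look different.
interface Fp32MatmulKernel {
    String name();
    int priority(); // assumption: higher value wins
    // Row-major C[m x n] = A[m x k] * B[k x n]
    void matmul(float[] a, float[] b, float[] c, int m, int k, int n);
}

// Plain scalar kernel, always registered so the registry is never empty.
final class ScalarMatmulKernel implements Fp32MatmulKernel {
    public String name() { return "scalar-fallback"; }
    public int priority() { return 0; }
    public void matmul(float[] a, float[] b, float[] c, int m, int k, int n) {
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0f;
                for (int p = 0; p < k; p++) {
                    sum += a[i * k + p] * b[p * n + j];
                }
                c[i * n + j] = sum;
            }
        }
    }
}

// Hypothetical registry: discovers providers once, on the first lookup.
final class KernelRegistry {
    private static final List<Fp32MatmulKernel> KERNELS = new ArrayList<>();
    private static boolean installed = false;

    private KernelRegistry() {}

    static synchronized void register(Fp32MatmulKernel kernel) {
        KERNELS.add(kernel);
    }

    static synchronized Fp32MatmulKernel bestAvailable() {
        if (!installed) {
            // Picks up any provider listed under META-INF/services (none in this sketch).
            for (Fp32MatmulKernel k : ServiceLoader.load(Fp32MatmulKernel.class)) {
                KERNELS.add(k);
            }
            KERNELS.add(new ScalarMatmulKernel());
            installed = true;
        }
        return KERNELS.stream()
                .max(Comparator.comparingInt(Fp32MatmulKernel::priority))
                .orElseThrow();
    }
}

// Hypothetical call site: one routed call instead of choosing a kernel by hand.
final class CpuOps {
    private Fp32MatmulKernel kernel; // resolved lazily on first matmul

    float[] matmul(float[] a, float[] b, int m, int k, int n) {
        if (kernel == null) {
            kernel = KernelRegistry.bestAvailable();
        }
        float[] c = new float[m * n];
        kernel.matmul(a, b, c, m, k, n);
        return c;
    }
}

final class RoutingDemo {
    public static void main(String[] args) {
        float[] c = new CpuOps().matmul(
                new float[] {1, 2, 3, 4}, new float[] {5, 6, 7, 8}, 2, 2, 2);
        System.out.println(KernelRegistry.bestAvailable().name() + " " + Arrays.toString(c));
    }
}
```

The point of the design is that the call site no longer decides between kernel variants. Adding a faster provider changes what `bestAvailable()` returns without touching the caller, which is why the PR can be described as plumbing only.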

Numbers (JDK 21.0.10, M-series macOS)

MatmulBench (production routing, vectorEnabled=true):

| size | post-routing (this PR), blas=false / blas=true | pre-routing (develop @ `5f7f515`), blas=false / blas=true | delta |
|---|---|---|---|
| 256 | 1.221 ms / 1.120 ms | 1.240 ms / 1.183 ms | within noise |
| 512 | 10.361 ms / 8.662 ms | 10.384 ms / 9.736 ms | within noise |
| 1024 | 78.058 ms / 69.620 ms | 78.322 ms / 77.976 ms | within noise |

Production routing now lands in the same ballpark as the direct KernelMatmulBench panama numbers (1.101 / 8.677 / 72.492 ms at 256/512/1024), which is consistent with the routing path taking the SPI kernel as expected.

Test plan

./gradlew :skainet-backends:skainet-backend-cpu:jvmTest — 213/213 tests pass.
./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh — full bench run, numbers above.

Closes

Closes the production-routing piece of M5; remaining M5 work is the quantized kernel SPI extension (Q4_K SIMD — separate PR off develop).

🤖 Generated with Claude Code

Replaces the direct JvmVectorKernels.matmulFloat / matmulFloatBlocked calls in DefaultCpuOpsJvm.chooseMatmul with a single fp32MatmulKernel.matmul(...) call, where fp32MatmulKernel is resolved via KernelRegistry.bestAvailable() and lazily falls back to KernelServiceLoader.installAll() on first use.

After PR #560 the SPI's PanamaVectorMatmulKernel is on par with the existing matmulFloatBlocked path within JMH noise (8.5% faster at 256/512, 2% slower within noise at 1024), so this is purely a plumbing change with no expected production regression. JMH numbers will be appended to the PR body once the bench run completes.

Tests: ./gradlew :skainet-backends:skainet-backend-cpu:jvmTest passes 213/213 with the routed kernel.

Closes the production-routing piece of M5; remaining M5 work is the quantized kernel SPI extension (Q4_K SIMD).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

michalharakal marked this pull request as ready for review April 28, 2026 20:46

michalharakal merged commit db00c95 into develop Apr 28, 2026
6 checks passed

michalharakal deleted the feature/jvm-matmul-route-spi branch April 28, 2026 20:46

michalharakal mentioned this pull request Apr 28, 2026

chore(release): prepare 0.21.0 #566

Merged

3 tasks

michalharakal merged 1 commit into develop from feature/jvm-matmul-route-spi