feat(matmul): route DefaultCpuOpsJvm FP32 matmul through KernelRegistry#561

Merged
michalharakal merged 1 commit into develop from
feature/jvm-matmul-route-spi
Apr 28, 2026

@michalharakal
Contributor

Summary

Replaces the direct JvmVectorKernels.matmulFloat / matmulFloatBlocked calls in DefaultCpuOpsJvm.chooseMatmul with a single fp32MatmulKernel.matmul(...) call. The kernel is resolved via KernelRegistry.bestAvailable(), lazily auto-installing ServiceLoader-discovered providers on first use.
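
The lazy resolution described above can be sketched roughly as follows. This is a minimal, self-contained Java illustration, not the project's actual code: the names `Fp32MatmulKernel`, `KernelProvider`, `priority()`, `ScalarKernel`, and `CpuOps`, as well as the highest-priority-wins rule, are assumptions made for the example, and the nested `KernelRegistry` here only mimics the `bestAvailable()` entry point named in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.ServiceLoader;

final class RoutingSketch {

    /** Hypothetical SPI: computes C[m x n] = A[m x k] * B[k x n], all row-major. */
    public interface Fp32MatmulKernel {
        int priority(); // assumed selection rule: highest value wins
        void matmul(float[] a, float[] b, float[] c, int m, int k, int n);
    }

    /** Hypothetical provider type that ServiceLoader would discover. */
    public interface KernelProvider {
        Fp32MatmulKernel kernel();
    }

    /** Scalar fallback so the sketch runs with no providers on the classpath. */
    static final class ScalarKernel implements Fp32MatmulKernel {
        public int priority() { return 0; }

        public void matmul(float[] a, float[] b, float[] c, int m, int k, int n) {
            for (int i = 0; i < m; i++) {
                for (int j = 0; j < n; j++) {
                    float sum = 0f;
                    for (int p = 0; p < k; p++) {
                        sum += a[i * k + p] * b[p * n + j];
                    }
                    c[i * n + j] = sum;
                }
            }
        }
    }

    /** Simplified stand-in for the registry; installs providers on first lookup. */
    static final class KernelRegistry {
        private static final List<Fp32MatmulKernel> KERNELS = new ArrayList<>();
        private static boolean installed = false;

        static synchronized Fp32MatmulKernel bestAvailable() {
            if (!installed) {
                // Discover providers listed under META-INF/services (none in this sketch).
                for (KernelProvider p : ServiceLoader.load(KernelProvider.class)) {
                    KERNELS.add(p.kernel());
                }
                KERNELS.add(new ScalarKernel());
                installed = true;
            }
            return KERNELS.stream()
                    .max(Comparator.comparingInt(Fp32MatmulKernel::priority))
                    .orElseThrow(() -> new IllegalStateException("no kernel registered"));
        }
    }

    /** Stand-in for the ops class: one routed call instead of direct kernel calls. */
    static final class CpuOps {
        private Fp32MatmulKernel fp32MatmulKernel; // resolved on first use

        private Fp32MatmulKernel kernel() {
            if (fp32MatmulKernel == null) {
                fp32MatmulKernel = KernelRegistry.bestAvailable();
            }
            return fp32MatmulKernel;
        }

        float[] matmul(float[] a, float[] b, int m, int k, int n) {
            float[] c = new float[m * n];
            kernel().matmul(a, b, c, m, k, n);
            return c;
        }
    }

    public static void main(String[] args) {
        float[] c = new CpuOps().matmul(
                new float[] {1, 2, 3, 4}, new float[] {5, 6, 7, 8}, 2, 2, 2);
        System.out.println(Arrays.toString(c)); // [19.0, 22.0, 43.0, 50.0]
    }
}
```

With this shape the caller never names a concrete kernel, so a faster provider added to the classpath would be picked up without touching the call site.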

After PR #560 (cache-blocked Panama), the SPI's PanamaVectorMatmulKernel is on par with the existing matmulFloatBlocked path, so this is purely a plumbing change with no measured production regression.

Numbers (JDK 21.0.10, M-series macOS)

MatmulBench (production routing, vectorEnabled=true):

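| size | post-routing (this PR), blas=false / blas=true | pre-routing (develop @ 5f7f515), blas=false / blas=true | delta |
|------|------------------------------------------------|---------------------------------------------------------|-------|
| 256  | 1.221 ms / 1.120 ms                            | 1.240 ms / 1.183 ms                                     | within noise |
| 512  | 10.361 ms / 8.662 ms                           | 10.384 ms / 9.736 ms                                    | within noise |
| 1024 | 78.058 ms / 69.620 ms                          | 78.322 ms / 77.976 ms                                   | within noise |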

Production routing now lands in the same ballpark as direct KernelMatmulBench panama (1.101 / 8.677 / 72.492 ms at 256/512/1024), which is consistent with the routing path taking the SPI kernel as expected.
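
For readers who want the deltas as percentages, the MatmulBench timings reported above can be turned into relative changes with a few lines of Java. This is only a sketch: `DeltaSketch` and `relativeDeltaPercent` are hypothetical names, and the timings are copied from the numbers in this PR. Whether a given difference counts as noise depends on the run-to-run variance of the benchmark, which is not reported here.

```java
final class DeltaSketch {

    /** Relative change of post vs. pre in percent; negative means post is faster. */
    static double relativeDeltaPercent(double postMs, double preMs) {
        return (postMs - preMs) / preMs * 100.0;
    }

    public static void main(String[] args) {
        int[] sizes = {256, 512, 1024};
        // Columns: blas=false, blas=true (milliseconds, from the MatmulBench numbers).
        double[][] post = {{1.221, 1.120}, {10.361, 8.662}, {78.058, 69.620}};
        double[][] pre = {{1.240, 1.183}, {10.384, 9.736}, {78.322, 77.976}};

        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("size %d: blas=false %+.2f%%, blas=true %+.2f%%%n",
                    sizes[i],
                    relativeDeltaPercent(post[i][0], pre[i][0]),
                    relativeDeltaPercent(post[i][1], pre[i][1]));
        }
    }
}
```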

Test plan

  • ./gradlew :skainet-backends:skainet-backend-cpu:jvmTest — 213/213 tests pass.
  • ./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh — full bench run, numbers above.

Closes

Closes the production-routing piece of M5; remaining M5 work is the quantized kernel SPI extension (Q4_K SIMD — separate PR off develop).

🤖 Generated with Claude Code

Replaces the direct JvmVectorKernels.matmulFloat / matmulFloatBlocked
calls in DefaultCpuOpsJvm.chooseMatmul with a single
fp32MatmulKernel.matmul(...) call, where fp32MatmulKernel is
resolved via KernelRegistry.bestAvailable() and lazily falls back to
KernelServiceLoader.installAll() on first use.

After PR #560 the SPI's PanamaVectorMatmulKernel is on par with the
existing matmulFloatBlocked path within JMH noise (8.5% faster at
256/512, 2% slower within noise at 1024) — so this is purely a
plumbing change with no expected production regression. JMH numbers
will be appended to the PR body once the bench run completes.

Tests: ./gradlew :skainet-backends:skainet-backend-cpu:jvmTest
passes 213/213 with the routed kernel.

Closes the production-routing piece of M5; remaining M5 work is the
quantized kernel SPI extension (Q4_K SIMD).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@michalharakal michalharakal marked this pull request as ready for review April 28, 2026 20:46
@michalharakal michalharakal merged commit db00c95 into develop Apr 28, 2026
6 checks passed
@michalharakal michalharakal deleted the feature/jvm-matmul-route-spi branch April 28, 2026 20:46
@michalharakal michalharakal mentioned this pull request Apr 28, 2026
3 tasks