feat(kernel): Panama Vector FP32 matmul provider (M5) by michalharakal · Pull Request #557 · SKaiNET-developers/SKaiNET

michalharakal · 2026-04-28T12:22:03Z

Summary

Adds PanamaVectorMatmulKernel (jdk.incubator.vector: FloatVector + fma + reduceLanes), which implements the Fp32MatmulKernel SPI introduced in #554 (feat(kernel): add KernelProvider SPI for matmul dispatch (Scalar baseline)).
Adds PanamaVectorKernelProvider (name = "panama-vector", priority = 50). Sits above ScalarKernelProvider (0) and below a future native provider (100).
isAvailable() requires JDK 21+ and the jdk.incubator.vector module on the path, and it respects the existing skainet.cpu.vector.enabled kill switch (set via -D or SKAINET_CPU_VECTOR_ENABLED).
Closes the "Panama-first" half of milestone M5 — CPU backend dispatch in the JVM inference perf roadmap.
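The selection rule described above (highest priority among available providers, with a kill switch) can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Provider` interface, `Fixed` class, and `pickBest` helper are hypothetical names, and the precedence of the system property over the environment variable, as well as "enabled when unset", are assumptions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class ProviderSelectionSketch {
    /** Hypothetical stand-in for the provider SPI; not the project's interface. */
    interface Provider {
        String name();
        int priority();
        boolean isAvailable();
    }

    /** Provider with fixed answers, used here only to demonstrate selection. */
    static final class Fixed implements Provider {
        private final String name;
        private final int priority;
        private final boolean available;

        Fixed(String name, int priority, boolean available) {
            this.name = name;
            this.priority = priority;
            this.available = available;
        }

        public String name() { return name; }
        public int priority() { return priority; }
        public boolean isAvailable() { return available; }
    }

    /** Assumption: the -D value wins over the env var, and unset means enabled. */
    static boolean vectorEnabled(String sysProp, String envVar) {
        String raw = sysProp != null ? sysProp : envVar;
        return raw == null || Boolean.parseBoolean(raw.trim());
    }

    /** The three availability conditions listed in the summary, combined. */
    static boolean panamaAvailable() {
        return Runtime.version().feature() >= 21
                && ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent()
                && vectorEnabled(System.getProperty("skainet.cpu.vector.enabled"),
                                 System.getenv("SKAINET_CPU_VECTOR_ENABLED"));
    }

    /** Highest-priority provider that reports itself available, if any. */
    static Optional<Provider> pickBest(List<Provider> providers) {
        return providers.stream()
                .filter(Provider::isAvailable)
                .max(Comparator.comparingInt(Provider::priority));
    }
}
```

With a scalar provider at priority 0 and a Panama provider at priority 50, `pickBest` returns the Panama provider while it is available and falls back to the scalar one when the kill switch turns it off.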

Why this shape

The kernel packs B^T into a contiguous (n, k) buffer so the inner reduction streams sequentially over k for both operands. It uses a single packing pass and one FMA accumulator per output cell, with a scalar tail loop for the trailing k elements that don't fill a vector.
Bit-for-bit numerical equivalence with ScalarMatmulKernel is not guaranteed (FMA + reordered accumulation), but parity within 1e-5 * k tolerance is asserted across contiguous, strided sub-blocks, non-aligned k (tail loop), and randomized larger sizes. This matches the per-milestone golden-output regression bar in the roadmap.
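To make the loop structure concrete, here is a rough sketch of the pack-then-reduce shape in plain Java. It is not the kernel from this PR: it emulates vector lanes with a small array and `Math.fma` so it runs without the incubator module, and `LANES`, the class name, and all method names are hypothetical. In a Vector API version the inner lane loop would correspond to one `FloatVector.fma` call and the lane sum to `reduceLanes(VectorOperators.ADD)`.

```java
import java.util.Arrays;

final class PackedFmaMatmulSketch {
    // Hypothetical lane count; a Vector API kernel would read it from the species.
    static final int LANES = 8;

    /** Packs row-major B (k x n) into row-major B^T (n x k). */
    static float[] packTransposed(float[] b, int k, int n) {
        float[] bt = new float[n * k];
        for (int p = 0; p < k; p++) {
            for (int j = 0; j < n; j++) {
                bt[j * k + p] = b[p * n + j];
            }
        }
        return bt;
    }

    /** C (m x n) = A (m x k) * B (k x n); all arrays row-major and contiguous. */
    static void matmul(float[] a, float[] b, float[] c, int m, int k, int n) {
        float[] bt = packTransposed(b, k, n);   // one pack per call
        int bound = k - (k % LANES);            // last index covered by full "vectors"
        float[] acc = new float[LANES];         // emulated vector accumulator
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                Arrays.fill(acc, 0f);
                int aRow = i * k;
                int bRow = j * k;               // both operands stream over k
                for (int p = bound - LANES; p >= 0; p -= LANES) {
                    for (int l = 0; l < LANES; l++) {
                        acc[l] = Math.fma(a[aRow + p + l], bt[bRow + p + l], acc[l]);
                    }
                }
                float sum = 0f;                 // emulated reduceLanes(ADD)
                for (int l = 0; l < LANES; l++) {
                    sum += acc[l];
                }
                for (int p = bound; p < k; p++) {   // scalar tail
                    sum = Math.fma(a[aRow + p], bt[bRow + p], sum);
                }
                c[i * n + j] = sum;
            }
        }
    }

    /** Plain triple-loop reference without FMA, standing in for a scalar kernel. */
    static void scalarMatmul(float[] a, float[] b, float[] c, int m, int k, int n) {
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0f;
                for (int p = 0; p < k; p++) {
                    sum += a[i * k + p] * b[p * n + j];
                }
                c[i * n + j] = sum;
            }
        }
    }

    static float maxAbsDiff(float[] x, float[] y) {
        float max = 0f;
        for (int i = 0; i < x.length; i++) {
            max = Math.max(max, Math.abs(x[i] - y[i]));
        }
        return max;
    }
}
```

Because the lane-wise accumulation and FMA change the rounding order, the two results generally differ in the last bits; a parity check in the spirit of this PR would compare `maxAbsDiff` of the two outputs against `1e-5f * k` rather than test for equality.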

Out of scope (follow-ups)

Wire DefaultCpuOpsJvm.matmul through the kernel SPI. Today it still calls JvmVectorKernels.matmulFloat / matmulFloatBlocked directly; until that routing change lands, the existing MatmulBench won't exercise this provider end-to-end.
JMH evidence for the M5 ≥1.5× target. Best added together with the routing change so the bench numbers reflect the SPI path. A kernel-level microbench in :skainet-backends:benchmarks:jvm-cpu-jmh is the natural home.
ServiceLoader auto-discovery in KernelRegistry. The SPI doc explicitly defers this until a second concrete JVM provider exists — that condition is now met, so this is a clean small follow-up PR.
Cache-blocked variant of the kernel (8×8×128 tiling, like JvmVectorKernels.matmulFloatBlocked). Useful if the simple FMA path doesn't clear the ≥4× target on 512² that docs/.../perf/jvm-cpu.adoc mentions.
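For the cache-blocked follow-up, the general idea of tiling can be sketched as below. This is only an illustration of loop tiling in scalar Java under stated assumptions: it reads "8×8×128" as tile sizes for m, n, and k respectively, which the description does not confirm, and it does not reproduce `JvmVectorKernels.matmulFloatBlocked`. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class BlockedMatmulSketch {
    // Assumed tile sizes (m x n x k); the mapping of 8x8x128 to dimensions is a guess.
    static final int TILE_M = 8;
    static final int TILE_N = 8;
    static final int TILE_K = 128;

    /** C (m x n) = A (m x k) * B (k x n); all arrays row-major and contiguous. */
    static void matmulBlocked(float[] a, float[] b, float[] c, int m, int k, int n) {
        Arrays.fill(c, 0, m * n, 0f);   // partial sums are accumulated into C per k-tile
        for (int i0 = 0; i0 < m; i0 += TILE_M) {
            int iMax = Math.min(i0 + TILE_M, m);
            for (int j0 = 0; j0 < n; j0 += TILE_N) {
                int jMax = Math.min(j0 + TILE_N, n);
                for (int p0 = 0; p0 < k; p0 += TILE_K) {
                    int pMax = Math.min(p0 + TILE_K, k);
                    // Work on one (TILE_M x TILE_N) output tile with one k-slab,
                    // so the touched parts of A and B stay small.
                    for (int i = i0; i < iMax; i++) {
                        for (int j = j0; j < jMax; j++) {
                            float sum = c[i * n + j];
                            for (int p = p0; p < pMax; p++) {
                                sum = Math.fma(a[i * k + p], b[p * n + j], sum);
                            }
                            c[i * n + j] = sum;
                        }
                    }
                }
            }
        }
    }
}
```

A real variant would presumably combine this tiling with the packed, vectorized inner loop; whether that is needed depends on the benchmark numbers that the routing follow-up is meant to produce.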

Test plan

./gradlew :skainet-backends:skainet-backend-cpu:jvmTest --tests "sk.ainet.exec.kernel.*": 13 new tests pass (8 kernel parity + 5 provider/registry); existing KernelRegistryTest and ScalarMatmulKernelTest still pass.
Parity vs ScalarMatmulKernel for: 2×3×4 contiguous, 8×16×32 random, 31×17×23 random, non-aligned k=23 (tail loop), strided A sub-block.
Boundary semantics: m=0/n=0 is no-op, k=0 zeros the output block, negative dims throw IllegalArgumentException.
Provider: name/priority assertions, isAvailable() on test JDK, registry picks Panama over Scalar when both registered, kill-switch via -Dskainet.cpu.vector.enabled=false disables it.
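The boundary semantics and the strided sub-block case from the test plan can be illustrated with a small scalar reference. The offset and leading-dimension parameters below are a hypothetical signature chosen for the example; the actual Fp32MatmulKernel signature is not shown in this description.

```java
final class StridedMatmulSketch {
    /**
     * C[m x n] = A[m x k] * B[k x n] on row-major sub-blocks.
     * Each matrix is addressed as base offset + row * leading dimension + column.
     */
    static void matmul(float[] a, int aOff, int lda,
                       float[] b, int bOff, int ldb,
                       float[] c, int cOff, int ldc,
                       int m, int k, int n) {
        if (m < 0 || k < 0 || n < 0) {
            throw new IllegalArgumentException(
                    "dimensions must be non-negative: m=" + m + ", k=" + k + ", n=" + n);
        }
        // m == 0 or n == 0: neither loop body runs, so C is left untouched.
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0f;                 // k == 0: the cell is written as 0
                for (int p = 0; p < k; p++) {
                    sum = Math.fma(a[aOff + i * lda + p], b[bOff + p * ldb + j], sum);
                }
                c[cOff + i * ldc + j] = sum;
            }
        }
    }
}
```

For example, with a 3×4 parent matrix holding the values 0..11, passing `aOff = 5` and `lda = 4` with `m = k = 2` selects the sub-block [[5, 6], [9, 10]] without copying it.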

🤖 Generated with Claude Code

Implements `PanamaVectorMatmulKernel` (jdk.incubator.vector, FloatVector + fma + reduceLanes) and `PanamaVectorKernelProvider` against the kernel SPI from PR #554. Picks up automatically over `ScalarKernelProvider` once registered, and respects the existing `-Dskainet.cpu.vector.enabled=false` kill switch. Closes the M5 "Panama-first" half of the JVM perf milestone plan.

Routing `DefaultCpuOpsJvm.matmul` through the SPI and adding a ServiceLoader-based auto-registration are deferred to follow-ups so this PR stays focused on the kernel itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

michalharakal marked this pull request as ready for review April 28, 2026 12:22

michalharakal merged commit a5e5f93 into develop Apr 28, 2026
6 checks passed

michalharakal deleted the feature/jvm-panama-fp32-matmul-kernel branch April 28, 2026 12:23

This was referenced Apr 28, 2026

feat(kernel): JVM ServiceLoader auto-discovery for KernelProvider #559

Merged

chore(release): prepare 0.21.0 #566

Merged

michalharakal merged 1 commit into develop from feature/jvm-panama-fp32-matmul-kernel
