feat(kernel): add KernelProvider SPI for matmul dispatch (Scalar baseline) by michalharakal · Pull Request #554 · SKaiNET-developers/SKaiNET

michalharakal · 2026-04-28T11:25:41Z

Closes #553. First step on the M5 track (KernelProvider + accelerated kernels for matmul / SDPA / quantized) — this PR lands only the SPI plus the scalar baseline; Panama Vector and native FFM kernels follow in separate PRs once this lands.

Summary

sk.ainet.backend.api.kernel.Fp32MatmulKernel — C(m,n) = A(m,k)·B(k,n) row-major. Stride parameters let callers pass sub-blocks of larger arrays without copying. Implementations must not mutate inputs and must fully overwrite the m × n block of out.
KernelProvider — name / priority / isAvailable() plus per-kernel accessors (matmulFp32(): Fp32MatmulKernel?). Per-accessor null lets callers fall through to a lower-priority provider when the higher one doesn't ship that kernel.
KernelRegistry — manual register / find(name) / bestAvailable() / availableNames(). JVM ServiceLoader auto-discovery is deferred to a follow-up PR (only one provider ships today; the shape supports it without further interface changes).
ScalarMatmulKernel + ScalarKernelProvider in skainet-backend-cpu: triple-nested-loop reference that honours the stride parameters, priority=0, always available. Serves as the correctness reference that accelerated kernels must match, and as the runtime fallback.
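To make the strided contract above concrete, here is a minimal sketch of what a kernel interface and a triple-nested-loop reference could look like. The method name, the parameter order, and the use of row strides only are assumptions made for illustration; the PR text does not show the actual signature of `Fp32MatmulKernel`.

```kotlin
// Hypothetical shape of the kernel contract. Parameter names, their order, and
// the row-stride-only layout are assumptions, not the signature from the PR.
interface Fp32MatmulKernel {
    fun matmul(
        m: Int, n: Int, k: Int,
        a: FloatArray, aOffset: Int, aRowStride: Int,
        b: FloatArray, bOffset: Int, bRowStride: Int,
        out: FloatArray, outOffset: Int, outRowStride: Int,
    )
}

// Triple-nested-loop reference in the spirit of ScalarMatmulKernel.
object ScalarMatmulSketch : Fp32MatmulKernel {
    override fun matmul(
        m: Int, n: Int, k: Int,
        a: FloatArray, aOffset: Int, aRowStride: Int,
        b: FloatArray, bOffset: Int, bRowStride: Int,
        out: FloatArray, outOffset: Int, outRowStride: Int,
    ) {
        require(m >= 0 && n >= 0 && k >= 0) { "dimensions must be non-negative" }
        for (i in 0 until m) {
            for (j in 0 until n) {
                var acc = 0f
                for (p in 0 until k) {
                    acc += a[aOffset + i * aRowStride + p] * b[bOffset + p * bRowStride + j]
                }
                // Plain assignment (not +=): the m x n block is fully overwritten,
                // so k = 0 writes zeros instead of leaving stale values behind.
                out[outOffset + i * outRowStride + j] = acc
            }
        }
    }
}

fun main() {
    // A is the left 2x2 block of a 2x3 row-major array, so its row stride is 3.
    val a = floatArrayOf(1f, 2f, 99f, 3f, 4f, 99f)
    val b = floatArrayOf(5f, 6f, 7f, 8f)
    val out = FloatArray(4)
    ScalarMatmulSketch.matmul(2, 2, 2, a, 0, 3, b, 0, 2, out, 0, 2)
    println(out.toList()) // [19.0, 22.0, 43.0, 50.0]
}
```

Passing a row stride larger than the logical width is what lets a caller address a sub-block in place: the padding column holding `99f` is never read.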

Test plan

:skainet-backends:skainet-backend-cpu:jvmTest:

ScalarMatmulKernelTest — small/medium shapes; strided sub-blocks on both A and out; m=0; k=0; rejects negative dimensions
KernelRegistryTest — empty / scalar-only / priority-ordering / skip-unavailable / case-insensitive name lookup / re-register no-op

Plus pre-existing :skainet-lang:skainet-lang-core:jvmTest and :skainet-compile:skainet-compile-dag:jvmTest still green.
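The registry behaviours listed for KernelRegistryTest can be illustrated with a small stand-in. This is a sketch under assumptions: `KernelRegistrySketch` and `FakeProvider` are hypothetical names, the sketch is an instantiable class rather than a process-wide singleton so it is easy to exercise, "re-register no-op" is interpreted as "the first registration under a given name wins", and the ordering of `availableNames()` by descending priority is a guess.

```kotlin
// Minimal stand-in for the provider contract; only what the registry needs here.
interface KernelProvider {
    val name: String
    val priority: Int
    fun isAvailable(): Boolean
}

// Hypothetical registry sketch; the real KernelRegistry may differ in detail.
class KernelRegistrySketch {
    // Keyed by lower-cased name so that lookup is case-insensitive.
    private val providers = LinkedHashMap<String, KernelProvider>()

    fun register(provider: KernelProvider) {
        val key = provider.name.lowercase()
        // Assumed semantics: registering an already-known name is a no-op.
        if (key !in providers) providers[key] = provider
    }

    fun find(name: String): KernelProvider? = providers[name.lowercase()]

    // Highest-priority provider among those that report themselves available.
    fun bestAvailable(): KernelProvider? =
        providers.values.filter { it.isAvailable() }.maxByOrNull { it.priority }

    fun availableNames(): List<String> =
        providers.values
            .filter { it.isAvailable() }
            .sortedByDescending { it.priority }
            .map { it.name }
}

class FakeProvider(
    override val name: String,
    override val priority: Int,
    private val available: Boolean,
) : KernelProvider {
    override fun isAvailable() = available
}

fun main() {
    val registry = KernelRegistrySketch()
    registry.register(FakeProvider("scalar", 0, available = true))
    registry.register(FakeProvider("vector", 50, available = true))
    registry.register(FakeProvider("native", 100, available = false))
    println(registry.bestAvailable()?.name) // vector
    println(registry.find("SCALAR")?.name)  // scalar
    println(registry.availableNames())      // [vector, scalar]
}
```

The unavailable `native` provider has the highest priority but is skipped, which is the "skip-unavailable" case from the test plan.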

Out of scope (separate issues/PRs)

Panama Vector matmul — the actual JVM perf win. Builds on this SPI.
Native FFM matmul.
Wiring DefaultCpuOps.matmul to consult the registry — needs at least one accelerated provider to make the dispatch worth doing. Until then ScalarKernelProvider is reachable but unused by the existing op layer.
SDPA kernel API.
Quantized kernels (Q4_K, Q8).
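For orientation, the deferred dispatch could build on the per-accessor `null` rule from the summary roughly as follows. This is only a sketch of one possible selection strategy, not the planned `DefaultCpuOps.matmul` wiring: `selectMatmulFp32` and `StubProvider` are hypothetical, and the kernel type is reduced to a labelled placeholder.

```kotlin
// Placeholder kernel type; the real Fp32MatmulKernel carries the matmul entry point.
class Fp32MatmulKernel(val label: String)

interface KernelProvider {
    val name: String
    val priority: Int
    fun isAvailable(): Boolean
    fun matmulFp32(): Fp32MatmulKernel? // null means "this provider has no FP32 matmul"
}

class StubProvider(
    override val name: String,
    override val priority: Int,
    private val available: Boolean,
    private val kernel: Fp32MatmulKernel?,
) : KernelProvider {
    override fun isAvailable() = available
    override fun matmulFp32() = kernel
}

// Hypothetical helper: walk available providers from highest to lowest priority
// and return the first FP32 matmul kernel that one of them actually ships.
fun selectMatmulFp32(providers: List<KernelProvider>): Fp32MatmulKernel? =
    providers.asSequence()
        .filter { it.isAvailable() }
        .sortedByDescending { it.priority }
        .firstNotNullOfOrNull { it.matmulFp32() }

fun main() {
    val providers = listOf(
        StubProvider("scalar", 0, available = true, kernel = Fp32MatmulKernel("scalar")),
        // Higher priority, but ships no FP32 matmul, so the selection falls through.
        StubProvider("quantized-only", 100, available = true, kernel = null),
    )
    println(selectMatmulFp32(providers)?.label) // scalar
}
```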

🤖 Generated with Claude Code

…line)

Closes #553.

Introduces a small SPI between high-level tensor ops (`TensorOps.matmul` et al.) and the actual numeric kernels that do the FLOPs. This is the groundwork that lets a SIMD-accelerated matmul be plugged in without re-implementing the rest of an op-level backend, and lets a hand-written kernel be tested against a scalar reference.

What lands:

* `sk.ainet.backend.api.kernel.Fp32MatmulKernel`
  - `C(m, n) = A(m, k) · B(k, n)` row-major
  - element-stride parameters for caller sub-blocks (no copy needed)
  - implementations must not mutate inputs / must overwrite the m×n block of out
* `sk.ainet.backend.api.kernel.KernelProvider`
  - `name` / `priority` / `isAvailable()` / per-kernel accessors
  - per-accessor `null` lets callers fall through to a lower-priority provider when the higher one doesn't ship the kernel
* `sk.ainet.backend.api.kernel.KernelRegistry`
  - process-wide manual registration
  - `register()` / `find(name)` / `bestAvailable()` / `availableNames()`
  - `clearForTesting()` for tests
  - JVM ServiceLoader auto-discovery deferred to a follow-up PR (only one provider ships today; the registry shape supports it without further interface changes)
* `sk.ainet.exec.kernel.ScalarMatmulKernel` + `ScalarKernelProvider` (in `skainet-backend-cpu`)
  - triple-nested-loop reference; honours stride parameters
  - priority = 0; always available
  - guaranteed correctness reference and runtime fallback
* Tests:
  - `ScalarMatmulKernelTest`: small / medium / strided sub-blocks on both A and out / zero-m / zero-k / rejects negatives
  - `KernelRegistryTest`: empty / scalar-only / priority ordering / skip-unavailable / case-insensitive name lookup / re-register no-op

Out of scope (separate issues / PRs):

* Panama Vector matmul (the actual perf win on JVM).
* Native FFM matmul.
* Wiring `DefaultCpuOps.matmul` to consult the registry (needs at least one accelerated provider to make the dispatch worth doing).
* SDPA kernel API.
* Quantized kernels (Q4_K, Q8).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

michalharakal merged commit 9786960 into develop Apr 28, 2026
6 checks passed

michalharakal deleted the feature/ISSUE-553-kernel-provider-spi branch April 28, 2026 11:26

This was referenced Apr 28, 2026

feat(kernel): Panama Vector FP32 matmul provider (M5) #557

Merged

feat(kernel): SIMD-fused Q4_K matmul kernel + Q4KMatmulKernel SPI #562

Merged

chore(release): prepare 0.21.0 #566

Merged

Kernel reference docs (KSP-generated, parallel to ops) #568

Open

michalharakal merged 1 commit into develop from feature/ISSUE-553-kernel-provider-spi
