
feat(kernel): add KernelProvider SPI for matmul dispatch (Scalar baseline) #554

Merged
michalharakal merged 1 commit into develop from feature/ISSUE-553-kernel-provider-spi
Apr 28, 2026
@michalharakal
Contributor

Closes #553. First step on the M5 track (KernelProvider + accelerated kernels for matmul / SDPA / quantized). This PR lands only the SPI plus the scalar baseline; Panama Vector and native FFM kernels follow in separate PRs once this is merged.

Summary

  • sk.ainet.backend.api.kernel.Fp32MatmulKernel: C(m,n) = A(m,k)·B(k,n), row-major. Stride parameters let callers pass sub-blocks of larger arrays without copying. Implementations must not mutate inputs and must fully overwrite the m × n block of out.
  • KernelProvider: name / priority / isAvailable() plus per-kernel accessors (matmulFp32(): Fp32MatmulKernel?). A per-accessor null lets callers fall through to a lower-priority provider when the higher one doesn't ship that kernel.
  • KernelRegistry: manual register / find(name) / bestAvailable() / availableNames(). JVM ServiceLoader auto-discovery is deferred to a follow-up PR (only one provider ships today; the shape supports it without further interface changes).
  • ScalarMatmulKernel + ScalarKernelProvider in skainet-backend-cpu: triple-nested-loop reference, priority=0, always available. Serves as the correctness reference that accelerated kernels must match, and as the runtime fallback.
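
For readers who want a concrete picture of the kernel contract, here is a minimal sketch of a strided row-major matmul interface and a triple-nested-loop implementation. The parameter list (`aOffset`, `lda`, and so on) and the class name `ScalarMatmulSketch` are assumptions made for illustration; the PR text describes the contract but not the exact signature, so the merged code may differ.

```kotlin
// Hypothetical signature: offsets and row strides are counted in elements.
interface Fp32MatmulKernel {
    /** Computes C(m,n) = A(m,k) · B(k,n), all operands row-major. */
    fun matmul(
        m: Int, n: Int, k: Int,
        a: FloatArray, aOffset: Int, lda: Int,
        b: FloatArray, bOffset: Int, ldb: Int,
        out: FloatArray, outOffset: Int, ldo: Int
    )
}

// Illustrative scalar reference, not the ScalarMatmulKernel from the PR.
class ScalarMatmulSketch : Fp32MatmulKernel {
    override fun matmul(
        m: Int, n: Int, k: Int,
        a: FloatArray, aOffset: Int, lda: Int,
        b: FloatArray, bOffset: Int, ldb: Int,
        out: FloatArray, outOffset: Int, ldo: Int
    ) {
        require(m >= 0 && n >= 0 && k >= 0) { "dimensions must be non-negative" }
        for (i in 0 until m) {
            for (j in 0 until n) {
                // Accumulate in a local so every cell of the m × n block is
                // overwritten, even when k == 0 (the cell then becomes 0).
                var acc = 0.0f
                for (p in 0 until k) {
                    acc += a[aOffset + i * lda + p] * b[bOffset + p * ldb + j]
                }
                out[outOffset + i * ldo + j] = acc
            }
        }
    }
}
```

With dense operands the strides equal the column counts (`lda = k`, `ldb = n`, `ldo = n`) and all offsets are 0. The inputs are only read, never written, which matches the "must not mutate inputs" rule above.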

Test plan

:skainet-backends:skainet-backend-cpu:jvmTest:

  • ScalarMatmulKernelTest — small/medium shapes; strided sub-blocks on both A and out; m=0; k=0; rejects negative dimensions
  • KernelRegistryTest — empty / scalar-only / priority-ordering / skip-unavailable / case-insensitive name lookup / re-register no-op

Plus pre-existing :skainet-lang:skainet-lang-core:jvmTest and :skainet-compile:skainet-compile-dag:jvmTest still green.
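
As an illustration of what a strided sub-block check can look like, the sketch below runs a scalar matmul on a block embedded in larger parent arrays and compares it with the same product computed on dense copies. The helper names (`scalarMatmul`, `copyBlock`, `stridedMatchesDense`) and the sentinel check on cells outside the block are assumptions, not the contents of `ScalarMatmulKernelTest`.

```kotlin
// Hypothetical scalar reference; offsets and row strides are in elements.
fun scalarMatmul(
    m: Int, n: Int, k: Int,
    a: FloatArray, aOff: Int, lda: Int,
    b: FloatArray, bOff: Int, ldb: Int,
    out: FloatArray, outOff: Int, ldo: Int
) {
    require(m >= 0 && n >= 0 && k >= 0) { "dimensions must be non-negative" }
    for (i in 0 until m) for (j in 0 until n) {
        var acc = 0.0f
        for (p in 0 until k) acc += a[aOff + i * lda + p] * b[bOff + p * ldb + j]
        out[outOff + i * ldo + j] = acc
    }
}

/** Copies a rows × cols block out of a strided row-major array into a dense one. */
fun copyBlock(src: FloatArray, off: Int, ld: Int, rows: Int, cols: Int): FloatArray =
    FloatArray(rows * cols) { idx -> src[off + (idx / cols) * ld + (idx % cols)] }

fun stridedMatchesDense(): Boolean {
    val m = 2; val n = 2; val k = 3
    // A is the 2×3 block at row 1, col 1 of a 4×5 parent.
    val parentA = FloatArray(4 * 5) { it.toFloat() }
    val aOff = 1 * 5 + 1
    val b = FloatArray(k * n) { (it + 1).toFloat() }
    // out is the 2×2 block at row 1, col 2 of a 3×4 parent filled with a sentinel.
    val sentinel = -7f
    val parentOut = FloatArray(3 * 4) { sentinel }
    val outOff = 1 * 4 + 2
    scalarMatmul(m, n, k, parentA, aOff, 5, b, 0, n, parentOut, outOff, 4)

    // Same product on a dense copy of the A block.
    val dense = FloatArray(m * n)
    scalarMatmul(m, n, k, copyBlock(parentA, aOff, 5, m, k), 0, k, b, 0, n, dense, 0, n)

    val blockOk = copyBlock(parentOut, outOff, 4, m, n).contentEquals(dense)
    // Cells outside the target block should still hold the sentinel.
    val outsideOk = parentOut.indices.all { idx ->
        val inside = (idx / 4) in 1..2 && (idx % 4) in 2..3
        inside || parentOut[idx] == sentinel
    }
    return blockOk && outsideOk
}
```

Both runs perform the same float operations in the same order, so exact equality is safe here. Comparing an accelerated kernel against the scalar reference would more likely need a tolerance.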

Out of scope (separate issues/PRs)

  • Panama Vector matmul — the actual JVM perf win. Builds on this SPI.
  • Native FFM matmul.
  • Wiring DefaultCpuOps.matmul to consult the registry — needs at least one accelerated provider to make the dispatch worth doing. Until then ScalarKernelProvider is reachable but unused by the existing op layer.
  • SDPA kernel API.
  • Quantized kernels (Q4_K, Q8).
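
Since the op-layer wiring is deferred, the following is only a sketch of how a caller might resolve a matmul kernel once several providers exist: walk available providers from highest to lowest priority and take the first non-null `matmulFp32()`. The `Fp32MatmulKernel` here is a stand-in with a `label` instead of the real matmul method, and `FixedProvider`, `NamedKernel`, and `resolveMatmul` are hypothetical names that do not come from the PR.

```kotlin
// Stand-in for the real kernel interface; only a label so the choice is visible.
interface Fp32MatmulKernel { val label: String }

class NamedKernel(override val label: String) : Fp32MatmulKernel

// Shape taken from the PR summary: name / priority / isAvailable() / matmulFp32().
interface KernelProvider {
    val name: String
    val priority: Int
    fun isAvailable(): Boolean
    fun matmulFp32(): Fp32MatmulKernel?
}

// Hypothetical provider with fixed answers, for illustration only.
class FixedProvider(
    override val name: String,
    override val priority: Int,
    private val available: Boolean,
    private val kernel: Fp32MatmulKernel?
) : KernelProvider {
    override fun isAvailable(): Boolean = available
    override fun matmulFp32(): Fp32MatmulKernel? = kernel
}

/**
 * Assumed dispatch rule: skip unavailable providers, prefer higher priority,
 * and fall through when a provider returns null for this kernel.
 */
fun resolveMatmul(providers: List<KernelProvider>): Fp32MatmulKernel? =
    providers
        .filter { it.isAvailable() }
        .sortedByDescending { it.priority }
        .asSequence()
        .mapNotNull { it.matmulFp32() }
        .firstOrNull()
```

Under this rule a priority-0 scalar provider is always the last candidate, so it is reached only when no higher-priority provider is available and ships the kernel.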

🤖 Generated with Claude Code

…line)

Closes #553.

Introduces a small SPI between high-level tensor ops (`TensorOps.matmul`
et al.) and the actual numeric kernels that do the FLOPs. This is the
groundwork that lets a SIMD-accelerated matmul be plugged in without
re-implementing the rest of an op-level backend, and lets a hand-written
kernel be tested against a scalar reference.

What lands:

* `sk.ainet.backend.api.kernel.Fp32MatmulKernel`
  - `C(m, n) = A(m, k) · B(k, n)` row-major
  - element-stride parameters for caller sub-blocks (no copy needed)
  - implementations must not mutate inputs / must overwrite the m×n
    block of out

* `sk.ainet.backend.api.kernel.KernelProvider`
  - `name` / `priority` / `isAvailable()` / per-kernel accessors
  - per-accessor `null` lets callers fall through to a lower-priority
    provider when the higher one doesn't ship the kernel

* `sk.ainet.backend.api.kernel.KernelRegistry`
  - process-wide manual registration
  - `register()` / `find(name)` / `bestAvailable()` / `availableNames()`
  - `clearForTesting()` for tests
  - JVM ServiceLoader auto-discovery deferred to a follow-up PR (only
    one provider ships today; the registry shape supports it without
    further interface changes)

* `sk.ainet.exec.kernel.ScalarMatmulKernel` + `ScalarKernelProvider`
  (in `skainet-backend-cpu`)
  - triple-nested-loop reference; honours stride parameters
  - priority = 0; always available
  - guaranteed correctness reference and runtime fallback

* Tests:
  - `ScalarMatmulKernelTest`: small / medium / strided sub-blocks
    on both A and out / zero-m / zero-k / rejects negatives
  - `KernelRegistryTest`: empty / scalar-only / priority ordering /
    skip-unavailable / case-insensitive name lookup / re-register no-op
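
The registry behaviours listed above can be sketched roughly as follows. This is an illustrative reconstruction, not the merged `KernelRegistry`. It assumes that "re-register no-op" means the first registration under a name wins, that name lookup is case-insensitive by lowercasing, and that `availableNames()` is ordered by descending priority. `RegistrySketch` and `SimpleProvider` are hypothetical names, the sketch is an ordinary class rather than a process-wide singleton, and it is not thread-safe.

```kotlin
// Reduced provider shape; the real interface also exposes per-kernel accessors.
interface KernelProvider {
    val name: String
    val priority: Int
    fun isAvailable(): Boolean
}

// Hypothetical provider with a fixed availability flag.
class SimpleProvider(
    override val name: String,
    override val priority: Int,
    private val available: Boolean = true
) : KernelProvider {
    override fun isAvailable(): Boolean = available
}

class RegistrySketch {
    // Keyed by lowercased name so lookups ignore case; insertion order is kept.
    private val providers = LinkedHashMap<String, KernelProvider>()

    /** Registers a provider; a second registration under the same name is ignored. */
    fun register(provider: KernelProvider) {
        val key = provider.name.lowercase()
        if (key !in providers) providers[key] = provider
    }

    fun find(name: String): KernelProvider? = providers[name.lowercase()]

    /** Highest-priority provider that reports itself available, or null. */
    fun bestAvailable(): KernelProvider? =
        providers.values.filter { it.isAvailable() }.maxByOrNull { it.priority }

    fun availableNames(): List<String> =
        providers.values
            .filter { it.isAvailable() }
            .sortedByDescending { it.priority }
            .map { it.name }

    /** Test-only reset so one test's registrations do not leak into the next. */
    fun clearForTesting() {
        providers.clear()
    }
}
```

A typical test would register a scalar provider plus an unavailable higher-priority one and check that `bestAvailable()` still returns the scalar provider.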

Out of scope (separate issues / PRs):

* Panama Vector matmul (the actual perf win on JVM).
* Native FFM matmul.
* Wiring `DefaultCpuOps.matmul` to consult the registry — needs at
  least one accelerated provider to make the dispatch worth doing.
* SDPA kernel API.
* Quantized kernels (Q4_K, Q8).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@michalharakal michalharakal merged commit 9786960 into develop Apr 28, 2026
6 checks passed
@michalharakal michalharakal deleted the feature/ISSUE-553-kernel-provider-spi branch April 28, 2026 11:26
Development

Successfully merging this pull request may close these issues.

Add KernelProvider SPI for matmul/SDPA dispatch (Scalar baseline)
