Context
0.21.0 shipped a kernel SPI: KernelProvider, Fp32MatmulKernel, Q4KMatmulKernel, KernelRegistry (PRs #554, #559, #562 and the rest of the M5 chain). Two concrete providers are registered today (ScalarKernelProvider at priority 0, PanamaVectorKernelProvider at priority 50); a third, NativeKernelProvider at priority 100, is captured as a plan. A further wave of providers is plausible in 2026: native FFM, GPU, NPU.
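For readers new to the SPI, priority-ordered provider selection can be sketched like this. The interface shape, the isAvailable() check, and the demoBest() helper are illustrative stand-ins, not the real 0.21.0 API:

```kotlin
// Minimal sketch of priority-ordered kernel provider selection.
// Names mirror the SPI described above; bodies are hypothetical.
interface KernelProvider {
    val name: String
    val priority: Int
    fun isAvailable(): Boolean
}

class KernelRegistry {
    private val providers = mutableListOf<KernelProvider>()

    fun register(p: KernelProvider) { providers += p }

    // Highest-priority provider whose availability check passes wins;
    // a scalar fallback at priority 0 is always available.
    fun best(): KernelProvider? =
        providers.filter { it.isAvailable() }.maxByOrNull { it.priority }
}

fun demoBest(): String {
    val registry = KernelRegistry()
    registry.register(object : KernelProvider {
        override val name = "scalar"
        override val priority = 0
        override fun isAvailable() = true
    })
    registry.register(object : KernelProvider {
        override val name = "panama"
        override val priority = 50
        override fun isAvailable() = true
    })
    return registry.best()!!.name
}
```

With both providers available, the higher-priority panama provider is selected; that priority/availability pair is exactly the metadata the proposal below wants to surface in generated docs.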
Documentation about these kernels is currently hand-written prose in three new pages under docs/modules/ROOT/pages/explanation/perf/ (simd-kernels.adoc, quantized-simd-kernels.adoc, native-ffm-plan.adoc). That works for explanation-quadrant content (math, algorithms, why), but it doesn't scale for reference content — the matrix of "which provider implements which kernel for which target with what priority and what availability check".
Compare: how operators handle this
For operators we already have the right answer. skainet-lang-ksp-processor scans TensorOps interfaces with annotations like @NotImplemented(backends) and @InProgress(backends, owner), and emits AsciiDoc fragments into docs/modules/ROOT/pages/reference/operators/generated/ plus partials/ops/.... The Antora build (./gradlew generateDocs) regenerates them every CI run, so the operator coverage matrix at reference/ops-status-matrix.adoc is always in sync with code. See explanation/operator-design.adoc for the full story.
Proposal
Mirror the operator pipeline for kernels:
- Annotations in skainet-backend-api:
@Target(AnnotationTarget.CLASS)
annotation class KernelImplementation(
    val name: String,
    val priority: Int,
    val targets: Array<String>,    // e.g. ["jvm", "jvm-aarch64", "jvm-x86_64-avx2"]
    val availability: String = "", // human-readable predicate
)

@Target(AnnotationTarget.FUNCTION)
annotation class KernelOp(
    val kernel: String,          // "matmulFp32", "matmulQ4K", ...
    val complexity: String = "",
    val throughput: String = "", // e.g. "~73 GFLOPS @ 4096² Apple M-series"
)
Apply to ScalarKernelProvider, PanamaVectorKernelProvider, and (eventually) NativeKernelProvider.
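A sketch of how the annotations might look applied to a provider. The annotation shapes are the ones proposed above (re-declared here so the snippet is self-contained); the ScalarKernelProvider body and the providerPriority() helper are stand-ins, not the real implementation:

```kotlin
@Target(AnnotationTarget.CLASS)
annotation class KernelImplementation(
    val name: String,
    val priority: Int,
    val targets: Array<String>,
    val availability: String = "",
)

@Target(AnnotationTarget.FUNCTION)
annotation class KernelOp(
    val kernel: String,
    val complexity: String = "",
    val throughput: String = "",
)

// Hypothetical stand-in for the real provider class.
@KernelImplementation(
    name = "scalar",
    priority = 0,
    targets = ["jvm"],
    availability = "always",
)
class ScalarKernelProvider {
    @KernelOp(kernel = "matmulFp32", complexity = "O(n^3)")
    fun matmulFp32() { /* ... */ }
}

// Kotlin annotations default to RUNTIME retention, so both KSP (at build
// time) and plain reflection (here, for illustration) can read the metadata:
fun providerPriority(): Int =
    ScalarKernelProvider::class.java
        .getAnnotation(KernelImplementation::class.java)!!
        .priority
```

Keeping the metadata purely declarative like this is what lets the same annotation surface serve future GPU/NPU providers without changes to the processor.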
- KSP processor extension in skainet-lang-ksp-processor that scans annotated providers and emits:
docs/modules/ROOT/pages/reference/kernels/generated/<provider-name>.adoc — one page per provider with the kernels it implements, priority, availability predicate, target architectures.
docs/modules/ROOT/partials/kernels/<kernel-name>.adoc — one partial per *MatmulKernel interface, listing all providers that implement it (sorted by priority).
docs/modules/ROOT/pages/reference/kernels-status-matrix.adoc — coverage matrix (rows = kernels, columns = providers/targets, cells = priority + throughput note + link to source).
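The matrix-rendering step the processor would call could look roughly like this. Cell, Row, and renderMatrix are hypothetical carrier types and names; the real processor would populate them from the annotations and write the result to kernels-status-matrix.adoc:

```kotlin
// Hypothetical carriers for annotation-derived metadata.
data class Cell(val provider: String, val priority: Int)
data class Row(val kernel: String, val cells: List<Cell>)

// Emit an AsciiDoc table: rows = kernels, columns = providers,
// cells = priority (throughput notes and source links omitted here).
fun renderMatrix(rows: List<Row>, providers: List<String>): String =
    buildString {
        appendLine("|===")
        appendLine("| Kernel " + providers.joinToString(" ") { "| $it" })
        for (row in rows) {
            append("| ${row.kernel} ")
            for (p in providers) {
                val c = row.cells.find { it.provider == p }
                append(if (c != null) "| priority ${c.priority} " else "| n/a ")
            }
            appendLine()
        }
        append("|===")
    }
```

Because the table is regenerated on every build, a provider that stops implementing a kernel shows up as "n/a" immediately rather than lingering in hand-written docs.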
- Cross-link from the new simd-kernels.adoc / quantized-simd-kernels.adoc explanation pages to the generated reference pages, and from the generated pages back to the explanation narrative. Same Diátaxis pattern as ops have today.
- Bench numbers could be wired in too if we annotate the JMH classes with their corresponding kernel: KSP would emit a "latest measured throughput" footer fed by :skainet-backends:benchmarks:jvm-cpu-jmh:jmhResults or a checked-in JSON snapshot. Out of scope for the first PR, but worth designing the annotation shape with this in mind.
Why this matters now
- Drift prevention. When the future native FFM provider lands (see explanation/perf/native-ffm-plan.adoc), its annotations would auto-populate the reference pages and the coverage matrix the same day the code merges. No "forgot to update the docs" PR.
- GPU / NPU readiness. The same annotation surface works for backends we don't have yet. We're locking in the reference-doc pattern before the explosion of providers, not retrofitting later under churn.
- Mirror of ops. Operators and kernels are the two extension points readers most need a coverage matrix for. Right now ops have one, kernels don't — that asymmetry will get worse, not better.
Scope of the first PR
- Annotations + KSP processor extension (matches the operator pipeline's shape: an OperatorMetadataProcessor-equivalent for kernels).
- KernelImplementation annotation applied to ScalarKernelProvider and PanamaVectorKernelProvider.
- Generated reference/kernels/generated/scalar-kernel-provider.adoc, .../panama-vector-kernel-provider.adoc, and kernels-status-matrix.adoc.
- Cross-link block at the top of simd-kernels.adoc / quantized-simd-kernels.adoc pointing to the matrix.
Out of scope (separate issues if pursued)
- Bench-throughput injection from JMH JSON.
- Native FFM provider annotations — wait until the provider exists (see explanation/perf/native-ffm-plan.adoc trigger conditions).
- Kotlin/Native and JS/Wasm provider annotations — only relevant if we ever ship per-target kernels there.
Decision needed
The narrower question for this issue: do we want to commit to a first-class kernel reference docs system, modeled on the operators one, before the next wave of providers arrives?
- Yes → schedule the KSP work for 0.22.x. Estimated 2–3 PRs (annotations + processor + initial pages, then status matrix, then cross-linking polish).
- No → close as won't-fix; keep kernel docs purely hand-written under explanation/perf/. Acceptable as long as the kernel surface stays small (≤4 providers, ≤2 sibling SPIs).
- Defer → revisit when the native FFM provider PR lands; that's the natural moment to either invest in generation or accept hand-written drift.