Context
0.21.0 shipped a kernel SPI: KernelProvider, Fp32MatmulKernel, Q4KMatmulKernel, KernelRegistry (PRs #554, #559, #562 and the rest of the M5 chain). Two concrete providers are registered today (ScalarKernelProvider at priority 0, PanamaVectorKernelProvider at priority 50); a third, NativeKernelProvider at priority 100, is captured as a plan. A further wave of providers is plausible in 2026: native FFM, GPU, NPU.
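For readers new to the SPI, priority-ordered provider selection can be sketched like this. The interface shape, the isAvailable() check, and the demoBest() helper are illustrative stand-ins, not the real 0.21.0 API:

```kotlin
// Minimal sketch of priority-ordered kernel provider selection.
// Names mirror the SPI described above; bodies are hypothetical.
interface KernelProvider {
    val name: String
    val priority: Int
    fun isAvailable(): Boolean
}

class KernelRegistry {
    private val providers = mutableListOf<KernelProvider>()

    fun register(p: KernelProvider) { providers += p }

    // Highest-priority provider whose availability check passes wins;
    // a scalar fallback at priority 0 is always available.
    fun best(): KernelProvider? =
        providers.filter { it.isAvailable() }.maxByOrNull { it.priority }
}

fun demoBest(): String {
    val registry = KernelRegistry()
    registry.register(object : KernelProvider {
        override val name = "scalar"
        override val priority = 0
        override fun isAvailable() = true
    })
    registry.register(object : KernelProvider {
        override val name = "panama"
        override val priority = 50
        override fun isAvailable() = true
    })
    return registry.best()!!.name
}
```

With both providers available, the higher-priority panama provider is selected; that priority/availability pair is exactly the metadata the proposal below wants to surface in generated docs.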
Documentation about these kernels is currently hand-written prose in three new pages under docs/modules/ROOT/pages/explanation/perf/ (simd-kernels.adoc, quantized-simd-kernels.adoc, native-ffm-plan.adoc). That works for explanation-quadrant content (math, algorithms, why), but it doesn't scale for reference content — the matrix of "which provider implements which kernel for which target with what priority and what availability check".
Compare: how operators handle this
For operators we already have the right answer. skainet-lang-ksp-processor scans TensorOps interfaces with annotations like @NotImplemented(backends) and @InProgress(backends, owner), and emits AsciiDoc fragments into docs/modules/ROOT/pages/reference/operators/generated/ plus partials/ops/.... The Antora build (./gradlew generateDocs) regenerates them every CI run, so the operator coverage matrix at reference/ops-status-matrix.adoc is always in sync with code. See explanation/operator-design.adoc for the full story.
Proposal
Mirror the operator pipeline for kernels:
- Annotations in skainet-backend-api:
@Target(AnnotationTarget.CLASS)
annotation class KernelImplementation(
    val name: String,
    val priority: Int,
    val targets: Array<String>,    // e.g. ["jvm", "jvm-aarch64", "jvm-x86_64-avx2"]
    val availability: String = "", // human-readable predicate
)

@Target(AnnotationTarget.FUNCTION)
annotation class KernelOp(
    val kernel: String,          // "matmulFp32", "matmulQ4K", ...
    val complexity: String = "",
    val throughput: String = "", // e.g. "~73 GFLOPS @ 4096² Apple M-series"
)
Apply to ScalarKernelProvider, PanamaVectorKernelProvider, and (eventually) NativeKernelProvider.
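A sketch of how the annotations might look applied to a provider. The annotation shapes are the ones proposed above (re-declared here so the snippet is self-contained); the ScalarKernelProvider body and the providerPriority() helper are stand-ins, not the real implementation:

```kotlin
@Target(AnnotationTarget.CLASS)
annotation class KernelImplementation(
    val name: String,
    val priority: Int,
    val targets: Array<String>,
    val availability: String = "",
)

@Target(AnnotationTarget.FUNCTION)
annotation class KernelOp(
    val kernel: String,
    val complexity: String = "",
    val throughput: String = "",
)

// Hypothetical stand-in for the real provider class.
@KernelImplementation(
    name = "scalar",
    priority = 0,
    targets = ["jvm"],
    availability = "always",
)
class ScalarKernelProvider {
    @KernelOp(kernel = "matmulFp32", complexity = "O(n^3)")
    fun matmulFp32() { /* ... */ }
}

// Kotlin annotations default to RUNTIME retention, so both KSP (at build
// time) and plain reflection (here, for illustration) can read the metadata:
fun providerPriority(): Int =
    ScalarKernelProvider::class.java
        .getAnnotation(KernelImplementation::class.java)!!
        .priority
```

Keeping the metadata purely declarative like this is what lets the same annotation surface serve future GPU/NPU providers without changes to the processor.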
- KSP processor extension in skainet-lang-ksp-processor that scans annotated providers and emits:
docs/modules/ROOT/pages/reference/kernels/generated/<provider-name>.adoc — one page per provider with the kernels it implements, priority, availability predicate, target architectures.
docs/modules/ROOT/partials/kernels/<kernel-name>.adoc — one partial per *MatmulKernel interface, listing all providers that implement it (sorted by priority).
docs/modules/ROOT/pages/reference/kernels-status-matrix.adoc — coverage matrix (rows = kernels, columns = providers/targets, cells = priority + throughput note + link to source).
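The matrix-rendering step the processor would call could look roughly like this. Cell, Row, and renderMatrix are hypothetical carrier types and names; the real processor would populate them from the annotations and write the result to kernels-status-matrix.adoc:

```kotlin
// Hypothetical carriers for annotation-derived metadata.
data class Cell(val provider: String, val priority: Int)
data class Row(val kernel: String, val cells: List<Cell>)

// Emit an AsciiDoc table: rows = kernels, columns = providers,
// cells = priority (throughput notes and source links omitted here).
fun renderMatrix(rows: List<Row>, providers: List<String>): String =
    buildString {
        appendLine("|===")
        appendLine("| Kernel " + providers.joinToString(" ") { "| $it" })
        for (row in rows) {
            append("| ${row.kernel} ")
            for (p in providers) {
                val c = row.cells.find { it.provider == p }
                append(if (c != null) "| priority ${c.priority} " else "| n/a ")
            }
            appendLine()
        }
        append("|===")
    }
```

Because the table is regenerated on every build, a provider that stops implementing a kernel shows up as "n/a" immediately rather than lingering in hand-written docs.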
- Cross-link from the new simd-kernels.adoc / quantized-simd-kernels.adoc explanation pages to the generated reference pages, and from the generated pages back to the explanation narrative. Same Diátaxis pattern as ops have today.
- Bench numbers could be wired in too if we annotate the JMH classes with their corresponding kernel: KSP would emit a "latest measured throughput" footer fed by :skainet-backends:benchmarks:jvm-cpu-jmh:jmhResults or a checked-in JSON snapshot. Out of scope for the first PR, but worth designing the annotation shape with this in mind.
Why this matters now
- Drift prevention. When the future native FFM provider lands (see explanation/perf/native-ffm-plan.adoc), its annotations would auto-populate the reference pages and the coverage matrix the same day the code merges. No "forgot to update the docs" PR.
- GPU / NPU readiness. The same annotation surface works for backends we don't have yet. We're locking in the reference-doc pattern before the explosion of providers, not retrofitting later under churn.
- Mirror of ops. Operators and kernels are the two extension points readers most need a coverage matrix for. Right now ops have one, kernels don't — that asymmetry will get worse, not better.
Scope of the first PR
- Annotations + KSP processor extension (matches the operator pipeline's shape: an OperatorMetadataProcessor-equivalent for kernels).
- KernelImplementation annotation applied to ScalarKernelProvider and PanamaVectorKernelProvider.
- Generated reference/kernels/generated/scalar-kernel-provider.adoc, .../panama-vector-kernel-provider.adoc, and kernels-status-matrix.adoc.
- Cross-link block at the top of simd-kernels.adoc / quantized-simd-kernels.adoc pointing to the matrix.
Out of scope (separate issues if pursued)
- Bench-throughput injection from JMH JSON.
- Native FFM provider annotations — wait until the provider exists (see explanation/perf/native-ffm-plan.adoc trigger conditions).
- Kotlin/Native and JS/Wasm provider annotations — only relevant if we ever ship per-target kernels there.
Decision needed
The narrower question for this issue: do we want to commit to a first-class kernel reference docs system, modeled on the operators one, before the next wave of providers arrives?
- Yes → schedule the KSP work for 0.22.x. Estimated 2–3 PRs (annotations + processor + initial pages, then status matrix, then cross-linking polish).
- No → close as won't-fix; keep kernel docs purely hand-written under explanation/perf/. Acceptable as long as the kernel surface stays small (≤4 providers, ≤2 sibling SPIs).
- Defer → revisit when the native FFM provider PR lands; that's the natural moment to either invest in generation or accept hand-written drift.