docs: SIMD kernels, quantized SIMD, native FFM plan; arc42 architecture #567

Merged
michalharakal merged 2 commits into develop from feature/docs-simd-kernels-and-arc42
Apr 29, 2026

@michalharakal
Contributor

Summary

Antora docs reflecting the 0.21.0 M5 work: three new explanation pages under docs/.../explanation/perf/, plus an arc42-style expansion of the previously-stub reference/architecture.adoc.

New pages

  • explanation/perf/simd-kernels.adoc: How the kernel SPI is structured, why it exists, the four core JDK Vector API patterns (SPECIES_PREFERRED, B^T packing, FMA + reduceLanes, 8×8×128 tile blocking), and the ServiceLoader / factory-wrapper auto-discovery story. Includes the KernelMatmulBench numbers from #558 (the scalar vs Panama benchmark PR, M5 evidence).
  • explanation/perf/quantized-simd-kernels.adoc: Per-format pipelines (Q4_0, Q4_K, Q4_K MemSeg, Q6_K, Q8_0). Walks the ByteVector → AND/LSHR nibble extract → castShape(B2F) → fused FMA recipe, the lazy-dmin trick, the canonical Q4_K block layout (8 sub-blocks, ggml get_scale_min_k4), the Q6_K ql + qh 6-bit assembly, and the per-format coverage matrix.
  • explanation/perf/native-ffm-plan.adoc: Recovers the FFM PRD content from git history (61962def:NATIVE_FFM_KERNEL_PROVIDER.md). The file was dropped from the 0.21.0 release in #566 ("chore(release): prepare 0.21.0") to "ship the release first, keep the plan in docs". Covers goals, non-goals, module layout, FFM binding pattern, staged delivery (5 PRs), success metrics, 6 risks/open questions, and trigger conditions for un-deferring.
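
For reviewers less familiar with the nibble-extract step, here is a minimal scalar sketch of what the AND/LSHR stage computes for one Q4_0-style block. It is not the code from the new page: the class and method names are hypothetical, the scale is passed as a plain float rather than decoded from fp16, and the layout assumption (32 weights per block packed into 16 bytes, low nibble for element j, high nibble for element j + 16, offset 8) follows the ggml Q4_0 convention. The SIMD version described in the docs would do the same work lane-wise on a ByteVector.

```java
class Q4NibbleSketch {
    // Assumption: ggml-style Q4_0 block of 32 weights packed into 16 bytes.
    static final int BLOCK = 32;

    /** Dot product of one packed 4-bit block against 32 float activations. */
    static float dotBlock(float scale, byte[] qs, float[] x) {
        float acc = 0f;
        for (int j = 0; j < BLOCK / 2; j++) {
            int b = qs[j] & 0xFF;   // widen the byte as unsigned
            int lo = b & 0x0F;      // AND: low nibble is element j
            int hi = b >>> 4;       // LSHR: high nibble is element j + 16
            // Subtract the offset 8 to recover a signed value in [-8, 7],
            // then accumulate with a fused multiply-add.
            acc = Math.fma((float) (lo - 8), x[j], acc);
            acc = Math.fma((float) (hi - 8), x[j + BLOCK / 2], acc);
        }
        // The per-block scale is applied once, after the accumulation.
        return acc * scale;
    }

    public static void main(String[] args) {
        byte[] qs = new byte[BLOCK / 2];
        java.util.Arrays.fill(qs, (byte) 0x98); // low nibble 8 -> 0, high nibble 9 -> 1
        float[] x = new float[BLOCK];
        java.util.Arrays.fill(x, 1.0f);
        System.out.println(dotBlock(0.5f, qs, x));
    }
}
```

Applying the scale once per block instead of once per element is the same idea the quantized page relies on to keep the inner loop to integer unpacking plus FMA.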

Architecture

reference/architecture.adoc goes from a 4-line stub to a full arc42-ordered reference, focused on the 0.21.0 changes:

  • Section 5 (Building Block View) — module table + kernel SPI ASCII diagram (commonMain api + jvmMain auto-discovery + jvmMain providers).
  • Section 6 (Runtime View) — eager-execution flow from ctx.ops.matmul through chooseQuantizedMatmul / chooseMatmul to the resolved SPI kernel, with the lazy provider resolution and null-fall-through patterns called out.
  • Section 9 (Architecture decisions) — table covering the kernel SPI design choices, default-null accessor pattern, ServiceLoader-deferral rationale, FFM-not-JNI, Antora-not-Wiki.
  • Section 10/11 (Quality requirements + Risks) — Panama metric met, native metric still pending; Vector API still incubator; two prior reverts on develop history.
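
The default-null accessor and null-fall-through patterns mentioned for Sections 6 and 9 can be sketched roughly as follows. This is an illustration under assumptions, not the project's API: the interface shapes, the `SCALAR` fallback, and the `chooseMatmul(List)` signature are all hypothetical, and the providers are passed in explicitly here rather than discovered through ServiceLoader.

```java
import java.util.Arrays;
import java.util.List;

class KernelDispatchSketch {
    /** Hypothetical kernel shape: C[m x n] = A[m x k] * B[k x n], row-major. */
    interface MatmulKernel {
        void matmul(float[] a, float[] b, float[] c, int m, int k, int n);
    }

    /** Hypothetical provider: the accessor defaults to null, meaning "not offered". */
    interface KernelProvider {
        default MatmulKernel matmul() { return null; }
    }

    /** Scalar fallback used when no provider offers a kernel. */
    static final MatmulKernel SCALAR = (a, b, c, m, k, n) -> {
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                float acc = 0f;
                for (int p = 0; p < k; p++) {
                    acc = Math.fma(a[i * k + p], b[p * n + j], acc);
                }
                c[i * n + j] = acc;
            }
        }
    };

    /** Returns the first non-null kernel in provider order, else the scalar fallback. */
    static MatmulKernel chooseMatmul(List<KernelProvider> providers) {
        for (KernelProvider provider : providers) {
            MatmulKernel kernel = provider.matmul();
            if (kernel != null) {
                return kernel;
            }
        }
        return SCALAR; // null fall-through: nobody claimed the op
    }

    public static void main(String[] args) {
        KernelProvider offersNothing = new KernelProvider() {}; // overrides no accessor
        MatmulKernel kernel = chooseMatmul(List.of(offersNothing));
        float[] c = new float[4];
        kernel.matmul(new float[] {1, 2, 3, 4}, new float[] {5, 6, 7, 8}, c, 2, 2, 2);
        System.out.println(Arrays.toString(c));
    }
}
```

A default-null accessor means a provider only overrides the kernels it actually implements, so adding a new kernel kind to the SPI would not break existing providers.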

Nav

nav.adoc registers the three new explanation pages under the .Explanation section, alongside the existing jvm-cpu.adoc and java-25-cpu-backend.adoc.

Files changed

  • 3 new .adoc files
  • reference/architecture.adoc: +283 / −6
  • nav.adoc: +3

Test plan

  • Reviewer: render the Antora site locally (./gradlew :docs:antora, or the equivalent local target) and visually check that the three new pages and the architecture page render correctly with the new ASCII art and tables.
  • Cross-references between the three perf docs (xref:explanation/perf/simd-kernels.adoc[] etc.) resolve.
  • No broken links from nav.adoc.

🤖 Generated with Claude Code

Three new explanation pages under docs/.../explanation/perf/ covering
the M5 work that landed in 0.21.0:

- simd-kernels.adoc — kernel SPI overview, FloatVector + FMA pattern,
  tile blocking, ServiceLoader auto-discovery + factory wrappers,
  KernelMatmulBench numbers (8.6×–10.8× over scalar at 256/512/1024
  on Apple Silicon NEON).
- quantized-simd-kernels.adoc — per-format pipelines (Q4_0, Q4_K,
  Q4_K MemSeg, Q6_K, Q8_0): ByteVector → AND/LSHR nibble extract →
  castShape(B2F) → fused FMA, plus the lazy-dmin trick and the per-
  format coverage matrix.
- native-ffm-plan.adoc — recovers the FFM PRD content from git
  history (61962de:NATIVE_FFM_KERNEL_PROVIDER.md, dropped from
  0.21.0 release per #566). Module layout, FFM binding pattern,
  staged delivery, success metrics, risks, trigger conditions.

architecture.adoc grows from a 4-line stub to an arc42-style
reference with focus on the 0.21.0 changes:

- Building Block View — module table + kernel SPI ASCII diagram
  (commonMain api + jvmMain auto-discovery + jvmMain providers).
- Runtime View — eager-execution flow from `ctx.ops.matmul` through
  `chooseQuantizedMatmul` / `chooseMatmul` to the SPI kernel, with
  the lazy provider resolution and fall-through pattern called out.
- Architecture decisions table, quality requirements, risks (Vector
  API still incubator, no native provider yet, prior reverts).

nav.adoc: register the three new explanation pages under .Explanation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

  • Operator documentation: docs/modules/operators/_generated_/
  • JSON schema output: operators.json

Artifacts:

  • Download the documentation-preview-567 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.

Local Antora build (the same docker pipeline GitHub Actions runs)
emitted:

  warn: native-ffm-plan.adoc:237: list item index: expected 1, got 22

The line started with "22." after a wrap, which the asciidoctor
parser interpreted as a sibling numbered list item with an out-of-
sequence index. Re-wrap so "22." stays mid-sentence. Site rebuild
now warning-free.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@michalharakal marked this pull request as ready for review April 29, 2026 06:34
@michalharakal merged commit 90bcf1f into develop Apr 29, 2026
10 checks passed