Commit 0d12193

michalharakal and claude committed
Add contributing/dtype-model.adoc (W0a + W5 of #615)
Maps the dtype-policy RFC's vocabulary onto the existing SKaiNET implementations:

- source / logical / required / lowered dtype concepts, each with the file path of the SKaiNET implementation and notes on what's shipped vs. what's pending in #615.
- Loader-side source→logical mapping tables for both GGUF and SafeTensors; this is the W0a audit. The tables make the silent dequant cases visible (GGUF F16 + BF16 lose their native form on load; SafeTensors F16 same; SafeTensors BF16 already has a KEEP_NATIVE policy opt-in, the prior art for the rest).
- Why quantized formats (Q4_K, Q8_0) aren't DType arms (they're TensorData subtypes) and what that means for the KernelProvider.supports() string-keyed API.
- The three RFC anti-patterns and how the existing SKaiNET model structurally prevents them.
- KernelStrictness as the runtime equivalent of the graph-prep fail-fast.

Nav entry added under the Contributing section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 2b7a383 commit 0d12193

2 files changed

Lines changed: 140 additions & 0 deletions

docs/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@
 .Contributing
 * xref:contributing/index.adoc[Audience and scope]
 * xref:contributing/build-from-source.adoc[Build from source]
+* xref:contributing/dtype-model.adoc[The SKaiNET dtype model]
 * xref:contributing/benchmarks.adoc[Engine benchmark program]
 * xref:contributing/matmul-kernels.adoc[Reading the matmul benchmark]
 * xref:contributing/register-bench-runner.adoc[Register a self-hosted bench runner]
Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
= The SKaiNET DType Model
:description: How SKaiNET represents tensor dtypes across loaders, the kernel SPI, and the (in-progress) constraint-resolution pipeline, mapped onto the four-dtype concept from the dtype-policy RFC (#615).

[NOTE]
====
**Audience: SKaiNET maintainers and contributors.** This page maps
the vocabulary used in the
https://github.com/SKaiNET-developers/SKaiNET/blob/develop/rfc.md[dtype-policy
RFC] (issue #615) onto the existing SKaiNET implementations.
Library consumers don't need to read this: they call
`tensor<FP32, Float>(ctx, FP32::class) { … }` and the engine does
the rest.
====

The RFC distinguishes four dtype concepts. The engine already
implements most of them, but under different names. This page is the
glossary that keeps the two vocabularies consistent.

== The four dtype concepts

[cols="1,2,2,3",options="header"]
|===
| RFC term | What it means | SKaiNET implementation today | Notes
| **source dtype** | The dtype stored in the on-disk model file (`F16`, `F32`, `Q4_K`, `Q8_0`, …). | Set by the loader. `StreamingGgufParametersLoader` maps `GGMLQuantizationType.*` to the corresponding `TensorData` subtype. `SafeTensorsParametersLoader` maps SafeTensors `DataType` similarly. | The loader-time mapping is the **source of truth** for what the file actually contains.
| **logical dtype** | The dtype the tensor advertises to graph code (op contracts, shape inference, dispatch). | `Tensor<T : DType, V>.dtype: KClass<T>`, where the type parameter `T` resolves to one of the sealed `DType` arms (`FP32`, `BF16`, `Int8`, …). | The logical dtype is **never** inferred from physical storage shape. There is no "1D byte array patched into 2D" anti-pattern: every quantized `TensorData` subtype carries an explicit `shape: Shape`.
| **required dtype** | The dtype an op, layer, or backend declares it needs. | Today: implicit in the kernel SPI accessors (`matmulFp32()`, `matmulBf16()`, `matmulQ4K()`, `matmulQ8_0()`). After W6/W7 of #615: explicit `DTypePolicy` attached to graph nodes via `attributes["dtype_policy"]`. | The `DTypePolicy` sealed type (W1, shipped in this PR series) covers the four arms from the RFC's "policy categories" section: `Any` / `Require` / `Prefer` / `OneOf`.
| **lowered dtype** | The dtype actually passed to the executable kernel. | Whatever `KernelRegistry.bestAvailable()?.matmul*()` returns. `KernelProvider.supports(opName, dtypeKeys)` (W3, shipped) is the introspection query. | If a `Require` constraint can't be matched by any registered kernel and no cast kernel bridges the gap, the constraint-resolution pass (W7, pending) raises `DtypeConstraintViolationException` *before* forward execution, which is exactly the RFC's "fail before execution" rule.
|===
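
The required and lowered rows can be pictured with a small, self-contained sketch. It is not the shipped `DTypePolicy`: every name below (`SketchDType`, `DTypePolicySketch`, `ConstraintViolationSketch`, `resolveLowered`) is hypothetical, and the fallback semantics chosen for `Prefer` and `OneOf` are assumptions made for illustration only.

[source,kotlin]
----
// Hypothetical stand-ins; the real SKaiNET types are not reproduced here.
enum class SketchDType { FP32, BF16, INT8 }

sealed interface DTypePolicySketch {
    /** No constraint: keep whatever dtype the tensor already has. */
    object Any : DTypePolicySketch

    /** Hard requirement: resolution fails if no kernel accepts this dtype. */
    data class Require(val dtype: SketchDType) : DTypePolicySketch

    /** Soft preference (assumed): fall back to the current dtype if unavailable. */
    data class Prefer(val dtype: SketchDType) : DTypePolicySketch

    /** Any listed dtype is acceptable (assumed: earlier entries win). */
    data class OneOf(val dtypes: List<SketchDType>) : DTypePolicySketch
}

class ConstraintViolationSketch(message: String) : IllegalStateException(message)

/** Picks the lowered dtype, or fails before any kernel would run. */
fun resolveLowered(
    policy: DTypePolicySketch,
    current: SketchDType,
    available: Set<SketchDType>,
): SketchDType = when (policy) {
    is DTypePolicySketch.Any -> current
    is DTypePolicySketch.Require ->
        if (policy.dtype in available) policy.dtype
        else throw ConstraintViolationSketch("no kernel for required ${policy.dtype}")
    is DTypePolicySketch.Prefer ->
        if (policy.dtype in available) policy.dtype else current
    is DTypePolicySketch.OneOf ->
        policy.dtypes.firstOrNull { it in available }
            ?: throw ConstraintViolationSketch("none of ${policy.dtypes} is available")
}
----

In this sketch only `Require` and `OneOf` can fail, which mirrors the idea that a hard constraint is rejected during resolution rather than silently degraded at execution time.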

== Loader source → logical mapping today

Both loaders are explicit about what each on-disk dtype becomes
inside the engine. The tables below are the W0a audit promised by issue
#615: they make the silent dequant cases visible so the loader-policy
work (W0b / W0c) knows what to generalise.

=== `StreamingGgufParametersLoader` (skainet-io-gguf)

[cols="1m,1,2,2",options="header"]
|===
| GGUF source type | Logical dtype today | Storage class | Native or dequant?
| F32 | `FP32` | `FloatArrayTensorData` (dense) | native
| I32 | `Int32` | `IntArrayTensorData` (dense) | native
| F16 | `FP32` | `FloatArrayTensorData` (dense, dequantized) | **dequant on load**; no `KEEP_NATIVE` path yet
| BF16 | `FP32` | `FloatArrayTensorData` (dense, dequantized) | **dequant on load**; no `KEEP_NATIVE` path yet
| Q4_K | `FP32`-tagged tensor wrapping `Q4_KBlockTensorData` | `Q4_KBlockTensorData` (packed, logical shape preserved) | native
| Q8_0 | `FP32`-tagged tensor wrapping `Q8_0BlockTensorData` | `Q8_0BlockTensorData` (packed, logical shape preserved) | native
|===

The two dequant rows (F16, BF16) are the gap. SafeTensors already
has a `Bf16LoadPolicy.KEEP_NATIVE` opt-in (see below) that returns
the BF16 bytes verbatim instead of expanding to FP32. The
equivalent for GGUF is W0c (`StreamingGgufParametersLoader.loadWithPolicy`).
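
As a rough illustration, the GGUF audit table can be written down as data so the gap rows fall out of a query. The row contents come from the table above; the names `GgufSourceType`, `LoadedAs`, `auditRow`, and `silentDequantCases` are hypothetical and are not part of the loader.

[source,kotlin]
----
// Hypothetical model of the audit table; not the loader's real code.
enum class GgufSourceType { F32, I32, F16, BF16, Q4_K, Q8_0 }

data class LoadedAs(
    val logicalDType: String, // dtype advertised to graph code
    val storageClass: String, // TensorData subtype named in the table
    val native: Boolean,      // false = expanded to FP32 on load
)

fun auditRow(source: GgufSourceType): LoadedAs = when (source) {
    GgufSourceType.F32 -> LoadedAs("FP32", "FloatArrayTensorData", native = true)
    GgufSourceType.I32 -> LoadedAs("Int32", "IntArrayTensorData", native = true)
    GgufSourceType.F16,
    GgufSourceType.BF16 -> LoadedAs("FP32", "FloatArrayTensorData", native = false)
    GgufSourceType.Q4_K -> LoadedAs("FP32", "Q4_KBlockTensorData", native = true)
    GgufSourceType.Q8_0 -> LoadedAs("FP32", "Q8_0BlockTensorData", native = true)
}

/** The rows a loader policy (W0c) would need to cover. */
fun silentDequantCases(): List<GgufSourceType> =
    GgufSourceType.values().filter { !auditRow(it).native }
----

With the table as written, `silentDequantCases()` yields exactly the F16 and BF16 entries.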

=== `SafeTensorsParametersLoader` (skainet-io-safetensors)

[cols="1m,1,2,2",options="header"]
|===
| SafeTensors source type | Logical dtype today | Storage class | Native or dequant?
| F32 / F64 | `FP32` | `FloatArrayTensorData` | native (F64 down-cast with warning)
| F16 | `FP32` | `FloatArrayTensorData` (dequantized) | **dequant on load**; no `KEEP_NATIVE` path yet
| BF16 | `FP32` or `BF16`-shaped depending on `Bf16LoadPolicy` | `FloatArrayTensorData` (dequantized) or `Bf16DenseTensorData` (native) | **policy-controlled**: `DEQUANT_TO_FP32` (default) or `KEEP_NATIVE`
| I32 / I16 / U16 / U32 / U64 / I8 / U8 | matching `Int*` / `UInt*` | wrapped / reinterpreted appropriately | native
|===

The BF16 row is the prior art for the RFC's policy model. `Bf16LoadPolicy.toDTypePolicy()` (W2, shipped) maps the BF16-specific enum onto the generalised `DTypePolicy`:

[source,kotlin]
----
Bf16LoadPolicy.DEQUANT_TO_FP32.toDTypePolicy() // → DTypePolicy.Require(FP32)
Bf16LoadPolicy.KEEP_NATIVE.toDTypePolicy() // → DTypePolicy.Require(BF16)
----

W0b extends this same idea to F16 and the integer dtypes so the
whole SafeTensors loader can be driven by a single `DTypePolicy`
argument.

== The `DType` registry vs the kernel capability query

`DType.findByName("Float32")` returns the singleton `FP32` object.
The sealed-interface registry is the source of truth for dtype
metadata (size in bits, name, promotion rules). It currently covers
floats and (un)signed integers from `Ternary` through `FP64`.

The quantized block formats (`Q4_K`, `Q8_0`, `Q6_K`, `Q4_0`, …)
are **not** `DType` arms; they live as `TensorData` subtypes in
`skainet-lang-core/tensor/data/`. That's intentional: a `DType` is
a numeric type with promotion semantics, whereas Q4_K is a *packed
block format* with no scalar interpretation outside its block
context.

For the kernel capability query (`KernelProvider.supports(opName,
dtypeKeys)`, W3), this means the second argument is `List<String>`
rather than `List<DType>`, so the strings `"Q4_K"` and `"Q8_0"` slot
in alongside `"Float32"` and `"BFloat16"`. The string convention
matches what GGUF / SafeTensors loaders and the StableHLO converter
already use for format identification.
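
A string-keyed capability query of this kind might look like the sketch below. The interface is a simplified stand-in, not the real `KernelProvider`; `KernelProviderSketch`, `TableKernelProvider`, and `firstSupporting` are hypothetical names, and treating `dtypeKeys` as the ordered operand dtypes of the op is an assumption.

[source,kotlin]
----
// Simplified stand-in for the capability query; not the real SPI.
interface KernelProviderSketch {
    val name: String

    /** dtypeKeys: one string per operand (assumed), e.g. ["Float32", "Q4_K"]. */
    fun supports(opName: String, dtypeKeys: List<String>): Boolean
}

/** Answers the query from a static table of op name to supported key lists. */
class TableKernelProvider(
    override val name: String,
    private val table: Map<String, Set<List<String>>>,
) : KernelProviderSketch {
    override fun supports(opName: String, dtypeKeys: List<String>): Boolean =
        table[opName]?.contains(dtypeKeys) ?: false
}

/** First provider that claims the op/dtype combination, or null if none does. */
fun firstSupporting(
    providers: List<KernelProviderSketch>,
    opName: String,
    dtypeKeys: List<String>,
): KernelProviderSketch? = providers.firstOrNull { it.supports(opName, dtypeKeys) }
----

Because the keys are plain strings, a block format such as `"Q4_K"` needs no entry in the `DType` registry to take part in the query.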

== Fail-fast: `KernelStrictness`

The RFC's "fail before execution" rule has a small, ready-to-use
affordance today (W4, shipped):

[source,bash]
----
java -Dskainet.strict.kernels=true …
----

When set, `DefaultCpuOpsJvm.matmul` raises
`NoSuchKernelException` (with the failing dtype pair and the list
of currently-registered providers) just before its silent scalar
fallback would have run. The flag is off by default, so the adaptive
behaviour is preserved.

The constraint-resolution pass (W7, pending) raises the same shape of
error at *graph-prep* time, before forward execution can even
start. The `KernelStrictness` affordance is the runtime equivalent
for cases where graph prep hasn't been run (e.g. ad-hoc tensor-op
code that calls `ctx.ops.matmul` directly).
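
The runtime check can be sketched as follows. Only the property name `skainet.strict.kernels` comes from this page; `NoSuchKernelSketch`, `strictKernelsEnabled`, and `dispatchOrFallback` are hypothetical, and the real `DefaultCpuOpsJvm.matmul` may be structured quite differently.

[source,kotlin]
----
// Hypothetical sketch of a strict-mode gate in front of a scalar fallback.
class NoSuchKernelSketch(message: String) : RuntimeException(message)

/** Reads the JVM flag; an absent or non-"true" value means adaptive mode. */
fun strictKernelsEnabled(): Boolean =
    System.getProperty("skainet.strict.kernels")?.toBoolean() ?: false

fun <T> dispatchOrFallback(
    kernel: (() -> T)?,              // null when no provider matched
    dtypePair: Pair<String, String>, // reported in the error message
    providers: List<String>,         // names of the registered providers
    strict: Boolean = strictKernelsEnabled(),
    scalarFallback: () -> T,
): T {
    if (kernel != null) return kernel()
    if (strict) {
        throw NoSuchKernelSketch(
            "no kernel for $dtypePair; registered providers: $providers"
        )
    }
    return scalarFallback() // adaptive default: slow, but still an answer
}
----

Passing `strict` explicitly, as the parameter allows, keeps the gate testable without touching JVM system properties.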

== Anti-patterns this model prevents

The RFC calls out three concrete anti-patterns the engine must
avoid; SKaiNET already prevents all three.

[cols="2,3",options="header"]
|===
| Anti-pattern | What prevents it in SKaiNET today
| Marker-class dtype detection (`if tensor is Q4_KMarker`) | The sealed `DType` interface carries explicit metadata (`sizeInBits`, `name`, `isCompatible`, `promoteTo`). Dispatch uses `KClass<T>` identity and the typed accessors on `KernelProvider`, not marker checks.
| Packed bytes treated as logical shape (1D byte array patched into 2D after load) | Every quantized `TensorData` subtype (`Q4_KBlockTensorData`, `Q8_0BlockTensorData`, `Bf16DenseTensorData`) carries an explicit `shape: Shape` separate from its `packedData: ByteArray`. Loaders set the logical shape from the file header, not from `bytes.size`.
| GGUF Q8 confused with native int8 | They're different `TensorData` subtypes. A GGUF Q8 tensor goes through `Q8_0BlockTensorData` (one FP16 scale plus 32 signed int8 codes per block); a future native-int8 NPU tensor would have its own `TensorData` subtype with backend-specific layout metadata. The RFC's "GGUF Q8 ≠ native int8" rule is enforced structurally.
|===
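
The second row can be made concrete with a small sketch in which the logical shape is stored next to the packed bytes and the two are cross-checked at construction. `ShapeSketch` and `Q8_0BlockSketch` are hypothetical and are not the real `Shape` or `Q8_0BlockTensorData`. The 34-byte block size follows from the layout described above (a 2-byte FP16 scale plus 32 one-byte codes); checking divisibility on the total element count rather than per row is a simplification.

[source,kotlin]
----
// Hypothetical sketch: logical shape and packed bytes are separate fields.
const val Q8_0_BLOCK_ELEMENTS = 32
const val Q8_0_BLOCK_BYTES = 2 + 32 // FP16 scale + 32 signed int8 codes

data class ShapeSketch(val dims: List<Int>) {
    /** Number of logical elements described by the shape. */
    val volume: Long get() = dims.fold(1L) { acc, d -> acc * d }
}

class Q8_0BlockSketch(val shape: ShapeSketch, val packedData: ByteArray) {
    init {
        require(shape.volume % Q8_0_BLOCK_ELEMENTS == 0L) {
            "element count ${shape.volume} is not a multiple of $Q8_0_BLOCK_ELEMENTS"
        }
        val expectedBytes = shape.volume / Q8_0_BLOCK_ELEMENTS * Q8_0_BLOCK_BYTES
        // The shape is never derived from the byte count; it is only checked against it.
        require(packedData.size.toLong() == expectedBytes) {
            "expected $expectedBytes packed bytes for $shape, got ${packedData.size}"
        }
    }
}
----

For example, a logical shape of 4 by 64 holds 256 elements, that is 8 blocks, so the packed buffer has to be 272 bytes long; a 256-byte buffer would be rejected.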

== Related

* `rfc.md` (repo root): the design document this page implements.
* Issue https://github.com/SKaiNET-developers/SKaiNET/issues/615[#615]: implementation tracker.
* xref:contributing/benchmarks.adoc[Engine benchmark program]: runtime numbers that the kernel SPI produces.
* xref:contributing/matmul-kernels.adoc[Reading the matmul benchmark]: how the kernel SPI's dispatch actually shows up in measurements.
