| 1 | += The SKaiNET DType Model |
| 2 | +:description: How SKaiNET represents tensor dtypes across loaders, the kernel SPI, and the (in-progress) constraint-resolution pipeline, mapped onto the four dtype concepts from the dtype-policy RFC (#615). |
| 3 | + |
| 4 | +[NOTE] |
| 5 | +==== |
| 6 | +**Audience: SKaiNET maintainers and contributors.** This page maps |
| 7 | +the vocabulary used in the |
| 8 | +https://github.com/SKaiNET-developers/SKaiNET/blob/develop/rfc.md[dtype-policy |
| 9 | +RFC] (issue #615) onto the existing SKaiNET implementations. |
| 10 | +Library consumers don't need to read this — they call |
| 11 | +`tensor<FP32, Float>(ctx, FP32::class) { … }` and the engine does |
| 12 | +the rest. |
| 13 | +==== |
| 14 | + |
| 15 | +The RFC distinguishes four dtype concepts. The engine already
| 16 | +implements most of them, but under different names. This page is the
| 17 | +glossary that keeps the two vocabularies consistent.
| 18 | + |
| 19 | +== The four dtype concepts |
| 20 | + |
| 21 | +[cols="1,2,2,3",options="header"] |
| 22 | +|=== |
| 23 | +| RFC term | What it means | SKaiNET implementation today | Notes |
| 24 | +| **source dtype** | The dtype stored in the on-disk model file (`F16`, `F32`, `Q4_K`, `Q8_0`, …). | Set by the loader. `StreamingGgufParametersLoader` maps `GGMLQuantizationType.*` to the corresponding `TensorData` subtype. `SafeTensorsParametersLoader` maps SafeTensors `DataType` similarly. | The loader-time mapping is the **source-of-truth** for what the file actually contains. |
| 25 | +| **logical dtype** | The dtype the tensor advertises to graph code (op contracts, shape inference, dispatch). | `Tensor<T : DType, V>.dtype: KClass<T>` — the type-parameter `T` resolves to one of the sealed `DType` arms (`FP32`, `BF16`, `Int8`, …). | The logical dtype is **never** inferred from physical storage shape (no "1D byte array patched into 2D" antipattern — every quantized `TensorData` subtype carries explicit `shape: Shape`). |
| 26 | +| **required dtype** | The dtype an op, layer, or backend declares it needs. | Today: implicit in the kernel SPI accessors (`matmulFp32()`, `matmulBf16()`, `matmulQ4K()`, `matmulQ8_0()`). After W6/W7 of #615: explicit `DTypePolicy` attached to graph nodes via `attributes["dtype_policy"]`. | The `DTypePolicy` sealed type (W1, shipped in this PR series) covers the four arms from the RFC's "policy categories" section: `Any` / `Require` / `Prefer` / `OneOf`. |
| 27 | +| **lowered dtype** | The dtype actually passed to the executable kernel. | Whatever `KernelRegistry.bestAvailable()?.matmul*()` returns. `KernelProvider.supports(opName, dtypeKeys)` (W3, shipped) is the introspection query. | If a `Require` constraint can't be matched by any registered kernel and no cast kernel bridges the gap, the constraint-resolution pass (W7, pending) raises `DtypeConstraintViolationException` *before* forward execution — exactly the RFC's "fail before execution" rule. |
| 28 | +|=== |
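
The relationship between the required and lowered dtypes in the table above can be sketched roughly as follows. This is a minimal illustration, not the SKaiNET implementation: `PolicySketch`, `ConstraintViolationSketch`, and `resolveLowered` are hypothetical names, dtypes are represented as plain strings, and the fallback rules chosen for `Any` and `Prefer` are assumptions. Only the four arm names and the "fail before execution" behaviour come from the text.

```kotlin
// Hypothetical stand-ins for illustration; these are NOT the SKaiNET types.
sealed interface PolicySketch {
    object Any : PolicySketch
    data class Require(val dtype: String) : PolicySketch
    data class Prefer(val dtype: String) : PolicySketch
    data class OneOf(val dtypes: List<String>) : PolicySketch
}

class ConstraintViolationSketch(message: String) : IllegalStateException(message)

// Picks a lowered dtype from the kernel-supported set, or throws before any op runs.
fun resolveLowered(policy: PolicySketch, logical: String, supported: Set<String>): String {
    val chosen: String? = when (policy) {
        // Assumption: "Any" keeps the logical dtype when a kernel exists for it.
        PolicySketch.Any -> if (logical in supported) logical else supported.firstOrNull()
        is PolicySketch.Require -> policy.dtype.takeIf { it in supported }
        // Assumption: "Prefer" falls back to the logical dtype.
        is PolicySketch.Prefer ->
            policy.dtype.takeIf { it in supported } ?: logical.takeIf { it in supported }
        is PolicySketch.OneOf -> policy.dtypes.firstOrNull { it in supported }
    }
    return chosen ?: throw ConstraintViolationSketch(
        "no kernel satisfies $policy (logical=$logical, supported=$supported)"
    )
}

fun main() {
    val supported = setOf("Float32", "BFloat16")
    println(resolveLowered(PolicySketch.Require("BFloat16"), "Float32", supported)) // BFloat16
    println(resolveLowered(PolicySketch.Prefer("Float16"), "Float32", supported))   // Float32
    val failure = runCatching {
        resolveLowered(PolicySketch.Require("Float16"), "Float32", supported)
    }
    println(failure.exceptionOrNull() is ConstraintViolationSketch)                 // true
}
```

The point of the sketch is that a `Require` miss surfaces as an exception during resolution, while `Prefer` degrades quietly.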
| 29 | + |
| 30 | +== Loader source → logical mapping today |
| 31 | + |
| 32 | +Both loaders are explicit about what each on-disk dtype becomes |
| 33 | +inside the engine. This table is the W0a audit promised by issue |
| 34 | +#615 — it makes the silent dequant cases visible so the loader-policy |
| 35 | +work (W0b / W0c) knows what to generalise. |
| 36 | + |
| 37 | +=== `StreamingGgufParametersLoader` (skainet-io-gguf) |
| 38 | + |
| 39 | +[cols="1m,1,2,2",options="header"] |
| 40 | +|=== |
| 41 | +| GGUF source type | Logical dtype today | Storage class | Native or dequant? |
| 42 | +| F32 | `FP32` | `FloatArrayTensorData` (dense) | native |
| 43 | +| I32 | `Int32` | `IntArrayTensorData` (dense) | native |
| 44 | +| F16 | `FP32` | `FloatArrayTensorData` (dense, dequanted) | **dequant on load** — no `KEEP_NATIVE` path yet |
| 45 | +| BF16 | `FP32` | `FloatArrayTensorData` (dense, dequanted) | **dequant on load** — no `KEEP_NATIVE` path yet |
| 46 | +| Q4_K | `FP32`-tagged tensor wrapping `Q4_KBlockTensorData` | `Q4_KBlockTensorData` (packed, logical shape preserved) | native |
| 47 | +| Q8_0 | `FP32`-tagged tensor wrapping `Q8_0BlockTensorData` | `Q8_0BlockTensorData` (packed, logical shape preserved) | native |
| 48 | +|=== |
| 49 | + |
| 50 | +The two dequant rows (F16, BF16) are the gap. SafeTensors already |
| 51 | +has a `Bf16LoadPolicy.KEEP_NATIVE` opt-in (see below) that returns |
| 52 | +the BF16 bytes verbatim instead of expanding to FP32. The |
| 53 | +equivalent for GGUF is W0c (`StreamingGgufParametersLoader.loadWithPolicy`). |
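
For readers unfamiliar with what "dequant on load" costs for BF16, here is a small sketch of the expansion step. It relies only on the BF16 format itself (a BF16 value is the upper 16 bits of an IEEE-754 FP32 value) and assumes little-endian byte order. The function name `expandBf16ToFp32` is hypothetical and is not taken from either loader.

```kotlin
// Hypothetical helper, not a SKaiNET API: expands little-endian BF16 bytes to FP32.
fun expandBf16ToFp32(packed: ByteArray): FloatArray {
    require(packed.size % 2 == 0) { "BF16 data must contain an even number of bytes" }
    return FloatArray(packed.size / 2) { i ->
        val lo = packed[2 * i].toInt() and 0xFF
        val hi = packed[2 * i + 1].toInt() and 0xFF
        // The 16 BF16 bits become the top half of the FP32 bit pattern.
        Float.fromBits(((hi shl 8) or lo) shl 16)
    }
}

fun main() {
    // 0x3F80 is BF16 1.0 and 0xC000 is BF16 -2.0, stored low byte first.
    val packed = byteArrayOf(0x80.toByte(), 0x3F, 0x00, 0xC0.toByte())
    println(expandBf16ToFp32(packed).contentToString()) // [1.0, -2.0]
}
```

The expansion is lossless but doubles the memory footprint, which is what a `KEEP_NATIVE` path avoids by leaving the two-byte encoding untouched.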
| 54 | + |
| 55 | +=== `SafeTensorsParametersLoader` (skainet-io-safetensors) |
| 56 | + |
| 57 | +[cols="1m,1,2,2",options="header"] |
| 58 | +|=== |
| 59 | +| SafeTensors source type | Logical dtype today | Storage class | Native or dequant? |
| 60 | +| F32 / F64 | `FP32` | `FloatArrayTensorData` | native (F64 down-cast with warning) |
| 61 | +| F16 | `FP32` | `FloatArrayTensorData` (dequanted) | **dequant on load** — no `KEEP_NATIVE` path yet |
| 62 | +| BF16 | `FP32` or `BF16`-shaped depending on `Bf16LoadPolicy` | `FloatArrayTensorData` (dequanted) or `Bf16DenseTensorData` (native) | **policy-controlled**: `DEQUANT_TO_FP32` (default) or `KEEP_NATIVE` |
| 63 | +| I32 / I16 / U16 / U32 / U64 / I8 / U8 | matching `Int*` / `UInt*` | wrapped / reinterpreted appropriately | native |
| 64 | +|=== |
| 65 | + |
| 66 | +The BF16 row is the prior art for the RFC's policy model. `Bf16LoadPolicy.toDTypePolicy()` (W2, shipped) maps the BF16-specific enum onto the generalised `DTypePolicy`: |
| 67 | + |
| 68 | +[source,kotlin] |
| 69 | +---- |
| 70 | +Bf16LoadPolicy.DEQUANT_TO_FP32.toDTypePolicy() // → DTypePolicy.Require(FP32) |
| 71 | +Bf16LoadPolicy.KEEP_NATIVE.toDTypePolicy() // → DTypePolicy.Require(BF16) |
| 72 | +---- |
| 73 | + |
| 74 | +W0b extends this same idea to F16 and the integer dtypes so the |
| 75 | +whole SafeTensors loader can be driven by a single `DTypePolicy` |
| 76 | +argument. |
| 77 | + |
| 78 | +== The `DType` registry vs the kernel capability query |
| 79 | + |
| 80 | +`DType.findByName("Float32")` returns the singleton `FP32` object — |
| 81 | +the sealed-interface registry is the source-of-truth for dtype |
| 82 | +metadata (size in bits, name, promotion rules). It currently covers |
| 83 | +floats and (un)signed integers from `Ternary` through `FP64`. |
| 84 | + |
| 85 | +The quantized block formats (`Q4_K`, `Q8_0`, `Q6_K`, `Q4_0`, …) |
| 86 | +are **not** `DType` arms — they live as `TensorData` subtypes in |
| 87 | +`skainet-lang-core/tensor/data/`. That's intentional: a `DType` is |
| 88 | +a numeric type with promotion semantics, whereas Q4_K is a *packed |
| 89 | +block format* with no scalar interpretation outside its block |
| 90 | +context. |
| 91 | + |
| 92 | +For the kernel capability query (`KernelProvider.supports(opName, |
| 93 | +dtypeKeys)`, W3), this means the second argument is `List<String>` |
| 94 | +rather than `List<DType>` — the strings `"Q4_K"` and `"Q8_0"` slot |
| 95 | +in alongside `"Float32"` and `"BFloat16"`. The string convention |
| 96 | +matches what GGUF / SafeTensors loaders and the StableHLO converter |
| 97 | +already use for format identification. |
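
A rough sketch of why string keys are convenient for the capability query: one lookup can cover both `DType` names and block-format names. Everything here is hypothetical (`ProviderSketch`, `TableProviderSketch`, the `Boolean` return type, and the reading of `dtypeKeys` as the ordered operand dtypes are all assumptions). Only the `supports(opName, dtypeKeys)` shape with a `List<String>` argument comes from the text.

```kotlin
// Hypothetical interface for illustration; the real KernelProvider may differ.
interface ProviderSketch {
    val name: String
    fun supports(opName: String, dtypeKeys: List<String>): Boolean
}

// Backs the query with a static table: op name -> supported operand dtype tuples.
class TableProviderSketch(
    override val name: String,
    private val table: Map<String, Set<List<String>>>,
) : ProviderSketch {
    override fun supports(opName: String, dtypeKeys: List<String>): Boolean =
        table[opName]?.contains(dtypeKeys) ?: false
}

fun main() {
    val cpu = TableProviderSketch(
        name = "cpu-sketch",
        table = mapOf(
            "matmul" to setOf(
                listOf("Float32", "Float32"),
                listOf("Float32", "Q4_K"), // block format key next to a DType name
            ),
        ),
    )
    println(cpu.supports("matmul", listOf("Float32", "Q4_K"))) // true
    println(cpu.supports("matmul", listOf("Float32", "Q8_0"))) // false
}
```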
| 98 | + |
| 99 | +== Fail-fast: `KernelStrictness` |
| 100 | + |
| 101 | +The RFC's "fail before execution" rule already has a small
| 102 | +runtime affordance (W4, shipped), a JVM system property:
| 103 | + |
| 104 | +[source,bash] |
| 105 | +---- |
| 106 | +java -Dskainet.strict.kernels=true … |
| 107 | +---- |
| 108 | + |
| 109 | +When set, `DefaultCpuOpsJvm.matmul` raises |
| 110 | +`NoSuchKernelException` (with the failing dtype pair and the list |
| 111 | +of currently-registered providers) just before its silent scalar |
| 112 | +fallback would have run. The flag is off by default, so the
| 113 | +adaptive behaviour is preserved.
| 114 | + |
| 115 | +The constraint-resolution pass (W7) raises the same exception |
| 116 | +shape at *graph-prep* time, before forward execution can even |
| 117 | +start. The `KernelStrictness` affordance is the runtime equivalent |
| 118 | +for cases where graph prep hasn't been run (e.g. ad-hoc tensor-op |
| 119 | +code that calls `ctx.ops.matmul` directly). |
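
The strict-versus-adaptive behaviour described above might be pictured like this. It is a sketch under stated assumptions, not the code in `DefaultCpuOpsJvm`: `NoSuchKernelSketch` and `runOrFallback` are hypothetical names, and reading the flag with `java.lang.Boolean.getBoolean` is an assumption about how the property could be consulted. Only the property name `skainet.strict.kernels` is taken from the text.

```kotlin
// Hypothetical exception; mirrors the described message contents only.
class NoSuchKernelSketch(lhs: String, rhs: String, providers: List<String>) :
    RuntimeException("no kernel for ($lhs, $rhs); registered providers: $providers")

// Runs the kernel when one exists; otherwise throws (strict) or falls back (adaptive).
fun <T> runOrFallback(
    strict: Boolean,
    lhs: String,
    rhs: String,
    providers: List<String>,
    kernel: (() -> T)?,
    scalarFallback: () -> T,
): T = when {
    kernel != null -> kernel()
    strict -> throw NoSuchKernelSketch(lhs, rhs, providers)
    else -> scalarFallback()
}

fun main() {
    // Assumption: the flag is a plain JVM system property, false when unset.
    val strict = java.lang.Boolean.getBoolean("skainet.strict.kernels")
    val result = runCatching {
        runOrFallback(strict, "Float32", "Q8_0", listOf("cpu-sketch"), kernel = null) {
            "scalar fallback"
        }
    }
    println(result.getOrElse { it.message })
}
```

Run without the flag, the sketch takes the fallback branch; with `-Dskainet.strict.kernels=true` it reports the missing dtype pair and the provider list instead.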
| 120 | + |
| 121 | +== Anti-patterns this model prevents |
| 122 | + |
| 123 | +The RFC calls out three concrete anti-patterns the engine must |
| 124 | +avoid; SKaiNET already prevents all three. |
| 125 | + |
| 126 | +[cols="2,3",options="header"] |
| 127 | +|=== |
| 128 | +| Anti-pattern | What prevents it in SKaiNET today |
| 129 | +| Marker-class dtype detection (`if tensor is Q4_KMarker`) | The sealed `DType` interface carries explicit metadata (`sizeInBits`, `name`, `isCompatible`, `promoteTo`). Dispatch uses `KClass<T>` identity and the typed accessors on `KernelProvider`, not marker checks. |
| 130 | +| Packed bytes treated as logical shape (1D byte array patched into 2D after load) | Every quantized `TensorData` subtype (`Q4_KBlockTensorData`, `Q8_0BlockTensorData`, `Bf16DenseTensorData`) carries an explicit `shape: Shape` separate from its `packedData: ByteArray`. Loaders set the logical shape from the file header, not from `bytes.size`. |
| 131 | +| GGUF Q8 confused with native int8 | They're different `TensorData` subtypes. A GGUF Q8 tensor goes through `Q8_0BlockTensorData` (with FP16 scale + 32 signed int8 codes per block); a future native-int8 NPU tensor would have its own `TensorData` subtype with backend-specific layout metadata. The RFC's "GGUF Q8 ≠ native int8" rule is enforced structurally. |
| 132 | +|=== |
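
The second and third rows can be made concrete with a small sketch of a packed container that keeps its logical shape separate from its bytes. `Q8BlockDataSketch` is a hypothetical class, not `Q8_0BlockTensorData`. The block size (a 2-byte FP16 scale plus 32 signed int8 codes, so 34 bytes per 32 elements) follows from the description in the table, and validating against the total element count rather than per row is a simplification.

```kotlin
// Hypothetical container for illustration; not the SKaiNET TensorData subtype.
class Q8BlockDataSketch(val shape: List<Int>, val packedData: ByteArray) {
    // Logical element count comes from the shape, never from packedData.size.
    val elementCount: Int = shape.fold(1) { acc, dim -> acc * dim }

    init {
        require(elementCount % ELEMENTS_PER_BLOCK == 0) {
            "element count $elementCount is not a multiple of $ELEMENTS_PER_BLOCK"
        }
        val expectedBytes = elementCount / ELEMENTS_PER_BLOCK * BYTES_PER_BLOCK
        require(packedData.size == expectedBytes) {
            "expected $expectedBytes packed bytes for shape $shape, got ${packedData.size}"
        }
    }

    companion object {
        const val ELEMENTS_PER_BLOCK = 32
        const val BYTES_PER_BLOCK = 2 + 32 // FP16 scale + 32 int8 codes
    }
}

fun main() {
    // 4 x 64 = 256 elements -> 8 blocks -> 272 packed bytes.
    val ok = Q8BlockDataSketch(shape = listOf(4, 64), packedData = ByteArray(272))
    println("${ok.shape} backed by ${ok.packedData.size} bytes")
    // A byte count equal to the element count is rejected: packed size is not a shape.
    println(runCatching { Q8BlockDataSketch(listOf(4, 64), ByteArray(256)) }.isFailure)
}
```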
| 133 | + |
| 134 | +== Related |
| 135 | + |
| 136 | +* `rfc.md` (repo root): the design document whose vocabulary this page maps onto the implementation.
| 137 | +* Issue https://github.com/SKaiNET-developers/SKaiNET/issues/615[#615] — implementation tracker. |
| 138 | +* xref:contributing/benchmarks.adoc[Engine benchmark program] — runtime numbers that the kernel SPI produces. |
| 139 | +* xref:contributing/matmul-kernels.adoc[Reading the matmul benchmark] — how the kernel SPI's dispatch actually shows up in measurements. |