Mirror the q4k fused-int8 kernel: pre-quantize the input row to symmetric
int8 (Q8) once per 256-element block (reused across all output rows), unpack
the 6-bit weights to centered int8 codes, and run each scale group as an int8
dot (vdotq_s32 on dotprod targets, scalar fallback otherwise). This drops the
256-float scratch dequant and the per-element float multiply.
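acc = d · d_in · Σ_g sc[g]·Σ_{i∈g} q8[i]·codes[i],
where d is the block scale, d_in the block's Q8 activation scale, sc[g] the
signed int8 scale of group g, and codes[i] the centered 6-bit weight code.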
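The fused formula can be illustrated with a small Kotlin sketch of one 256-element block. It assumes 16 scale groups of 16 elements (the usual Q6_K grouping), takes already-unpacked codes and scales, and uses plain round-to-nearest symmetric quantization with d_in = max|x| / 127. The commit does not state the kernel's exact rounding or clamping, and the names `quantizeQ8`, `fusedQ8Dot` and `floatDot` are hypothetical.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Assumed sizes: one Q6_K block of 256 weights, 16 scale groups of 16 elements.
const val QK = 256
const val GROUP = 16

/** Symmetric int8 (Q8) quantization of one 256-float activation block; returns (d_in, q8). */
fun quantizeQ8(x: FloatArray): Pair<Float, ByteArray> {
    require(x.size == QK)
    var amax = 0f
    for (v in x) amax = maxOf(amax, abs(v))
    val q8 = ByteArray(QK)
    if (amax == 0f) return 0f to q8 // all-zero block: d_in = 0, every q8 = 0
    val dIn = amax / 127f
    for (i in 0 until QK) {
        q8[i] = (x[i] / dIn).roundToInt().coerceIn(-127, 127).toByte()
    }
    return dIn to q8
}

/** acc = d * d_in * sum_g sc[g] * sum_{i in g} q8[i] * codes[i], with integer inner sums. */
fun fusedQ8Dot(d: Float, sc: IntArray, codes: IntArray, dIn: Float, q8: ByteArray): Float {
    var total = 0 // worst case is about 1.3e8, so Int does not overflow
    for (g in 0 until QK / GROUP) {
        var groupSum = 0 // int8 x int8 products accumulated in 32 bits
        for (i in g * GROUP until (g + 1) * GROUP) groupSum += q8[i] * codes[i]
        total += sc[g] * groupSum
    }
    return d * dIn * total
}

/** Float reference: dequantize each weight and multiply by the unquantized input. */
fun floatDot(d: Float, sc: IntArray, codes: IntArray, x: FloatArray): Float {
    var acc = 0f
    for (i in 0 until QK) acc += d * sc[i / GROUP] * codes[i] * x[i]
    return acc
}
```

Only the two scale multiplies stay in float; everything inside a group is an int8 × int8 sum into a 32-bit integer, which is the shape a dot-product instruction accelerates. When the inputs happen to be integers with max |x| = 127, d_in is exactly 1 and the fused result equals the float reference; for general inputs, rounding x / d_in is where the activation-quantization error enters.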
This is deliberately lossy (ggml-style activation quantization, ~1-3% error
on worst-case uniform-random fixtures), so the kernel is no longer bit-exact
against the float/scalar reference. Both parity tests (jvmTest via Panama;
nativeTest via cinterop on linuxX64 and linuxArm64) therefore switch from
per-row relative error, which is unbounded on near-zero rows of zero-mean
fixtures, to the aggregate error-energy gate RMS(error)/RMS(signal) < 0.03.
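The two gates can be compared directly. The Kotlin sketch below computes the aggregate error-energy ratio and, for contrast, the worst per-row relative error. The helper names are hypothetical, since the tests' own helpers are not shown in this commit.

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

/**
 * Aggregate error-energy ratio RMS(actual - expected) / RMS(expected).
 * The 1/n factors of the two RMS values cancel, so only the sums of squares matter.
 */
fun errorEnergyRatio(expected: FloatArray, actual: FloatArray): Double {
    require(expected.size == actual.size)
    var errSq = 0.0
    var sigSq = 0.0
    for (i in expected.indices) {
        val e = expected[i].toDouble()
        val diff = actual[i].toDouble() - e
        errSq += diff * diff
        sigSq += e * e
    }
    if (sigSq == 0.0) return if (errSq == 0.0) 0.0 else Double.POSITIVE_INFINITY
    return sqrt(errSq / sigSq)
}

/** The old-style gate: worst per-row relative error |actual - expected| / |expected|. */
fun maxRelativeError(expected: FloatArray, actual: FloatArray): Double =
    expected.indices.maxOf { i ->
        abs(actual[i].toDouble() - expected[i]) / abs(expected[i].toDouble())
    }
```

For expected rows (100, -80, 1e-4) and kernel outputs (100.5, -80.4, 0.01), `errorEnergyRatio` is about 0.005 and passes the < 0.03 gate, while `maxRelativeError` is about 99 because the near-zero row divides a roughly 0.01 absolute error by 1e-4.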
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
skainet-backends/skainet-backend-native-cpu/src/jvmTest/kotlin/sk/ainet/exec/kernel/NativeQ6KMatmulKernelParityTest.kt
+27 -9 (27 additions, 9 deletions)
@@ -16,9 +16,16 @@ import kotlin.test.assertTrue
  * Fixture mirrors [NativeQ5KMatmulKernelParityTest]: random Q6_K bytes with
  * `d` clamped to `1.0f16` (bytes 208-209), packed input-block-major
  * `(blockIdx * outputDim + o) * 210`. Random `ql`/`qh`/`scales` exercise the
- * 6-bit bit-assembly and the signed int8 scales. Q6_K magnitudes are larger
- * than Q5_K (codes [-32, 31] × int8 scales), so absolute tolerances are a
- * touch looser; the `rel < 1e-4` relative check is the real gate.
+ * 6-bit bit-assembly and the signed int8 scales.
+ *
+ * Like [NativeQ4KMatmulKernelParityTest], the native kernel quantizes the
+ * activation to int8 (Q8) for the dotprod fast path — deliberately lossy
+ * (ggml-style), so it is NOT bit-exact vs the float Panama reference. Per-row
+ * relative error is the wrong gate (a near-zero true row shows unbounded
+ * relative error from a tiny absolute one on zero-mean random fixtures); the
+ * meaningful metric is the aggregate error energy RMS(error)/RMS(signal). Real
+ * (smoother) LLM activations are far tighter than these worst-case fixtures;
+ * the end-to-end gate is the on-board generation output.
  */
 class NativeQ6KMatmulKernelParityTest {

@@ -58,14 +65,25 @@ class NativeQ6KMatmulKernelParityTest {
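The fixture comment refers to a 210-byte block, the `d` field at bytes 208-209, the input-block-major offset `(blockIdx * outputDim + o) * 210`, and the 6-bit assembly from `ql`/`qh`. The Kotlin sketch below reads one block under the assumption that it follows the standard ggml `block_q6_K` layout (ql[128] low nibbles, qh[64] high bit pairs, scales[16] signed int8, little-endian f16 `d`); this diff does not confirm that layout, and every name here is hypothetical.

```kotlin
// Assumed block layout (ggml block_q6_K): ql[128] | qh[64] | scales[16] | d (f16, bytes 208-209).
const val Q6K_BLOCK_BYTES = 210
const val QL_OFF = 0
const val QH_OFF = 128
const val SCALES_OFF = 192
const val D_OFF = 208

/** Byte offset of the block for input block `blockIdx` and output row `o` (input-block-major). */
fun blockOffset(blockIdx: Int, outputDim: Int, o: Int): Int =
    (blockIdx * outputDim + o) * Q6K_BLOCK_BYTES

/** Assemble the 256 centered 6-bit codes, each in [-32, 31], of the block starting at `base`. */
fun unpackQ6KCodes(buf: ByteArray, base: Int): IntArray {
    val codes = IntArray(256)
    for (half in 0 until 2) { // each 128-element half uses 64 ql bytes and 32 qh bytes
        val ql = base + QL_OFF + half * 64
        val qh = base + QH_OFF + half * 32
        val y = half * 128
        for (l in 0 until 32) {
            val lo0 = buf[ql + l].toInt() and 0xFF
            val lo1 = buf[ql + l + 32].toInt() and 0xFF
            val hi = buf[qh + l].toInt() and 0xFF
            // low 4 bits from a ql nibble, high 2 bits from one bit pair of qh, then center by -32
            codes[y + l] = ((lo0 and 0xF) or ((hi and 3) shl 4)) - 32
            codes[y + l + 32] = ((lo1 and 0xF) or (((hi shr 2) and 3) shl 4)) - 32
            codes[y + l + 64] = ((lo0 shr 4) or (((hi shr 4) and 3) shl 4)) - 32
            codes[y + l + 96] = ((lo1 shr 4) or (((hi shr 6) and 3) shl 4)) - 32
        }
    }
    return codes
}

/** Signed int8 scale of the 16-element group that contains `element` (0..255). */
fun groupScale(buf: ByteArray, base: Int, element: Int): Int =
    buf[base + SCALES_OFF + element / 16].toInt()

/** True if the f16 `d` field holds 1.0 (0x3C00, stored little-endian as 00 3C). */
fun dIsOne(buf: ByteArray, base: Int): Boolean =
    buf[base + D_OFF] == 0x00.toByte() && buf[base + D_OFF + 1] == 0x3C.toByte()
```

Because `ql`, `qh` and `scales` are filled with random bytes, every combination of nibble and bit pair gets exercised, which is what the fixture comment means by exercising the 6-bit bit-assembly.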
skainet-backends/skainet-backend-native-cpu/src/nativeTest/kotlin/sk/ainet/exec/kernel/NativeKnQ6KMatmulKernelParityTest.kt
+23 -10 (23 additions, 10 deletions)
@@ -12,10 +12,13 @@ import kotlin.test.assertTrue
  * `-ffast-math` reassociation tolerance.
  *
  * Runs on linuxX64 (host archive: scalar/auto-vectorized) AND linuxArm64
- * (cross-built archive: NEON), so the aarch64 run bit-checks the
- * `SKAINET_HAVE_NEON` path in q6k_matmul.c. Q6_K magnitudes (codes
- * [-32, 31] × signed int8 scales) are larger than Q5_K, so absolute tolerances
- * are a touch looser; the `rel < 1e-4` relative check is the real gate.
+ * (cross-built archive: NEON), so the aarch64 run exercises the
+ * `SKAINET_HAVE_NEON` / `SKAINET_HAVE_DOTPROD` path in q6k_matmul.c.
+ *
+ * The C kernel quantizes the activation to int8 (Q8) for the dotprod fast path
+ * — deliberately lossy (ggml-style), so it is NOT bit-exact vs the scalar
+ * reference. The gate is the aggregate error energy RMS(error)/RMS(signal), not
+ * per-row relative error (unbounded on near-zero rows of zero-mean fixtures).
  */
 class NativeKnQ6KMatmulKernelParityTest {

@@ -46,14 +49,24 @@ class NativeKnQ6KMatmulKernelParityTest {
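The commit body says each scale group runs as an int8 dot via vdotq_s32 on dotprod targets and as a scalar loop otherwise, and the nativeTest comment names the `SKAINET_HAVE_DOTPROD` path. The Kotlin sketch below only models the instruction's semantics (four int32 lanes, each adding the products of four consecutive int8 pairs) to show why a 16-element scale group, assuming ggml's Q6_K grouping, fits one 16-byte vector. It is not the q6k_matmul.c code, and the function names are hypothetical.

```kotlin
/**
 * Scalar model of vdotq_s32(acc, a, b): lane j of the result is
 * acc[j] + sum over k in 0..3 of a[4j + k] * b[4j + k], with int8 inputs and int32 lanes.
 */
fun vdotqS32Model(acc: IntArray, a: ByteArray, b: ByteArray): IntArray {
    require(acc.size == 4 && a.size == 16 && b.size == 16)
    return IntArray(4) { lane ->
        var s = acc[lane]
        for (k in 0 until 4) s += a[4 * lane + k] * b[4 * lane + k]
        s
    }
}

/** One 16-element group = one 16-byte vector: a single dot step, then a sum of the four lanes. */
fun groupDot(q8: ByteArray, codes: ByteArray): Int =
    vdotqS32Model(IntArray(4), q8, codes).sum()

/** Plain scalar fallback for targets without the dotprod extension. */
fun groupDotScalar(q8: ByteArray, codes: ByteArray): Int {
    var s = 0
    for (i in q8.indices) s += q8[i] * codes[i]
    return s
}
```

Both paths compute the same integer, so the dotprod and scalar builds agree exactly on the inner sums; any difference from the float reference comes from quantizing the activation, not from the choice of path.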