Commit ddb5aee
ssjia
[ET-VK][quantized] Store dq8ca per-token zero-point as fp32
The per-token dynamic-activation-quant (`dq8ca`) zero-point was corrupted by a mismatch between the tensor's allocation dtype and the dtype the shader used to access it. The per-token zero-point tensor is created with a float dtype (fp32, or fp16 under `USE_VULKAN_FP16_INFERENCE`), so its backing image uses a float texel format (`rgba32f` / `rgba16f`). The shader, however, declared and accessed that image with an integer dtype (`int8`, i.e. the integer image format `rgba8i`). Reading a float-format image through an integer-format binding is the bug.
On ARM Mali (Valhall) GPUs this mismatch corrupted negative zero-points: `-k` was read back as `-2^23 - k`. That drove the quantized activations to the int8 floor, the per-group sums to `-4096`, and the GEMM output to garbage, producing garbled, runaway generation for 8da4w models (e.g. the Llama4-mini TISO TTS backbone on Mali-G715/G710). Adreno happened to tolerate the same mismatch and read correct values, so the corruption was Mali-specific even though the mismatch itself is general.
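The failure chain above can be reproduced numerically. The following is a minimal sketch, not the shader code: it assumes the standard affine formula `q = clamp(round(x / scale) + zp, -128, 127)` and a group size of 32 (inferred from `-4096 / -128`); the activation values, the scale, and `k = 3` are made up for illustration.

```python
INT8_MIN, INT8_MAX = -128, 127
GROUP_SIZE = 32  # assumption: inferred from -4096 == -128 * 32


def quantize(x, scale, zero_point):
    """Affine int8 quantization with clamping (assumed formula)."""
    q = round(x / scale) + zero_point
    return max(INT8_MIN, min(INT8_MAX, q))


# Illustrative activations for one group and an illustrative scale.
activations = [0.5 * (i - 16) for i in range(GROUP_SIZE)]
scale = 0.0625

good_zp = -3                 # the zero-point that was actually stored
bad_zp = -(2 ** 23) - 3      # the value the commit message says Mali read back

good_q = [quantize(x, scale, good_zp) for x in activations]
bad_q = [quantize(x, scale, bad_zp) for x in activations]

# With the corrupted zero-point every element saturates at the int8 floor,
# so the group sum collapses to -128 * GROUP_SIZE.
bad_group_sum = sum(bad_q)
print(bad_group_sum)
```

With the correct zero-point the quantized values spread across the int8 range; with the corrupted one they all clamp to `-128`, which is why the group sum is the same constant regardless of the input.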
The per-token zero-point is serialized as fp32 by torchao design: `Int8DynamicActivationIntxWeightConfig` (8da4w) uses asymmetric per-token activation quant with an explicit fp32 `zero_point_dtype`. Decoding the serialized `.pte` confirms that the zero-point tensor is FLOAT32 and that, like the scale, it is stored in a texture as an `rgba32f` texel, never `rgba8i`. The float allocation is authoritative; the int8 shader access was the mismatched side.
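To make "asymmetric per-token quant with a float zero-point" concrete, here is a minimal sketch of choosing qparams for one token (row). It uses the common min/max affine scheme and is not taken from torchao or from `choose_qparams_per_row`; the function name and the exact rounding and clamping are assumptions.

```python
def choose_qparams_per_token(row, qmin=-128, qmax=127):
    """Return (scale, zero_point) for one row, both as Python floats.

    Hypothetical helper: a common asymmetric min/max scheme, not the
    exact algorithm used by torchao or the Vulkan shader.
    """
    # Include 0.0 in the range so that zero is exactly representable.
    lo = min(min(row), 0.0)
    hi = max(max(row), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all-zero row: any positive scale works
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    # The value is integer-valued but carried in a float, matching the
    # float dtype the commit message describes for the serialized tensor.
    return float(scale), float(zero_point)


scale, zero_point = choose_qparams_per_token([-1.0, 0.0, 3.0])
print(scale, zero_point)
```

The point of the sketch is only that the zero-point is an integer value in `[-128, 127]` held in a float container, so the storage format on the consumer side has to be a float format as well.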
The fix is to declare, store, and read the per-token zero-point as fp32 across the dq8ca qparams shaders, so that the shader access dtype matches the tensor's allocation dtype and the texture is read as the `rgba32f` image it actually is. The zero-point is integer-valued (nudged into `[-128, 127]`), so fp32 represents it exactly and the consumer's `int(zp)` conversion for the integer dequant-correction is lossless. The change touches `choose_qparams_per_row`, `quantize_and_pack_4h4w_with_group_sums`, `linear_dq8ca_q4gsw_tiled`, the shared `linear_int8_input_scales_zps_load` helper, and the `linear_q4gsw_coop` variant (whose zero-point binding exists only to match the descriptor-set layout and is never read), plus a documentation comment in `ChooseQParams.cpp`.
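The losslessness claim is easy to check exhaustively. The sketch below round-trips every integer in `[-128, 127]` through an IEEE 754 fp32 encoding using Python's `struct` module and converts back with `int()`; it also checks fp16, since the tensor may be fp16 under `USE_VULKAN_FP16_INFERENCE`. The helper name is illustrative only.

```python
import struct


def round_trip(value, fmt):
    """Encode a float in the given struct format and decode it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


zero_points = range(-128, 128)

# "<f" is little-endian fp32, "<e" is little-endian fp16.
fp32_lossless = all(int(round_trip(float(zp), "<f")) == zp for zp in zero_points)
fp16_lossless = all(int(round_trip(float(zp), "<e")) == zp for zp in zero_points)

print(fp32_lossless, fp16_lossless)
```

Both formats have enough mantissa bits to hold every integer in this range exactly, so the `int(zp)` conversion never has to round.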
Because the per-token qparams remain in texture storage (unchanged from before) and only the zero-point dtype changes, this is a pure runtime shader fix. Existing texture-qparams 8da4w `.pte` files are corrected without re-export: the texture already holds the zero-point as `rgba32f`, and the shader now reads it as such.
Authored with Claude Code.
Differential Revision: [D109595977](https://our.internmc.facebook.com/intern/diff/D109595977/)
ghstack-source-id: 396618146
Pull-Request: #204911
Parent: 1621fa2
8 files changed
Lines changed: 24 additions & 16 deletions
File tree
- backends/vulkan
  - runtime/graph/ops
    - glsl
    - impl
  - test/custom_ops
Per-file changes, in page order (file names and diff line contents were not captured; counts for files 6 and 8 are read off the diff line-number columns):
- File 1: 2 additions & 2 deletions
- File 2: 1 addition & 1 deletion
- File 3: 2 additions & 1 deletion
- File 4: 1 addition & 1 deletion
- File 5: 1 addition & 1 deletion
- File 6: 7 additions & 0 deletions
- File 7: 6 additions & 6 deletions
- File 8: 4 additions & 4 deletions