Commit a0d6e9b
switch correctness checks to SNR-based assertion for cuda quant int4_matmul (#19300)
Replace torch.allclose(atol/rtol) with an SNR (signal-to-noise ratio)
assertion across all int4_matmul / int4_matvec / dequant-vs-fused tests.
Why:
- test_prefill_short was flaking on CI (A10G) with max_abs_err=1.0000.
Root cause: bf16 GEMM with K=2048 reduction produces output magnitudes
up to ~200; at that scale, the bf16 ULP gap is 0.5-1.0. The Triton fused
kernel and cuBLAS reduce in different orders (and Triton autotune picks
different tile configs on different hardware), so 1-ULP element-wise
differences are unavoidable. An atol/rtol check false-fails on these
outliers; SNR averages them out.
- atol/rtol thresholds also depend on size: a value tuned for K=2048 is
too loose for K=64 and too tight for K=4096. SNR is size-invariant
(||signal|| and ||noise|| both scale with sqrt(N) and sqrt(K), canceling
in the ratio); see the sketch after this list.
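Both bullets are easy to check numerically. A minimal standalone sketch
(not part of this diff; the shapes, seed, and fp64 reference are
illustrative assumptions):

```python
import math
import torch

# (1) bf16 ULP at the observed output magnitude. bf16 has an 8-bit
# significand (7 explicit mantissa bits), so eps = 2**-7; for x in
# [128, 256) one ULP is eps * 2**7 = 1.0. A single last-bit
# disagreement between two valid reduction orders therefore shows up
# as max_abs_err == 1.0000, exactly the CI flake.
eps = torch.finfo(torch.bfloat16).eps                  # 0.0078125
ulp_at_200 = eps * 2 ** math.floor(math.log2(200.0))  # exponent(200) = 7
print(f"bf16 ULP near 200: {ulp_at_200}")              # -> 1.0

# (2) Max-abs error grows with the reduction depth K while the SNR
# stays roughly flat: compare a bf16 matmul against an fp64 reference.
torch.manual_seed(0)
for k in (64, 2048, 4096):
    a, b = torch.randn(64, k), torch.randn(k, 64)
    ref = a.double() @ b.double()
    err = (a.bfloat16() @ b.bfloat16()).double() - ref
    snr_db = 20 * math.log10(ref.norm().item() / err.norm().item())
    print(f"K={k:5d}  max_abs_err={err.abs().max().item():.3f}  "
          f"snr={snr_db:5.1f} dB")
```

The exact dB figures depend on hardware and accumulation order; the
point is the trend: max_abs_err climbs with K, the SNR does not.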
What:
- Add an _assert_snr(test_case, actual, expected, label) helper that
asserts 20*log10(||expected|| / ||actual-expected||) >= 50 dB (sketched
after this list).
- Replace 4 call sites: TestInt4Matmul, TestInt4Matvec (x2),
TestDequantThenMatmul.
- 50 dB ~ 0.3% RMS error. The threshold sits comfortably below the
80-90 dB SNR observed on clean runs (wide passing margin) and far above
the <20 dB SNR produced by any real functional bug (wrong stride,
flipped nibble, off-by-one group_idx, missing mask).
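A sketch of the helper: the signature and the 50 dB floor are from this
commit message, while the fp64 upcast, the bit-exact early return, and
the min_snr_db default are assumptions.

```python
import math

def _assert_snr(test_case, actual, expected, label, min_snr_db=50.0):
    """Fail unless 20*log10(||expected|| / ||actual - expected||) >= min_snr_db."""
    expected64 = expected.double()        # assumption: compare in fp64
    noise = actual.double() - expected64
    if noise.norm().item() == 0.0:        # bit-exact match: SNR is infinite
        return
    snr_db = 20.0 * math.log10(expected64.norm().item() / noise.norm().item())
    test_case.assertGreaterEqual(
        snr_db,
        min_snr_db,
        f"{label}: SNR {snr_db:.1f} dB is below the {min_snr_db} dB floor",
    )
```

A call site might then collapse from a multi-line allclose block to,
e.g., _assert_snr(self, out_fused, out_ref, "int4_matmul prefill").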
Test plan:
python -m pytest backends/cuda/tests/test_int4_matmul.py -v
-> 35/35 passed

1 parent ff25a2f commit a0d6e9b
1 file changed
Lines changed: 39 additions & 25 deletions
[Diff body lost in extraction; only the line-number gutter survived.
Recoverable hunk structure, consistent with the 39/25 totals above: one
line deleted near the top of the file (old line 22); a 35-line block
inserted at new lines 30-64, presumably the _assert_snr helper; and four
hunks (old lines 121-126, 192-198, 229-232, 251-257) each collapsing a
multi-line allclose block into a single new line, matching the four call
sites listed above.]