Commit 70ade20

Use precision-matched quantized references in INT4 matmul tests

Replace eager float32 references with precision-matched quantized references that align with each kernel's internal dequant precision:

- dequant_w4_to_bf16: bitwise exact vs pure-Python dequant (was atol=0.01)
- int4_matmul: cuBLAS bf16 GEMM reference (both truncate to bf16)
- int4_matvec: f32 matmul reference (both keep dequant in f32, atol=1e-3 vs prior atol=1.0)

Co-authored-by: Claude <noreply@anthropic.com>
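To illustrate why a precision-matched reference allows a bitwise comparison, here is a minimal pure-Python sketch of an INT4-to-bf16 dequant reference. It is not the code from this commit: the packing layout (two unsigned nibbles per byte, low nibble first), the zero point of 8, the per-group scales, and the helper names `to_bf16_bits`, `unpack_int4`, and `dequant_ref_bf16` are all assumptions made for the example.

```python
import struct


def to_bf16_bits(x: float) -> int:
    """Round a finite float to bfloat16 (round-to-nearest-even); return the 16 raw bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand 16 raw bfloat16 bits back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def unpack_int4(packed: bytes) -> list[int]:
    """Assumed layout: two unsigned 4-bit values per byte, low nibble first."""
    out = []
    for byte in packed:
        out.append(byte & 0xF)
        out.append(byte >> 4)
    return out


def dequant_ref_bf16(packed: bytes, scales: list[float], group_size: int,
                     zero_point: int = 8) -> list[int]:
    """Hypothetical reference: (q - zero_point) * scale, rounded to bf16 bits."""
    q = unpack_int4(packed)
    return [
        to_bf16_bits((v - zero_point) * scales[i // group_size])
        for i, v in enumerate(q)
    ]


# A float32 reference keeps bits that a bf16 kernel output cannot represent,
# so comparing against it forces a loose tolerance.
f32_value = 1.00390625  # 1 + 2**-8, exactly representable in float32
bf16_value = bf16_bits_to_float(to_bf16_bits(f32_value))
print(f32_value - bf16_value)  # nonzero rounding gap

# A reference rounded the same way can be compared bit for bit.
print([hex(b) for b in dequant_ref_bf16(bytes([0x98, 0x07]), [0.5], group_size=4)])
```

Because the reference emits raw bf16 bit patterns, a test built this way can assert equality on integers rather than choosing an `atol`. This only holds if the kernel under test rounds the same way as the reference.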

1 parent 8ae05c2 commit 70ade20

2 files changed

backends/cuda
- tests
  - test_int4_matmul.py
- triton/kernels
  - int4_matmul.py
