Commit 2bc8a54

[Quantization] Shrink FP8 sweep parity matrix from 27 to 12 cases
Trim the parity grid to keep all three axes but with smaller per-axis ranges: 2 seeds × 2 num_blocks × 3 dtypes = 12 parametrized cases (down from 3×3×3 = 27). This still exercises every supported dtype and the small/large num_blocks extremes that drive different autotune choices, while roughly halving the cold-compile cost on hosts where Triton compilation is expensive.

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>

1 parent 8f04a9a commit 2bc8a54

1 file changed, 2 additions & 2 deletions

tests/gpu/torch/quantization/test_nvfp4_fp8_sweep_kernel.py
@@ -86,8 +86,8 @@ def _run_triton(x, per_block_amax, global_amax):
 
 @requires_triton
 @pytest.mark.parametrize("dtype", [torch.float32, torch.float16, torch.bfloat16])
-@pytest.mark.parametrize("seed", [0, 1, 2])
-@pytest.mark.parametrize("num_blocks", [4, 64, 1024])
+@pytest.mark.parametrize("num_blocks", [4, 1024])
+@pytest.mark.parametrize("seed", [0, 1])
 def test_parity_random_weights(seed, num_blocks, dtype):
     """Triton sweep must produce the exact same per-block amax as the reference,
     across every dtype supported by the NVFP4 quantizer (fp32, fp16, bf16)."""
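For context on the case count: stacked `@pytest.mark.parametrize` decorators expand into the Cartesian product of all axes, which is where 2 × 2 × 3 = 12 comes from. A minimal sketch of that counting using plain `itertools` (the dtype strings here are illustrative stand-ins for the torch dtypes):

```python
from itertools import product

# Trimmed per-axis ranges from this commit.
seeds = [0, 1]
num_blocks = [4, 1024]          # small/large extremes that drive autotune choices
dtypes = ["float32", "float16", "bfloat16"]

# Stacked parametrize decorators behave like this Cartesian product.
cases = list(product(seeds, num_blocks, dtypes))
print(len(cases))  # → 12 (down from 3 * 3 * 3 = 27)
```

Each tuple in `cases` corresponds to one parametrized invocation of `test_parity_random_weights`, so trimming any axis shrinks the grid multiplicatively.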
