Commit 8996ef1
Fix skip-softmax threshold formula: remove erroneous * sm_scale factor
The BLASST (https://arxiv.org/pdf/2512.12087) criterion compares against
ln(lambda) on the sm_scale-SCALED attention logits a_ij = q·k/sqrt(d).
The Triton kernel stores scores as x = a * log2(e), so the correct
threshold in kernel (log2) space is log2(lambda), not log2(lambda)*sm_scale.
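The change of base can be written out as a short derivation. This is a sketch that assumes the criterion has the form of a comparison between some logit difference (written here as Δa, a hypothetical symbol, with Δx its kernel-space counterpart) and ln(lambda); the exact quantity being compared is defined in the paper, not in this message.

```latex
\begin{align*}
x_{ij} &= a_{ij}\,\log_2 e, \qquad a_{ij} = \frac{q_i \cdot k_j}{\sqrt{d}} \\
\Delta a < \ln\lambda
  &\iff \Delta a\,\log_2 e < \ln\lambda\,\log_2 e \\
  &\iff \Delta x < \log_2\lambda
\end{align*}
```

Since sm_scale = 1/sqrt(d) is already folded into a_ij, no further sm_scale factor appears on the right-hand side.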
Previous code multiplied by sm_scale (~0.088 for head_dim=128), making
every threshold 11× too aggressive. With lambda=0.1 the kernel-space
threshold was -0.29 instead of the correct -3.32, skipping most attention
tiles and producing garbage output (PSNR ~11 dB). Even lambda=0.0001 was
still too aggressive (-1.17 vs correct -13.29).
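The arithmetic above can be reproduced with a few lines of plain Python. This is only a sketch: `kernel_thresholds` is a hypothetical helper written for illustration and is not part of the patched code.

```python
import math


def kernel_thresholds(lam: float, head_dim: int = 128) -> tuple[float, float]:
    """Return (correct, buggy) skip thresholds in kernel (log2) space.

    correct: log2(lambda)
    buggy:   log2(lambda) * sm_scale, the formula used before this fix
    """
    sm_scale = 1.0 / math.sqrt(head_dim)  # about 0.088 for head_dim=128
    correct = math.log2(lam)
    buggy = correct * sm_scale
    return correct, buggy


if __name__ == "__main__":
    for lam in (0.1, 1e-4):
        correct, buggy = kernel_thresholds(lam)
        # correct / buggy equals 1 / sm_scale, about 11.3 for head_dim=128
        print(f"lambda={lam:g}: correct={correct:.2f}, "
              f"buggy={buggy:.2f}, ratio={correct / buggy:.1f}x")
```

The ratio between the two thresholds is 1/sm_scale = sqrt(head_dim), so it depends only on the head dimension and not on lambda.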
Fix: use `log2(lambda)` directly as SKIP_THRESHOLD_LOG2, and restore the
default threshold to 0.1 (the standard BLASST value).
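To illustrate why the two formulas give different skip decisions, here is a minimal pure-Python sketch. It assumes the skip test compares a tile's maximum score against the running maximum; that form, and the helper names below, are assumptions for illustration rather than the kernel's actual code. Only `SKIP_THRESHOLD_LOG2 = log2(lambda)` is taken from this commit.

```python
import math

LOG2_E = math.log2(math.e)  # converts natural-log-space logits to log2 space


def skip_threshold_log2(lam: float) -> float:
    """Kernel-space threshold after the fix: log2(lambda), no sm_scale factor."""
    if lam <= 0.0:
        raise ValueError("lambda must be positive")
    return math.log2(lam)


def should_skip_ln(a_tile_max: float, a_running_max: float, lam: float) -> bool:
    """Reference test on scaled logits a = q.k/sqrt(d) (assumed criterion form)."""
    return a_tile_max - a_running_max < math.log(lam)


def should_skip_log2(x_tile_max: float, x_running_max: float, lam: float) -> bool:
    """The same test on kernel-space scores x = a * log2(e)."""
    return x_tile_max - x_running_max < skip_threshold_log2(lam)


def should_skip_buggy(x_tile_max: float, x_running_max: float, lam: float,
                      head_dim: int = 128) -> bool:
    """Pre-fix behaviour: threshold wrongly multiplied by sm_scale."""
    sm_scale = 1.0 / math.sqrt(head_dim)
    return x_tile_max - x_running_max < math.log2(lam) * sm_scale
```

For example, with lambda=0.1 a tile whose maximum logit sits 2.0 below the running maximum is kept by both `should_skip_ln` and `should_skip_log2` (once the inputs are converted with `LOG2_E`), but `should_skip_buggy` skips it, because its threshold is only about -0.29.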
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
1 parent 3cb983c commit 8996ef1
2 files changed
Lines changed: 18 additions & 7 deletions
File 1 of 2: old line 453 replaced by new line 453 (1 addition, 1 deletion).
File 2 of 2: old lines 999-1003 replaced by new lines 999-1014 (16 additions, 5 deletions); old line 1006 replaced by new line 1017 (1 addition, 1 deletion).