Commit 21c2f4b
Update on "Add Triton INT4 dense kernels with dequant prefill path for Qwen3.5 MoE"
Add three new Triton kernels for dense W4A16 linear projections that
replace tinygemm's tiled INT4 format with simple [N, K//2] packed weights
(same format as MoE experts):
- int4_matmul: fused dequant+tl.dot GEMM for medium M (prefill crossover)
- int4_matvec: bandwidth-optimized vec-mat for M=1 decode
- dequant_w4_to_bf16: weight dequant for large-M prefill via Inductor mm
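To make the [N, K//2] layout concrete, here is a minimal pure-Python sketch of packing one weight row and dequantizing it again. It is not the commit's Triton code. The nibble order (even index in the low nibble), the zero point of 8, the per-group scales, and the helper names `pack_int4_row`, `unpack_int4_row`, and `dequant_row` are all assumptions made for illustration.

```python
def pack_int4_row(values):
    """Pack an even-length list of ints in [0, 15] into len(values) // 2 bytes."""
    if len(values) % 2:
        raise ValueError("K must be even")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    # Assumed nibble order: even index -> low nibble, odd index -> high nibble.
    return [values[i] | (values[i + 1] << 4) for i in range(0, len(values), 2)]


def unpack_int4_row(packed):
    """Inverse of pack_int4_row: K // 2 bytes -> K ints in [0, 15]."""
    out = []
    for byte in packed:
        out.append(byte & 0xF)  # low nibble (even index)
        out.append(byte >> 4)   # high nibble (odd index)
    return out


def dequant_row(packed, scales, group_size, zero_point=8):
    """Dequantize one packed row with one scale per group (assumed scheme)."""
    q = unpack_int4_row(packed)
    return [(v - zero_point) * scales[i // group_size] for i, v in enumerate(q)]


row = [0, 15, 8, 9]                # K = 4 quantized values
packed = pack_int4_row(row)        # K // 2 = 2 bytes
weights = dequant_row(packed, scales=[0.5, 2.0], group_size=2)
```

A full weight matrix in this layout is just N such rows, so each output feature owns a contiguous run of K//2 bytes.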
W4DequantLinear wraps these with dual decode/prefill dispatch:
- Decode (M=1): int4_matvec (73 tok/s, ~35% slower than tinygemm)
- Prefill (M>1): dequant+F.linear via Inductor (3400 tok/s at 3K tokens,
+67% over tinygemm baseline)
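The decode/prefill split above can be sketched as a shape check. This is a hypothetical illustration, not the W4DequantLinear source: the function name `select_path` and the rule that M is the product of all leading input dimensions are assumptions.

```python
from math import prod


def select_path(input_shape):
    """Return the path a W4DequantLinear-style layer would take for this input.

    Assumes the last dimension is K and all leading dimensions flatten into M.
    """
    *leading, _k = input_shape
    m = prod(leading)  # prod([]) == 1, so a bare [K] vector counts as M = 1
    if m == 1:
        return "int4_matvec"                   # decode path
    return "dequant_w4_to_bf16 + F.linear"     # prefill path


decode_path = select_path((1, 1, 4096))
prefill_path = select_path((1, 3000, 4096))
```

Both paths read the same packed weights, which is what lets the layer keep a single weight blob.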
Single 18GB weight blob (no duplication). Decode perf regression is a
known trade-off for uniform weight format — to be revisited with a
CUDA C++ matvec kernel.
Also adds INT8 dynamic-activation MoE tests and comprehensive correctness
tests (48 tests, all passing at rtol=0.01).
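For reference, a relative-tolerance check of the kind implied by rtol=0.01 can be sketched as follows. It mirrors the usual `|actual - expected| <= atol + rtol * |expected|` rule. The helper name `within_rtol` and the `atol=0.0` default are assumptions, since the commit message does not state the absolute tolerance or the comparison helper the tests use.

```python
def within_rtol(actual, expected, rtol=0.01, atol=0.0):
    """True if every element satisfies |a - e| <= atol + rtol * |e|."""
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


close_enough = within_rtol([1.005, -2.0], [1.0, -2.0])  # 0.5% error
too_far = within_rtol([1.02], [1.0])                    # 2% error
```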
Co-authored-by: Claude <noreply@anthropic.com>
[ghstack-poisoned]
3 files changed
Lines changed: 14 additions & 11 deletions