Commit f85b7ca
docs: Add FLUTE kernel analysis guide for kbit GEMM reference

Comprehensive technical analysis of the FLUTE (Flexible Lookup Table
Engine) kernel from arXiv:2407.10960, covering architecture, CUTLASS 3
implementation, vectorized LUT with bank-conflict duplication, Stream-K
work distribution, and detailed comparison with the bitsandbytes kbit
GEMM design across 10 dimensions (codebook lookup, weight packing,
scale format, work distribution, bit-width support, framework, tensor
cores, pipeline, offline preparation, and trade-off summary).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Parent: 03519c1
1 file changed (+1145, -0 lines)