Commit f85b7ca

TimDettmers and claude committed
docs: Add FLUTE kernel analysis guide for kbit GEMM reference

Comprehensive technical analysis of the FLUTE (Flexible Lookup Table Engine) kernel from arxiv 2407.10960, covering architecture, CUTLASS 3 implementation, vectorized LUT with bank-conflict duplication, Stream-K work distribution, and detailed comparison with the bitsandbytes kbit GEMM design across 10 dimensions (codebook lookup, weight packing, scale format, work distribution, bit-width support, framework, tensor cores, pipeline, offline preparation, and trade-off summary).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 03519c1 commit f85b7ca

1 file changed: +1145 -0 lines changed


0 commit comments
