Commit 2f04ac7

TimDettmers and claude committed
docs: Add GLM-4.7 355B streaming simulation and hardware analysis
Complete simulation of QLoRA fine-tuning with weight streaming for GLM-4.7 355B MoE, calibrated against RTX 4090 matmul benchmarks.

- streaming_sim.py: Full simulation with memory budgets, compute/transfer overlap, optimal resident/batch sweep across 5 GPUs, 6 quant formats (NF4, NF3, NF2, NF4d+NF2e, NF4d+NF3e, NVFP4), 7 storage configs, and pipeline parallelism modeling
- bench_matmul.py: BF16 and NF4 dequant+matmul benchmarks for GPU utilization calibration (measured 81-97% on RTX 4090)
- GLM47_ANALYSIS.md: Complete analysis document covering the resident/batch trade-off, NVFP4 on Blackwell (2.74x effective speedup), AM5 x8/x8 validation, and 4 hardware build recommendations ($2.7K-$7.6K)

Key finding: optimal resident/batch split achieves 0% streaming overhead across all tested configurations. GPU utilization calibrated at 70% (conservative vs measured 81-97%).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
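One way to read the "0% streaming overhead" finding is as a compute/transfer overlap condition: if streamed weights arrive while the GPU is still busy, the transfer is hidden and the step takes no longer than compute alone. The sketch below illustrates that reading only. It assumes a step costs `max(compute, transfer)` under perfect overlap; the function names and any inputs are hypothetical and are not taken from streaming_sim.py. The 70% utilization default mirrors the calibration figure stated in the message.

```python
def compute_time(flops, peak_flops, utilization=0.70):
    """Seconds of GPU compute for one step at a given fraction of peak throughput."""
    return flops / (peak_flops * utilization)


def transfer_time(streamed_bytes, bandwidth_bytes_per_s):
    """Seconds needed to stream the non-resident weights for one step."""
    return streamed_bytes / bandwidth_bytes_per_s


def streaming_overhead(compute_s, transfer_s):
    """Fractional slowdown versus compute alone, assuming perfect overlap.

    With overlap, a step takes max(compute, transfer); the overhead is zero
    whenever the transfer finishes within the compute window.
    """
    step_s = max(compute_s, transfer_s)
    return (step_s - compute_s) / compute_s
```

Under this assumption, keeping more weights resident shrinks `streamed_bytes` and a larger batch grows `flops` per step, so both push the transfer time below the compute time, which is where the overhead reaches zero.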
1 parent 5263e72 commit 2f04ac7

3 files changed: +2001 -0 lines changed
