Commit 6d5087e

knowledge: Scout full compression results (37.88 MB / 215.57 GB = 5,693×)
1 parent 149daf0 commit 6d5087e

1 file changed

Lines changed: 44 additions & 0 deletions

@@ -0,0 +1,44 @@
# Llama 4 Scout 17B-16E: Full Model Compression Results

## Pipeline

BF16-direct → golden-step Base17 → bgz7 container

- `stream_index_gguf_bf16()` with `octave_stride=16`
- F64x8 SIMD: 8 rows projected in parallel per zmm register
- Halftone drop: 9 of 17 golden-step positions, odd bins interpolated
- No f32 intermediate allocation (BF16 → f64 inline)
- Reusable u16 buffer across all tensors
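
The "BF16 → f64 inline" step works because BF16 is the upper 16 bits of an IEEE-754 float32, so each value can be widened on the fly without materializing an f32 array. The Python sketch below illustrates only that conversion. The names `bf16_to_f64` and `decode_row` are hypothetical, little-endian storage is assumed, and none of this is taken from `stream_index_gguf_bf16()` itself.

```python
import struct


def bf16_to_f64(bits: int) -> float:
    """Decode one BF16 value given as a 16-bit integer pattern."""
    # BF16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits
    # of a float32, so shifting left by 16 restores an exact float32 pattern.
    f32_bits = (bits & 0xFFFF) << 16
    # Python floats are f64, so unpacking widens the value exactly.
    return struct.unpack("<f", struct.pack("<I", f32_bits))[0]


def decode_row(raw: bytes) -> list[float]:
    """Decode a little-endian BF16 row (assumed layout) into f64 values."""
    count = len(raw) // 2
    return [bf16_to_f64(b) for b in struct.unpack(f"<{count}H", raw)]
```

For example, `decode_row(bytes.fromhex("803f00c0"))` decodes the patterns `0x3F80` and `0xC000`, which are 1.0 and -2.0.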

## Results

| Shard | Source (BF16) | Compressed | Ratio |
|-------|---------------|------------|-------|
| 1 | 48.94 GB | 11.77 MB | 4,159× |
| 2 | 49.96 GB | 8.32 MB | 6,005× |
| 3 | 48.66 GB | 5.57 MB | 8,736× |
| 4 | 49.79 GB | 4.52 MB | 11,016× |
| 5 | 18.22 GB | 7.70 MB | 2,366× |
| **Total** | **215.57 GB** | **37.88 MB** | **5,693×** |
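
The ratios in the table can be cross-checked from the rounded sizes, assuming decimal units (1 GB = 1000 MB). This short Python sketch does so. With rounded inputs it lands within 1× of each per-shard ratio and gives about 5,691× for the total; the small gap to the reported 5,693× is presumably because the original figure was computed from unrounded byte counts.

```python
# (source GB, compressed MB) per shard, copied from the results table.
SHARDS = {
    1: (48.94, 11.77),
    2: (49.96, 8.32),
    3: (48.66, 5.57),
    4: (49.79, 4.52),
    5: (18.22, 7.70),
}


def ratio(source_gb: float, compressed_mb: float) -> float:
    """Compression ratio, assuming decimal units (1 GB = 1000 MB)."""
    return source_gb * 1000.0 / compressed_mb


total_gb = sum(src for src, _ in SHARDS.values())
total_mb = sum(out for _, out in SHARDS.values())

for shard, (src, out) in SHARDS.items():
    print(f"shard {shard}: {ratio(src, out):,.0f}x")
print(f"total: {total_gb:.2f} GB -> {total_mb:.2f} MB = "
      f"{ratio(total_gb, total_mb):,.0f}x")
```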

## Observations

- Shard 1 (embeddings + early layers): larger output, due to the embedding table.
- Shards 3-4 (middle MoE layers): highest ratios. Expert weights are highly
  structured, so golden-step averaging captures the per-expert identity
  in 34 bytes per row.
- Shard 5 (final layers + output head): lower ratio. The output projection
  has more variance than the interior MoE expert weights.
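
The 34-bytes-per-row figure is consistent with 17 Base17 positions stored as one u16 each. That reading is an inference from the pipeline notes, not something the notes state outright. Under that assumption, the per-row reduction for a BF16 row of a given width can be sketched as follows; the constant and function names are hypothetical, and the width 5120 is only an illustrative value.

```python
BASE17_POSITIONS = 17  # assumed: one stored value per golden-step position
BYTES_PER_U16 = 2
BYTES_PER_BF16 = 2


def row_signature_bytes() -> int:
    """Size of one compressed row under the 17 x u16 assumption."""
    return BASE17_POSITIONS * BYTES_PER_U16


def per_row_ratio(cols: int) -> float:
    """Ratio of a BF16 row with `cols` elements to its 34-byte signature."""
    return cols * BYTES_PER_BF16 / row_signature_bytes()


print(row_signature_bytes())         # 34
print(f"{per_row_ratio(5120):.1f}")  # illustrative row width only
```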

## Location

`src/hpc/openchat/weights/llama4_scout_shard{1-5}.bgz7`

## Implications for Maverick

Maverick has 128 experts (8× Scout), so the MoE layers dominate even more.
If the per-expert ratio holds (~6,000-11,000× on interior shards),
Maverick's 801 GB could compress to 90-180 MB.

- Conservative estimate: ~300 MB (if embedding/attention layers scale worse).
- Optimistic estimate: ~90 MB (if expert sparsity is even higher with 128E).
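
The arithmetic behind these estimates can be checked in both directions, again assuming decimal units. In this Python sketch the helper names are hypothetical. Dividing 801 GB directly by 6,000-11,000× gives roughly 73-134 MB, while the 90-180 MB range corresponds to effective whole-model ratios of about 4,450-8,900×, and the ~300 MB conservative figure to about 2,670×.

```python
MAVERICK_GB = 801.0  # source size quoted in the notes


def projected_mb(source_gb: float, ratio: float) -> float:
    """Compressed size in MB for a given ratio (1 GB = 1000 MB assumed)."""
    return source_gb * 1000.0 / ratio


def implied_ratio(source_gb: float, compressed_mb: float) -> float:
    """Whole-model ratio implied by a target compressed size in MB."""
    return source_gb * 1000.0 / compressed_mb


for r in (6_000, 11_000):
    print(f"{r:,}x -> {projected_mb(MAVERICK_GB, r):.1f} MB")
for mb in (90, 180, 300):
    print(f"{mb} MB -> {implied_ratio(MAVERICK_GB, mb):,.0f}x")
```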
