Skip to content

Commit 0212b14

Browse files
authored
fix: README table shows physical RAM, not misleading virtual allocation (#81)
1 parent 9533e45 commit 0212b14

1 file changed

Lines changed: 4 additions & 4 deletions

File tree

README.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -81,11 +81,11 @@ Model: [`Thump604/DeepSeek-V4-Flash-MLX-Q3-mixed-gs128-affine`](https://huggingf
8181
8282
| Configuration | 512 ctx | 40K ctx |
8383
|---|---|---|
84-
| SSD Stream | 4.65 tok/s · 28.4 GB | 0.32 tok/s · 60.5 GB |
85-
| **SSD + TurboQuant** | **4.78 tok/s · 29.5 GB** | **4.16 tok/s · 40.6 GB** |
86-
| SSD + 16-Worker Prefetch | 4.43 tok/s · 29.3 GB | 0.32 tok/s · 60.9 GB |
84+
| SSD Stream | 4.65 tok/s · 16.7 GB RAM | 0.32 tok/s · 12.5 GB RAM |
85+
| **SSD + TurboQuant** | **4.78 tok/s · 16.8 GB RAM** | **4.16 tok/s · 16.8 GB RAM** |
86+
| SSD + 16-Worker Prefetch | 4.43 tok/s · 16.6 GB RAM | 0.32 tok/s · 13.6 GB RAM |
8787

88-
> Values shown as `generation speed · GPU memory allocated (virtual, incl. SSD-backed pages)`
88+
> Values shown as `generation speed · peak physical RAM used` (sampled every 0.5s during prefill + generation). The 126 GB model streams the rest from NVMe SSD.
8989
9090
**Key takeaways:**
9191
- 🏆 **SSD + TurboQuant dominates at long context** — 4.16 tok/s at 40K vs 0.32 tok/s for plain SSD Stream (**13× faster**), with 33% lower GPU allocation (40.6 GB vs 60.5 GB).

0 commit comments

Comments
 (0)