fix: README table shows physical RAM, not virtual allocation #81
Conversation
Pull request overview
Updates the DeepSeek-V4-Flash benchmark section in the README to report a more user-meaningful memory metric (peak physical RAM) instead of GPU virtual allocation, aligning the table with real RAM pressure during long-context runs.
Changes:
- Replaces DeepSeek-V4-Flash table memory figures with peak physical RAM values.
- Updates the table footnote to explain sampling methodology (0.5s polling during prefill + generation) and SSD streaming context.
> Values shown as `generation speed · peak physical RAM used` (sampled every 0.5s during prefill + generation). The 126 GB model streams the rest from NVMe SSD.

**Key takeaways:**
- 🏆 **SSD + TurboQuant dominates at long context** — 4.16 tok/s at 40K vs 0.32 tok/s for plain SSD Stream (**13× faster**), with 33% lower GPU allocation (40.6 GB vs 60.5 GB).
The DeepSeek key takeaway still refers to the old metric ("33% lower GPU allocation (40.6 GB vs 60.5 GB)") even though the table and note were updated to report peak physical RAM. This is now inconsistent/misleading; please update this takeaway to either compare the new RAM numbers or clearly label GPU_Alloc (virtual) as a separate metric if you still want to mention it.
The compact benchmark table previously showed GPU virtual allocation (28–61 GB) next to the speeds. That was misleading: virtual allocation includes SSD-backed pages, so it doesn't reflect actual RAM usage. The table now shows peak physical RAM (12.5–16.8 GB), which is the number users actually care about.
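The physical-RAM metric described in the footnote can be collected with a simple background poller. Below is a minimal sketch of one way to sample peak RSS every 0.5s, assuming a Linux host where `/proc/self/status` exposes `VmRSS`; the `PeakRssSampler` name and API are illustrative, not part of this repo's code:

```python
import threading


def read_rss_kb() -> int:
    """Return the process's current resident set size (physical RAM) in kB.

    Linux-only: parses the VmRSS field of /proc/self/status. Note that
    VmSize (virtual allocation) would also count SSD-backed mappings,
    which is exactly the metric the README moved away from.
    """
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is reported in kB
    return 0


class PeakRssSampler:
    """Polls physical RAM at a fixed interval and records the peak."""

    def __init__(self, interval_s: float = 0.5):
        self.interval_s = interval_s
        self.peak_kb = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Sample until stop() is requested, keeping the maximum seen.
        while not self._stop.is_set():
            self.peak_kb = max(self.peak_kb, read_rss_kb())
            self._stop.wait(self.interval_s)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False
```

Usage would wrap the prefill + generation run, e.g. `with PeakRssSampler(0.5) as s: run_benchmark()`, then report `s.peak_kb`. One caveat of interval sampling: short RSS spikes between 0.5s polls can be missed, so the reported peak is a lower bound.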