fix: remove virtual allocation from DeepSeek key takeaways by solderzzc · Pull Request #83 · SharpAI/SwiftLM

solderzzc · 2026-04-24T21:40:15Z

Removes '33% lower GPU allocation (40.6 GB vs 60.5 GB)' — virtual numbers that are misleading. Explains the speed difference in plain language instead.

Copilot

Pull request overview

Updates the DeepSeek-V4-Flash benchmark “Key takeaways” in the README to remove misleading virtual GPU allocation numbers and replace them with a clearer, plain-language explanation tied to KV-cache compression and SSD streaming.

Changes:

Removes the “33% lower GPU allocation (40.6 GB vs 60.5 GB)” claim from the DeepSeek key takeaways.
Adds a plain-language explanation for the long-context speedup (KV-cache compression reducing SSD streaming pressure).
Simplifies the “Peak physical RAM” takeaway to avoid referencing “GPU InUse”.


Copilot · 2026-04-24T21:41:54Z

 **Key takeaways:**
-- 🏆 **SSD + TurboQuant dominates at long context** — 4.16 tok/s at 40K vs 0.32 tok/s for plain SSD Stream (**13× faster**), with 33% lower GPU allocation (40.6 GB vs 60.5 GB).
+- 🏆 **SSD + TurboQuant dominates at long context** — 4.16 tok/s at 40K vs 0.32 tok/s for plain SSD Stream (**13× faster**). TurboQuant compresses the KV cache so far fewer layers need to stream from SSD per token.
 - At 512-token context all configurations perform similarly (~4.4–4.8 tok/s); TurboQuant's advantage is KV-cache compression at long context.
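
As a quick sanity check on the figures quoted in this hunk, the following minimal Python sketch recomputes the two ratios from the numbers shown in the diff. The variable names are illustrative and do not come from the repository. It only confirms the arithmetic; it says nothing about whether the virtual allocation numbers are a meaningful measure, which is the reason this PR removes them.

```python
# Throughput figures quoted in the takeaway (tok/s at 40K context).
turboquant_tps = 4.16   # SSD + TurboQuant
plain_ssd_tps = 0.32    # plain SSD Stream

speedup = turboquant_tps / plain_ssd_tps
print(f"speedup: {speedup:.1f}x")

# Allocation figures from the removed claim (GB, virtual allocation).
alloc_turboquant_gb = 40.6
alloc_plain_gb = 60.5

reduction = (alloc_plain_gb - alloc_turboquant_gb) / alloc_plain_gb
print(f"allocation reduction: {reduction:.1%}")
```

Both quoted ratios are consistent with their inputs: the throughput ratio rounds to 13× and the allocation difference to roughly 33%.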


In this Key takeaways section, the terminology is inconsistent between “KV cache” (line 91) and “KV-cache” (line 92). Please standardize the phrasing (pick one form) within the section for readability/searchability.

Suggested change

- At 512-token context all configurations perform similarly (~4.4–4.8 tok/s); TurboQuant's advantage is KV-cache compression at long context.

- At 512-token context all configurations perform similarly (~4.4–4.8 tok/s); TurboQuant's advantage is KV cache compression at long context.

fix: remove virtual allocation reference from DeepSeek key takeaways

e51a822

Copilot AI review requested due to automatic review settings April 24, 2026 21:40

solderzzc merged commit 05d0b6c into main Apr 24, 2026
1 check passed

Copilot started reviewing on behalf of solderzzc April 24, 2026 21:40

Copilot AI reviewed Apr 24, 2026

solderzzc merged 1 commit into main from fix/readme-takeaways-v2

2 participants
