
Commit 352a02e

unamedkr and claude committed
Fix misleading PyTorch comparison: clarify F32 vs Q4 conditions
Community feedback pointed out that the 59x claim was not apples-to-apples. Added F32/Q4 labels, removed multiplier claims, added disclaimer notes, and noted the upcoming llama.cpp Q4-vs-Q4 fair benchmark.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 668f017 commit 352a02e
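
For context on the fair benchmark the commit message promises: a tokens-per-second figure is just a timed decode loop, so both engines can be measured behind the same harness. A minimal C++ sketch under stated assumptions (the `generate_token` callback and the warm-up and sample counts are hypothetical, not TurboQuant's API):

```cpp
#include <chrono>
#include <functional>

// Hypothetical timing harness, not TurboQuant's real API: each engine is
// wrapped as a "produce one token" callback so both sides of a Q4-vs-Q4
// comparison are measured identically.
double tokens_per_second(const std::function<void()>& generate_token,
                         int warmup = 8, int measured = 256) {
    for (int i = 0; i < warmup; ++i) generate_token();   // exclude load/warm-up cost
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < measured; ++i) generate_token();
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return measured / dt.count();
}
```

A fair run would then hold the model, prompt, thread count, and quantization format (Q4 vs Q4) constant across engines.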

2 files changed: 24 additions & 16 deletions


README.ko.md

Lines changed: 12 additions & 8 deletions
@@ -9,13 +9,15 @@
 [![Build](https://img.shields.io/badge/build-passing-brightgreen)]()
 [![Tests](https://img.shields.io/badge/tests-70%2B%20pass-brightgreen)]()
 [![License](https://img.shields.io/badge/license-Apache%202.0-blue)]()
-[![Speed](https://img.shields.io/badge/47%20tok%2Fs-Qwen3.5--0.8B-blue)]()
+[![Speed](https://img.shields.io/badge/47%20tok%2Fs%20(Q4)-Qwen3.5--0.8B-blue)]()
 
 ```
-PyTorch CPU: 0.8 tok/s
-PyTorch GPU: 10 tok/s
-TurboQuant CPU: 47 tok/s ← 59x faster, no GPU needed
+PyTorch CPU (F32): 0.8 tok/s
+PyTorch GPU (F32): 10 tok/s
+TurboQuant CPU (Q4): 47 tok/s ← no GPU needed
 ```
+> **Note:** PyTorch runs F32, TurboQuant runs Q4 — not an apples-to-apples comparison.
+> The key contribution is KV cache compression (7.5x) and integer attention, not beating unquantized PyTorch.
 
 ---
 
@@ -45,15 +47,17 @@ that uses artificial neural networks to learn complex patterns...
 
 ## Why TurboQuant?
 
-| | PyTorch | TurboQuant.cpp |
+| | PyTorch (F32) | TurboQuant.cpp (Q4) |
 |---|---|---|
-| **Speed** | 0.8 tok/s | **47 tok/s** (59x) |
+| **Speed** | 0.8 tok/s | **47 tok/s** |
 | **Loading** | 3 sec | **0.3 sec** (mmap) |
-| **Weight Memory** | 1.7 GB | **270 MB** (Q4) |
+| **Weight Memory** | 1.7 GB (F32) | **270 MB** (Q4) |
 | **KV Cache** | Full size | **7.5x compressed** |
 | **Dependencies** | PyTorch, transformers | **None** |
 | **Binary** | ~2 GB installed | **~1 MB** |
-| **Quality** | Baseline | **0.999 cosine** (vs PyTorch) |
+| **Quality** | Baseline (F32) | **0.999 cosine similarity** |
+
+> The speed difference is largely due to Q4 quantization. A fair Q4-vs-Q4 benchmark against llama.cpp is in preparation.
 
 ---
 
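
The weight-memory row above (1.7 GB F32 vs 270 MB Q4) comes down to storing each weight as a 4-bit code plus a shared per-block scale. A minimal C++ sketch of generic 4-bit block quantization; the block size, symmetric scaling, and all names are illustrative assumptions, not TurboQuant's actual format:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative 4-bit block quantization, not TurboQuant's on-disk format:
// every 32 floats become one f32 scale plus 16 bytes of packed nibbles,
// i.e. roughly 4.5 bits/weight instead of 32, which is the mechanism
// behind reductions like the table's 1.7 GB to 270 MB.
struct Q4Block {
    float        scale;
    std::uint8_t codes[16];  // 32 signed 4-bit values, two per byte
};

Q4Block quantize_block(const float w[32]) {
    float amax = 0.0f;
    for (int i = 0; i < 32; ++i) amax = std::max(amax, std::fabs(w[i]));
    Q4Block b{};
    b.scale = amax / 7.0f;                        // codes span [-7, 7]
    const float inv = (b.scale != 0.0f) ? 1.0f / b.scale : 0.0f;
    for (int i = 0; i < 32; i += 2) {
        int lo = std::clamp((int)std::lround(w[i]     * inv), -7, 7) + 8;
        int hi = std::clamp((int)std::lround(w[i + 1] * inv), -7, 7) + 8;
        b.codes[i / 2] = (std::uint8_t)((hi << 4) | lo);
    }
    return b;
}

// Dequantization is one multiply per value when it is needed at all;
// matmuls can also run directly on the integer codes.
float q4_get(const Q4Block& b, int i) {
    int nib = (i & 1) ? (b.codes[i / 2] >> 4) : (b.codes[i / 2] & 0x0F);
    return (float)(nib - 8) * b.scale;
}
```

The same 4-bit idea applied to cached keys and values, combined with arithmetic on the integer codes, is the general shape of the KV-cache compression and integer attention the note calls the key contribution.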

README.md

Lines changed: 12 additions & 8 deletions
@@ -9,13 +9,15 @@ Load → Generate → Done. No Python. No GPU. Just one binary.
 [![Build](https://img.shields.io/badge/build-passing-brightgreen)]()
 [![Tests](https://img.shields.io/badge/tests-70%2B%20pass-brightgreen)]()
 [![License](https://img.shields.io/badge/license-Apache%202.0-blue)]()
-[![Speed](https://img.shields.io/badge/47%20tok%2Fs-Qwen3.5--0.8B-blue)]()
+[![Speed](https://img.shields.io/badge/47%20tok%2Fs%20(Q4)-Qwen3.5--0.8B-blue)]()
 
 ```
-PyTorch CPU: 0.8 tok/s
-PyTorch GPU: 10 tok/s
-TurboQuant CPU: 47 tok/s ← 59x faster, no GPU needed
+PyTorch CPU (F32): 0.8 tok/s
+PyTorch GPU (F32): 10 tok/s
+TurboQuant CPU (Q4): 47 tok/s ← no GPU needed
 ```
+> **Note:** PyTorch runs F32, TurboQuant runs Q4 — not an apples-to-apples comparison.
+> The real contribution is KV cache compression (7.5x) and integer attention, not beating unquantized PyTorch.
 
 ---
 
@@ -45,15 +47,17 @@ that uses artificial neural networks to learn complex patterns...
 
 ## Why TurboQuant?
 
-| | PyTorch | TurboQuant.cpp |
+| | PyTorch (F32) | TurboQuant.cpp (Q4) |
 |---|---|---|
-| **Speed** | 0.8 tok/s | **47 tok/s** (59x) |
+| **Speed** | 0.8 tok/s | **47 tok/s** |
 | **Loading** | 3 sec | **0.3 sec** (mmap) |
-| **Weight Memory** | 1.7 GB | **270 MB** (Q4) |
+| **Weight Memory** | 1.7 GB (F32) | **270 MB** (Q4) |
 | **KV Cache** | Full size | **7.5x compressed** |
 | **Dependencies** | PyTorch, transformers, torch | **None** |
 | **Binary Size** | ~2 GB installed | **~1 MB** |
-| **Quality** | Baseline | **0.999 cosine** (vs PyTorch) |
+| **Quality** | Baseline (F32) | **0.999 cosine similarity** |
+
+> Speed difference is largely due to Q4 quantization. A fair Q4-vs-Q4 benchmark against llama.cpp is planned.
 
 ---
 