
Commit 17d659a

docs: refine Gemma4 perf stats, add pp/tg abbreviation key
1 parent 75c1366 commit 17d659a

2 files changed

Lines changed: 10 additions & 4 deletions

docs-site/src/content/docs/integrations/local-llms.mdx

Lines changed: 5 additions & 2 deletions
@@ -437,8 +437,11 @@ llama-server \
 
 **Performance (M1 Max 64 GB, ~37K input tokens):**
 
-- Cold start: pp 395 tok/s, tg 40 tok/s (96s total)
-- Cached follow-up: pp 110 tok/s, tg 40 tok/s (6s total)
+pp = prompt processing, tg = token generation.
+
+- Cold start: pp 395 tok/s, tg 40 tok/s (~96s total)
+- Cached follow-up: tg 40 tok/s (~6s total, prompt
+cached in ~0.4s)
 
 | Quant | Size | Notes |
 |-------|------|-------|

docs/local-llm-setup.md

Lines changed: 5 additions & 2 deletions
@@ -278,8 +278,11 @@ llama-server -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL \
 
 **Performance (M1 Max 64 GB, ~37K input tokens):**
 
-- Cold start: pp 395 tok/s, tg 40 tok/s (96s total)
-- Cached follow-up: pp 110 tok/s, tg 40 tok/s (6s total)
+pp = prompt processing, tg = token generation.
+
+- Cold start: pp 395 tok/s, tg 40 tok/s (~96s total)
+- Cached follow-up: tg 40 tok/s (~6s total, prompt
+cached in ~0.4s)
 
 **Quantization options:**
 
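The revised figures can be sanity-checked with a little arithmetic. The sketch below is a rough consistency check, not a benchmark: it assumes the prompt is exactly 37,000 tokens, that each phase runs at the quoted steady rate, and that total time is simply prompt time plus generation time. The helper name `phase_seconds` and the constants are illustrative and do not come from the docs.

```python
# Back-of-envelope check of the timings quoted in the diff.
# Assumptions (not from the source): exactly 37,000 prompt tokens,
# steady rates, and total time = prompt time + generation time.

def phase_seconds(tokens: float, tok_per_s: float) -> float:
    """Time to handle `tokens` at a steady rate of `tok_per_s`."""
    return tokens / tok_per_s

INPUT_TOKENS = 37_000  # "~37K input tokens"
PP_RATE = 395          # cold-start prompt processing, tok/s
TG_RATE = 40           # token generation, tok/s

# Cold start: most of the ~96s goes to prompt processing.
cold_pp_s = phase_seconds(INPUT_TOKENS, PP_RATE)
cold_tg_s = 96 - cold_pp_s
cold_tokens = cold_tg_s * TG_RATE

# Cached follow-up: ~0.4s for the cached prompt, the rest is generation.
cached_tg_s = 6 - 0.4
cached_tokens = cached_tg_s * TG_RATE

print(f"cold start: pp {cold_pp_s:.1f}s, tg {cold_tg_s:.1f}s (~{cold_tokens:.0f} tokens)")
print(f"cached:     tg {cached_tg_s:.1f}s (~{cached_tokens:.0f} tokens)")
```

Under these assumptions, prompt processing accounts for about 93.7s of the cold start, leaving roughly 2.3s (about 93 generated tokens). The cached follow-up spends about 5.6s generating roughly 224 tokens. Both are consistent with the approximate totals in the updated bullets.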
