Best stack followup 2026 05 17#26

Merged
Lightheartdevs merged 3 commits into
mainfrom
best-stack-followup-2026-05-17
May 17, 2026
@Lightheartdevs

Contributor

No description provided.

Michael Bradley and others added 3 commits May 17, 2026 13:54
ctx16384_gen0512_conc1: prefill 84.05 tok/s, decode 7.011 ± 0.001 tok/s.
Confirms the long-context pattern from gen=128: decode steady ~7.0 tok/s,
prefill ~84 tok/s (matches gen=128's 84.08). The older-engine prefill
cost is stable across gen-length within a ctx tier.

README/findings/manifest updated 7→8 cells.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
….error)

ctx16384_gen2048_conc1 produced 10/10 timeouts at exactly 300.0 s wall time
per request. Cell artifact has .error (no .done), as expected by the
post-review fix to the bench-cell driver. Estimated true request length at
this cell is ~480 s (186 s prefill + 293 s decode), well over the 300 s
ceiling that Lemonade Server / FastAPI imposes.
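
The ~480 s estimate is prefill time plus decode time. A minimal Python sketch of that arithmetic, using the figures quoted in this commit message (186 s prefill, 2048 generated tokens, ~7.0 tok/s decode, 300 s ceiling). The function and constant names are hypothetical and are not part of the bench-cell driver:

```python
# Illustrative only: names are hypothetical; figures come from the commit message.
REQUEST_CEILING_S = 300.0  # per-request timeout observed in this cell


def estimate_request_seconds(prefill_s: float, gen_tokens: int, decode_tok_s: float) -> float:
    """Estimated wall time: prefill time plus the time to decode gen_tokens."""
    return prefill_s + gen_tokens / decode_tok_s


# ctx=16K, gen=2048: 186 s prefill, decode at roughly 7.0 tok/s
estimate_s = estimate_request_seconds(prefill_s=186.0, gen_tokens=2048, decode_tok_s=7.0)
exceeds_ceiling = estimate_s > REQUEST_CEILING_S

print(f"estimated {estimate_s:.0f} s, exceeds ceiling: {exceeds_ceiling}")
```

Under these assumptions the estimate lands well above the ceiling, which is consistent with all 10 requests timing out rather than only some of them.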

The entire ctx=32K tier is expected to fail the same way: at ctx=32K,
prefill alone is ~371 s. Canonical Vulkan b9151 does not have this ceiling
on the same hardware (its ctx=32K gen=2048 cell completes in ~250 s).
This is now documented as an additional buyer-relevant finding in
findings.md and as the cells_with_error_marker bookkeeping in manifest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ror)

ctx32768_gen0128_conc1 produces 10/10 timeouts at 300.0 s, as predicted
in the previous commit's known_gaps. Prefill at ctx=32K alone is ~371 s
(31186 prompt tokens ÷ 84 tok/s), already over the ceiling before any
decode begins.
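
The prefill-only check can be reproduced as prompt tokens divided by prefill throughput. The sketch below is a rough illustration using the numbers from this commit message; the names are hypothetical, and the derived token budget assumes prefill throughput stays at ~84 tok/s regardless of prompt length, which the measurements here only support for the ctx=16K and ctx=32K tiers:

```python
# Illustrative only: names are hypothetical; figures come from the commit message.
REQUEST_CEILING_S = 300.0   # per-request timeout
PREFILL_TOK_S = 84.0        # measured prefill throughput
PROMPT_TOKENS_32K = 31186   # prompt length of the ctx=32K cells


def prefill_seconds(prompt_tokens: int, prefill_tok_s: float) -> float:
    """Time spent on prefill alone, before any token is decoded."""
    return prompt_tokens / prefill_tok_s


prefill_s = prefill_seconds(PROMPT_TOKENS_32K, PREFILL_TOK_S)
overshoot_s = prefill_s - REQUEST_CEILING_S

# Assumption: constant prefill throughput. Largest prompt whose prefill alone
# would still fit under the ceiling, leaving no time for decode.
max_prompt_tokens = int(REQUEST_CEILING_S * PREFILL_TOK_S)

print(f"prefill {prefill_s:.0f} s, over ceiling by {overshoot_s:.0f} s")
print(f"prefill-only token budget: {max_prompt_tokens}")
```

Because prefill alone overshoots the ceiling, the gen length of a ctx=32K cell does not change the outcome under these assumptions.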

cells_with_error_marker 1 → 2. ctx=32K gen=512 and gen=2048 are still running
and are expected to fail identically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Lightheartdevs Lightheartdevs merged commit aa7b9f0 into main May 17, 2026
1 check passed