Commit fd090ad

wdunn001 and claude committed
v0.5.0: Hero banner + cohort image tags + new changelog entry
- Hero eyebrow: v0.4.1 shipping -> v0.5.0 shipping
- Benchmarks card image refs: codec-sglang:v0.4.1 -> :v0.5.0, (all v0.4.1) -> (all v0.5.0)
- /changelog/ gains 2026-05-18-v0-5-efficiency-observability.md covering the 4 new opt-in wire surfaces (delta-varint, discoverable zstd dicts, GPU latent quantize, bolt-on tool dispatcher), the 11-artifact cohort, the engine cohort change (TGI dropped), bench unchanged at byte level (wire-additive invariant), upstream PRs at sgl-project/sglang#25544 + vllm-project/vllm#42896, IETF I-D status.

Historical v0.4.1 references in bench card subtitles / page-section comments / protocol-map descriptions left in place; they document when features landed and remain accurate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4a10e1f commit fd090ad

3 files changed

Lines changed: 75 additions & 3 deletions

src/components/Benchmarks.astro

Lines changed: 2 additions & 2 deletions
@@ -569,7 +569,7 @@ const toolBars = [
 <header class="bench-card__head">
 <div>
 <h3 class="bench-card__title">Agent loops &mdash; end-to-end tool dispatch</h3>
-<p class="bench-card__sub">codec-sglang:v0.4.1 &middot; Qwen2.5-0.5B &middot; prompt &rarr; model emits tool call &rarr; real dispatch &rarr; tool result &rarr; final answer</p>
+<p class="bench-card__sub">codec-sglang:v0.5.0 &middot; Qwen2.5-0.5B &middot; prompt &rarr; model emits tool call &rarr; real dispatch &rarr; tool result &rarr; final answer</p>
 </div>
 <div class="bench-card__hero">
 <span class="bench-card__hero-num">16.9&ndash;18.0&times;</span>
@@ -1032,7 +1032,7 @@ const toolBars = [
 Source: <a href="https://github.com/wdunn001/Codec/blob/main/packages/bench/results/2026-05-15T20-00-00Z/MATRIX.md" rel="noopener">cross-stack MATRIX.md</a>
 &middot; <a href="https://hub.docker.com/r/wdunn001/codec-sglang" rel="noopener">codec-sglang</a>,
 <a href="https://hub.docker.com/r/wdunn001/codec-vllm" rel="noopener">codec-vllm</a>,
-<a href="https://hub.docker.com/r/wdunn001/codec-llamacpp" rel="noopener">codec-llamacpp</a> (all v0.4.1)
+<a href="https://hub.docker.com/r/wdunn001/codec-llamacpp" rel="noopener">codec-llamacpp</a> (all v0.5.0)
 &middot; Qwen-2.5 0.5B &middot; RTX&nbsp;3090 &middot; temp 0.0
 &middot; reproducible from <code>packages/bench/scripts/run-all-langs.sh</code>
 (cross-stack matrix), <code>synthetic_wire_bench.py</code> (§1 protocol-only),

src/components/Hero.astro

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ const codecCells = Array.from({ length: codecTokenCount }, (_unused, t) => {
 <section class="section section--hero hero">
 <div class="container">
 <p class="eyebrow">
-v0.4.1 shipping &middot; source-available &middot;
+v0.5.0 shipping &middot; source-available &middot;
 <a href="/changelog/" style="color: var(--data); text-decoration: none;">what's new &rarr;</a>
 </p>
 <h1 class="hero__title">
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
---
title: v0.5.0 — efficiency, observability, and cohort honesty
date: "2026-05-18"
kind: release
version: v0.5.0
summary: Wire-additive over v0.4 (v0.4 → v0.5 happy-path bytes identical). Four new opt-in surfaces — delta-varint stream encoding, discoverable Zstandard dictionaries, GPU-side latent quantize, bolt-on tool dispatcher. 11 client artifacts bumped to 0.5.0 across npm, PyPI, NuGet, crates.io, Maven Central. Engine cohort cut to sglang + vLLM + llama.cpp + ComfyUI + diffusers (TGI dropped). 72/72 wire + 72/72 decode unanimous on the cross-stack matrix; numbers byte-identical to v0.4.1, confirming the wire-additive invariant. Upstream PRs filed at sgl-project/sglang#25544 and vllm-project/vllm#42896, both DCO-signed and through bot review.
links:
  - label: GitHub Release v0.5.0
    url: https://github.com/wdunn001/Codec/releases/tag/v0.5.0
  - label: CHANGELOG entry
    url: https://github.com/wdunn001/Codec/blob/main/CHANGELOG.md#v050--2026-05-18
  - label: Cross-stack matrix
    url: https://github.com/wdunn001/Codec/blob/main/packages/bench/results/2026-05-17T23-06-45Z/MATRIX.md
  - label: IETF Internet-Draft (draft-dunn-codec-00)
    url: https://github.com/wdunn001/Codec/blob/main/docs/submissions/draft-dunn-codec-00.md
  - label: sglang upstream PR
    url: https://github.com/sgl-project/sglang/pull/25544
  - label: vLLM upstream PR
    url: https://github.com/vllm-project/vllm/pull/42896
---

v0.5.0 ships four new wire surfaces — all opt-in — without changing the v0.4 happy path. Every existing v0.4 client decodes a v0.5 server byte-for-byte unless it explicitly negotiates a new surface via `stream_format`, `Accept-Encoding`, or a new env var.

## Four new opt-in wire surfaces

**Delta-varint stream encoding.** New `stream_format` values `"msgpack-delta"` and `"protobuf-delta"`. Frames carry `base_id` plus zigzag-encoded deltas against the prior frame's last identifier; stateless framing preserved. ~10–15% wire reduction pre-zstd, ~3–5% post-zstd. Python reference impl; engine-side emit pending in v0.5.x.
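
The zigzag-plus-varint idea behind the delta formats can be sketched in a few lines of Python. This is an illustration only, not the Codec reference implementation: the function names, the byte layout (a varint `base_id` followed by one varint per delta), and the choice to chain each delta against the previous identifier inside the frame are all assumptions made for the example.

```python
# Hypothetical sketch of zigzag + varint delta framing.
# Names and byte layout are assumptions, not the Codec wire format.

def zigzag_encode(n: int) -> int:
    """Map a signed integer to unsigned so small magnitudes stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)

def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) if (z & 1) == 0 else -((z + 1) >> 1)

def varint_encode(u: int) -> bytes:
    """LEB128-style varint: 7 payload bits per byte, high bit = 'more follows'."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def varint_decode(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_position); reject truncated or over-long input."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7
        if shift > 63:  # cap the shift so hostile input cannot grow unbounded
            raise ValueError("varint too long")

def encode_frame(base_id: int, ids: list[int]) -> bytes:
    """Assumed layout: varint(base_id), then zigzag-varint deltas, chained."""
    out = bytearray(varint_encode(base_id))
    prev = base_id
    for token_id in ids:
        out += varint_encode(zigzag_encode(token_id - prev))
        prev = token_id
    return bytes(out)

def decode_frame(buf: bytes) -> tuple[int, list[int]]:
    """Rebuild identifiers from one frame; no state from other frames is needed."""
    base_id, pos = varint_decode(buf)
    ids, prev = [], base_id
    while pos < len(buf):
        z, pos = varint_decode(buf, pos)
        prev += zigzag_decode(z)
        ids.append(prev)
    return base_id, ids

frame = encode_frame(1000, [1003, 998, 1001])
print(len(frame))  # 5 bytes: two for base_id, one per delta
```

Because each frame carries its own `base_id`, a decoder can reconstruct a frame without remembering earlier ones, which is one way to read the "stateless framing preserved" claim.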

**Discoverable Zstandard dictionaries.** Engines now publish their pre-trained dicts at `<origin>/.well-known/codec/dicts/<sha256>.zstd`. Hash-pinned: the client MUST verify the bytes hash to the URL component. Closes the v0.4.1 silent-COPY-dicts-drop regression class — dictionary drift now fails loudly (404 or hash-mismatch) instead of falling back silently to identity bytes. Release-checklist §1.7 codifies a four-sub-gate audit; the v0.5 cut actually caught a llama.cpp regression where `master` was vanilla upstream without the codec patches and the engine was silently serving identity-encoded msgpack.
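
A client-side hash-pin check could look roughly like the sketch below. It assumes the `<sha256>` path component is a lowercase hex digest, which the text above does not specify; `DictIntegrityError`, `pinned_hash_from_url`, and `verify_dict_bytes` are hypothetical names, and fetching the bytes is left to the caller.

```python
# Hypothetical hash-pin verification for a discovered dictionary.
# Assumes the URL component is a hex-encoded SHA-256 digest.
import hashlib
import hmac
from urllib.parse import urlparse

DICT_PREFIX = "/.well-known/codec/dicts/"
DICT_SUFFIX = ".zstd"

class DictIntegrityError(Exception):
    """Raised when a dictionary URL or its bytes fail the pin check."""

def pinned_hash_from_url(url: str) -> str:
    """Extract the digest that the URL path pins the dictionary to."""
    path = urlparse(url).path
    if not (path.startswith(DICT_PREFIX) and path.endswith(DICT_SUFFIX)):
        raise DictIntegrityError(f"not a dictionary URL: {url}")
    digest = path[len(DICT_PREFIX):-len(DICT_SUFFIX)].lower()
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        raise DictIntegrityError(f"malformed digest in URL: {url}")
    return digest

def verify_dict_bytes(url: str, body: bytes) -> bytes:
    """Return body only if its SHA-256 matches the digest in the URL."""
    expected = pinned_hash_from_url(url)
    actual = hashlib.sha256(body).hexdigest()
    if not hmac.compare_digest(expected, actual):
        # Fail loudly rather than falling back to uncompressed bytes.
        raise DictIntegrityError(f"hash mismatch: expected {expected}, got {actual}")
    return body
```

Raising on mismatch, rather than returning `None`, mirrors the "fails loudly" behaviour described above: a caller cannot accidentally continue with an unverified dictionary.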

**GPU-side latent quantize fast path.** `LatentStreamEncoderOptions.gpu_quantize=True` accepts a CUDA `torch.Tensor`, quantizes on-device, and transfers the int4/int8 result instead of the fp16 latent. ~75% PCIe reduction on int4 SDXL; smaller wins at SD-1.5.
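
The ~75% figure follows from bit widths alone: fp16 spends 16 bits per element and int4 spends 4, so the payload shrinks to a quarter. The sketch below works through that arithmetic for an illustrative 4×128×128 latent (the shape of an SDXL latent at 1024×1024, used here as an assumption). It ignores any per-tensor scale or zero-point metadata a real quantizer would also transfer, and the helper names are hypothetical.

```python
# Back-of-the-envelope transfer sizes; helper names are hypothetical.

def transfer_bytes(num_elements: int, bits_per_element: int) -> int:
    """Bytes needed to move num_elements at the given bit width (rounded up)."""
    return (num_elements * bits_per_element + 7) // 8

def pcie_reduction(num_elements: int, quant_bits: int, source_bits: int = 16) -> float:
    """Fraction of transfer volume saved by quantizing before the copy."""
    return 1 - transfer_bytes(num_elements, quant_bits) / transfer_bytes(num_elements, source_bits)

latent_elements = 4 * 128 * 128  # assumed latent shape: channels x height x width

print(transfer_bytes(latent_elements, 16))  # 131072 bytes as fp16
print(transfer_bytes(latent_elements, 4))   # 32768 bytes as int4
print(pcie_reduction(latent_elements, 4))   # 0.75
print(pcie_reduction(latent_elements, 8))   # 0.5
```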

**Bolt-on tool dispatcher.** The engine can dispatch directly to tools published via the `@codecai/tool-kit` manifest, without ever detokenizing the model's `<tool_call>` region. Manifest schema + `_codec_meta` envelope let a tool author publish pre-tokenized IDs that flow into and out of the engine's generation context.

## 11 packages at 0.5.0

- **npm**: `@codecai/{web, web-safety, web-llm, maps-cli, mcp-leaf, tool-kit, wire-compress}`
- **PyPI**: `codecai`
- **NuGet**: `Codec.Net`
- **crates.io**: `codec-rs`
- **Maven Central**: `ai.codec:codec`

New cross-cohort surfaces: content-aware + per-stack-aware compression picker rewrite with a typed `PickReasonCode` enum, `policies-enumerate` subcommand on `@codecai/maps-cli` (resolves v0.4-OQ4), `@codecai/tool-kit` promoted to first-class family member with a runnable reference tool (`@codecai/codec-time-tool`).

## Engine cohort

`wdunn001/codec-{sglang,vllm,llamacpp,comfyui,diffusers}:v0.5.0` and `:latest` live on Docker Hub. Each image bakes the canonical zstd dicts at `/opt/codec/dicts/`, ships the `/opt/codec/check-dict-availability.sh` probe, and is dep-verified for `import brotli, zstandard, msgpack` before push.

Upstream PRs filed at [sgl-project/sglang#25544](https://github.com/sgl-project/sglang/pull/25544) and [vllm-project/vllm#42896](https://github.com/vllm-project/vllm/pull/42896). Both DCO-signed; both through five gemini-code-assist bot review-fix iterations (struct.unpack bytes path, hardened `_decode_varint` shift-cap, async dispatch, cached registry, manifest dict-shape guard).

`wdunn001/codec-tgi` is **dropped** — TGI treated as a dead project; the cohort is now five engines.

## Bench: byte-identical to v0.4.1

The §1 + §1b numbers are unchanged from v0.4.1 — which is exactly what wire-additive is supposed to mean. The §1.7 and §1.9 gates added in this release exist to guarantee that, not change it.

**§1b engine-output @ 2K tokens, Codec msgpack + dict-zstd:**

| Engine     | JSON-SSE  | Best Codec | Reduction  |
|------------|----------:|-----------:|-----------:|
| llama.cpp  | 528.8 KB  | 140 B      | **3,868×** |
| sglang     | 485.2 KB  | 291 B      | **1,707×** |
| vllm       | 517.8 KB  | 3.9 KB     | **137×**   |

**§2 cross-language interop:** **72/72 wire-unanimous + 72/72 decode-unanimous** across three engines and six client languages. vllm required `REPS=4` to median out its documented ~10–20% scheduler variance at T=0; ran clean on the second pass.

## IETF Internet-Draft

`draft-dunn-codec-00` rewritten to RFC 2026 compliance. Required sections present, kramdown-rfc compatible frontmatter, threat model expanded with five inline Codec-specific threats (binary-WAF blindness, capability-trust, discovery cache poisoning, frame-size + varint exhaustion, sentinel-identifier integrity), explicit out-of-specification behaviour table, liberal/conservative acceptance rules, implementation-experience section. Companion `SUBMITTING.md` walkthrough covers the `kdrfc` → datatracker submission flow.

## Migration

v0.4.1 → v0.5.0 is non-breaking. Bump the package version; nothing else changes for existing v0.4 consumers. To opt into new surfaces, set the appropriate env var or request field — see the [CHANGELOG entry](https://github.com/wdunn001/Codec/blob/main/CHANGELOG.md#v050--2026-05-18) for the per-surface opt-in matrix.
