Commit aa4e094

PR-G3: rewrite README + add docs/quickstart.md for v0.3 GA

The pre-PR-G3 README was severely out of date: the Release badge pointed
at v0.2.0, the ADR list stopped at 0006, and there was no mention of
gRPC / SessionStore / SDKs / session-bound architecture. Anyone landing
on the GitHub repo today saw a v0.2-era pitch despite v0.3.0 being
shipped.

Two artifacts:

README.md (rewrite, ~490 lines new vs ~490 lines old)

- Top: badges refreshed (v0.3.0 release, ADR 0008, MIT license).
  Architecture diagram showing the SDK -> gRPC -> coordinators ->
  verifier flow.
- Quickstart section near the top: 5-minute
  clone-to-first-generated-token. Calls out the v0.3.0 GA tag
  explicitly, uses source-layout PYTHONPATH (works today; PyPI/npm
  publishing flagged as 'queued for v0.3.1' in the Roadmap table at
  the bottom).
- 'What's in the v0.3 architecture' table mapping each component to
  its module path.
- 'v0.3 GA evidence' table summarizing the binding Mac M4 results:
  9 ms latency drift over 14400 s (4400x improvement vs v0.2.0),
  480 turns / 4 h, 1.829 s p50.
- SDK examples for both Python (kakeya) and TypeScript
  (@kakeya/runtime).
- Deprecated-HTTP-shim section explaining the Deprecation / Sunset
  headers and the curl-friendly migration path.
- 'Architecture & Background' as a SECONDARY section: keeps the v0.2
  Net-Bytes-per-Token analysis + compression-regime measurements +
  4-bit MLX section, but no longer fronts them as the project's
  primary value prop.
- Project layout diagram updated for v0.3 directory shape.
- Roadmap table: v0.3.0 ✅ shipped, v0.3.1 deployment polish queued,
  v0.4 proposer-back-in / alignment / cross-request KV reuse in
  design.
- CI section split into Linux gate (verifier-independent) vs Mac M4
  gate (verifier-dependent, label-gated).
- ADR list now includes ADR 0008 with a 'load-bearing for v0.3'
  marker.

docs/quickstart.md (new, ~390 lines)

Detailed 10-minute walkthrough:

- Time-budget table (Mac M4 warm/cold vs Linux x86 CPU)
- Step 1: clone + checkout v0.3.0
- Step 2: install (Mac via setup_mac.sh, Linux x86, Linux CUDA,
  mainland-China mirror via HF_ENDPOINT, dllm import workaround
  documented)
- Step 3: start gRPC server (full flag table)
- Step 4: SDK call examples (synthetic ids; real tokenizer flow)
- Step 5: multi-turn conversation pattern that demonstrates the
  session-bound architecture's killer feature
- Step 6: graceful stop
- Troubleshooting: kv_live_bytes=0 (pre-PR-E1c), capacity exhaustion,
  mid-conversation NotFound, alternative models (bf16 1.7B, 4-bit
  MLX), authentication / TLS deferral notes, mainland-China
  networking
- Common patterns: session reset, single-session-per-process,
  deprecated HTTP shim usage

Per ADR 0008 §9: docs-only PR. No Python source changes; no unit-test
impact; no Mac M4 evidence required for merge.

Stack
-----

PR-G3 is independent of PR-G5 (model prewarm CLI) and PR-G6 (chat
REPL). The README references both as 'queued for v0.3.1' in the
Roadmap table; once they ship, a small follow-up commit inserts the
actual command lines into the quickstart.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

1 parent 6399546 commit aa4e094

2 files changed

Lines changed: 640 additions & 412 deletions
