Commit 1185b65

author
Douglas Jones
committed
2026-05-14: B-Team review complete — GPT-5.4 case study, findings applied, journalists caught up
- GPT-5.4 B-Team governance review: ran pipeline task spec with live interpreter (found and installed local repo). 4/5 programs first attempt, 2 real runtime errors encountered and fixed.
- Four findings applied to agent-facing docs:
  - FIND-B1: direct-call is_bottom(f()) pattern documented in AGENT_QUICKREF
  - FIND-B2: io.say double-print behavior documented (QUICKREF + cookbook #12)
  - FIND-B4: stale Rust parser note removed; individual vs index import guidance added
  - RE-01: cookbook entry #11, Program 5 via HTTP (RPC API workflow)
- AGENT_COOKBOOK bumped to v1.1
- All journalist/persona catch-up sections updated to v2.0 state
- 341 tests passing, dispatch-check exits 0
1 parent 21bed47 commit 1185b65

13 files changed

Lines changed: 715 additions & 113 deletions

.kiro/steering/personas/axiom.md

Lines changed: 12 additions & 8 deletions
@@ -65,16 +65,20 @@ The Track 1 external agent sessions (GPT-4o, Gemini, Claude) were Axiom's job
6565
done manually and expensively. Axiom makes that systematic. Every new surface
6666
gets an Axiom pass before it ships, not after three external agents hit it.
6767

68-
## Catch-up on Codifide
68+
## Catch-up on Codifide (as of v2.0 — 2026-05-14)
6969

7070
Key friction points already documented (do not re-report these as new findings):
7171
- `a + b` → use `add(a, b)` (arithmetic operators don't exist)
7272
- `is_bottom(x)` cannot catch propagated bottom — raises `BottomPropagationError`
73-
- bind-before-when: `when` guard runs before the candidate body; bound names
74-
don't exist yet in the guard
73+
- `is_bottom(f())` direct-call works — documented in AGENT_QUICKREF (2026-05-14)
74+
- bind-before-when: `when` guard runs before the candidate body; now a parse
75+
error with a clear fix hint (V2-2 shipped 2026-05-14)
7576
- `contains()` is case-sensitive — always normalize with `lower()` first
76-
- `from <hash> import name` requires `CODIFIDE_RUNTIME=python` until V2-3 ships
77-
- Content-addressed composition (Program 5) requires the store CLI + index pattern
78-
79-
These are in `docs/AGENT_COOKBOOK.md` and `docs/AGENT_QUICKREF.md`. Axiom's
80-
job is to find the *next* ones.
77+
- `from <hash> import name` works in both runtimes as of v2.0 (V2-3 shipped)
78+
- Content-addressed composition (Program 5) — CLI path and HTTP path both
79+
documented in cookbook entries #8 and #11
80+
- `io.say` + CLI double-print — documented in AGENT_QUICKREF and cookbook #12
81+
82+
These are in `docs/AGENT_COOKBOOK.md` (v1.1) and `docs/AGENT_QUICKREF.md`.
83+
Axiom's job is to find the *next* ones — particularly around the RPC API
84+
surface (new in v2.0) and any parallel evaluator surfaces.

.kiro/steering/personas/glyph.md

Lines changed: 18 additions & 22 deletions
@@ -102,7 +102,7 @@ communicates with humans. Glyph communicates with agents. The project
102102
"publishes to agents" when a Glyph dispatch is committed to the repo and
103103
(later) posted to the dispatch stream.
104104

105-
## Catch-up on Codifide (as of v1.0 / v2.0)
105+
## Catch-up on Codifide (as of v2.0 — 2026-05-14)
106106

107107
Glyph, the project lives at
108108
`/Users/douglasjones/Projects/CodifideProgrammingLanguage/`. Public on GitHub
@@ -112,32 +112,28 @@ Key facts for dispatch construction:
112112

113113
- **Canonical spec:** `docs/CANONICAL.md`
114114
- **Interpreter semantics:** `codifide/runtime/`
115-
- **Capability manifest hash (v1.0):**
116-
`sha256:23fdde779caebc2c471ade0e1c407422d044e2e0f1adc7e59a189325deccd27d`
117-
- **Test count:** 289 Python passing, 0 skipped (as of 2026-05-13)
115+
- **Capability manifest hash (v2.0):**
116+
`sha256:42d73647ba8de29a7d219bf2218bad0a42dc2a11d7878cac12ee931be2a1a185`
117+
- **Test count:** 341 Python passing, 0 skipped (as of 2026-05-14)
118118
- **Rust canonical crate:** `crates/codifide-canonical/` — byte-level
119119
conformance to Python; 28 Rust tests passing
120120
- **Dispatch journal:** `dispatches/INDEX.md` — indexed, grouped by date
121-
- **Template dispatch:** `dispatches/2026-05-13-t1-1-pipeline-task-spec.yaml`
121+
- **Shape reference:** `dispatches/2026-05-14-bteam-findings-applied.yaml`
122122
is the most recent Glyph dispatch; use it as your shape reference
123123

124-
**Shipped state (v1.0, 2026-05-11):**
125-
- CBOR primary content hash, JSON legacy
126-
- Cost-based candidate dispatch
127-
- Symbol store with GC, atomic writes, sharded loose objects
128-
- Content-addressed imports
129-
- Indexed primitives: `slice`, `at`, `char_at`, `indexof`
130-
- Inline `if/then/else` expression
131-
- Capability manifest, agent-facing docs, quickref
132-
133-
**Shipped state (v2.0, 2026-05-12):**
134-
- Rust interpreter + Rust parser (Shape A)
135-
- Parallel evaluator, benchmarks
136-
137-
**Active initiative:** Agent Adoption — spec at `.kiro/specs/agent-adoption/`
138-
- Track 1 (external agent case study): T1-1 complete, T1-2 next
139-
- Track 2 (adoption infrastructure): not started
140-
- Track 3 (v2.0 roadmap): blocked on Track 1
124+
**Shipped state (v2.0, 2026-05-14):**
125+
- V2-1: RPC API — `python3 -m codifide serve`, POST/GET symbols by hash
126+
- V2-2: Static bind-before-when detection — parse error with fix hint
127+
- V2-3: from-import in Rust parser — both runtimes now support `from`-import
128+
- V2-4: Manifest `docs` field — links to human-readable documentation
129+
- Agent Adoption Initiative complete — four case studies, cookbook v1.1,
130+
quickref updated, B-Team governance review closed
131+
132+
**Open items (not yet dispatched):**
133+
- AUD-OVERNIGHT-02: parallel evaluator branch interpreters don't carry
134+
resolved imports — Sable audit needed before v3.0 parallel work
135+
- New agent case study to validate adoption improvements (Relay's KPI)
136+
- v3.0 planning if adoption evidence warrants it
141137

142138
**Dispatch discipline:** Every session files a paired Quill readout
143139
(`.readout.md`) and Glyph YAML (`.yaml`). Session-close pairs required.

.kiro/steering/personas/lumen.md

Lines changed: 13 additions & 8 deletions
@@ -75,20 +75,25 @@ implementations is a bug against the spec. If both agree but the spec is
7575
silent, the spec is the bug."* Lumen's job is to make sure the spec can
7676
actually play that role.
7777

78-
## Catch-up on Codifide
78+
## Catch-up on Codifide (as of v2.0 — 2026-05-14)
7979

8080
Key spec documents:
8181
- `docs/CANONICAL.md` — the canonical-form specification (primary)
8282
- `docs/LANGUAGE.md` — surface-syntax reference
8383
- `docs/CAPABILITY.md` — capability manifest schema
8484
- `docs/capability-0.1.json` — current manifest (generated from implementation)
85-
86-
Known spec gaps as of v2.0 (do not re-report):
87-
- `from`-import behavior in the Rust parser is undocumented in the spec
88-
(implementation gap, not spec gap — tracked as REQ-V2-3)
89-
- Manifest `docs` field not yet in schema (tracked as REQ-V2-4)
90-
- Shadowing rules are in `docs/CANONICAL.md §Shadowing` — reviewed and
91-
considered complete as of v1.0
85+
- New manifest hash: `sha256:42d73647ba8de29a7d219bf2218bad0a42dc2a11d7878cac12ee931be2a1a185`
86+
87+
Known spec gaps resolved in v2.0 (do not re-report):
88+
- `from`-import behavior in the Rust parser — implemented (V2-3)
89+
- Manifest `docs` field — added to schema and manifest (V2-4)
90+
- Bind-before-when execution order — now documented in LANGUAGE.md and
91+
enforced as a parse error (V2-2)
92+
93+
Known spec gaps still open:
94+
- Parallel evaluator semantics under concurrent belief dispatch: not in spec
95+
- RPC API (`docs/RPC_API.md`) is implementation documentation, not a spec
96+
section — no conformance tests for the HTTP surface
9297

9398
Lumen's first deliverable when invoked: a one-pass review of the spec section
9499
most relevant to the current initiative.
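
The manifest hashes quoted in these catch-up sections all have the shape `sha256:` followed by 64 lowercase hex digits. As a minimal sketch of that identifier shape, the Python below checks a string against it and builds one from raw bytes. The helper names `content_id` and `looks_like_content_id` are hypothetical, and hashing arbitrary bytes is an assumption for illustration only: the diff says Codifide hashes a canonical CBOR form, which this sketch does not reproduce.

```python
import hashlib
import re

# Shape of the identifiers quoted in the diff: "sha256:" + 64 lowercase hex digits.
HASH_PATTERN = re.compile(r"sha256:[0-9a-f]{64}")


def content_id(payload: bytes) -> str:
    """Hypothetical helper: SHA-256 of raw bytes, prefixed with 'sha256:'."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def looks_like_content_id(text: str) -> bool:
    """Return True if text has the 'sha256:<64 hex digits>' shape."""
    return HASH_PATTERN.fullmatch(text) is not None


# The v2.0 manifest hash as quoted in the persona files.
manifest_hash_v2 = (
    "sha256:42d73647ba8de29a7d219bf2218bad0a42dc2a11d7878cac12ee931be2a1a185"
)

print(looks_like_content_id(manifest_hash_v2))
print(content_id(b"example payload"))
```

A shape check like this only catches truncated or mistyped hashes; it says nothing about whether the hash matches the current manifest.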

.kiro/steering/personas/quill.md

Lines changed: 48 additions & 31 deletions
@@ -45,7 +45,7 @@ minutes, not hours. They will form an opinion based on what Quill writes.
4545
End every report with a single sentence titled **"What I'm not yet sure of."**
4646
If Quill is certain of everything, the report is incomplete.
4747

48-
## Catch-up on Codifide (as of v1.0 / v2.0)
48+
## Catch-up on Codifide (as of v2.0 — 2026-05-14)
4949

5050
Quill, here's what you're working with. The project is in
5151
`/Users/douglasjones/Projects/CodifideProgrammingLanguage/`, public on GitHub
@@ -56,36 +56,53 @@ as `codifide-programming-language`, MIT licensed.
5656
Tagline: *"confidence in code, for agents."*
5757

5858
- **What shipped in v1.0 (2026-05-11):**
59-
- Python reference interpreter — effect enforcement, pre/post contracts,
60-
multi-candidate dispatch, cost-based dispatch, belief dispatch, inline
61-
`if/then/else`, first-class refusal (`bottom`), 8 typed error kinds
62-
- Canonical JSON + CBOR forms with SHA-256 content addressing (CBOR primary)
63-
- Content-addressed symbol store with GC, atomic writes, sharded loose objects
64-
- Content-addressed imports (`import foo = sha256:...`)
65-
- Indexed primitives: `slice`, `at`, `char_at`, `indexof`
66-
- Capability manifest (`python3 -m codifide capability`) — agent-facing
67-
self-description, content-addressed, generated from the implementation
68-
- Rust canonical crate (`crates/codifide-canonical/`) — byte-level
69-
conformance to Python on every example
70-
- 216 Python tests passing, 28 Rust canonical tests passing, 0 skipped
71-
- Repo made public; `docs/FOR_AGENTS.md` and `docs/AGENT_QUICKREF.md` written
72-
73-
- **What shipped in v2.0 (2026-05-12):**
74-
- Rust interpreter and Rust parser (Shape A milestone)
75-
- Parallel evaluator and benchmarks
76-
- 289 Python tests passing total (as of 2026-05-13)
77-
78-
- **What is actively in progress:**
79-
- Agent Adoption Initiative — spec at `.kiro/specs/agent-adoption/`
80-
- Track 1: external agent case study (GPT-4o, Gemini 2.5 Pro, Claude baseline)
81-
- Track 2: adoption infrastructure (manifest endpoint, cookbook, quickstart)
82-
- Track 3: v2.0 roadmap update driven by adoption findings
83-
84-
- **What's honest to say:** Codifide is a complete, tested, public v1.0
85-
language. The semantics are real and enforced. The scale story (graph-native
86-
parallel runtime, RPC API, time-indexed types) is roadmap, not shipped.
87-
No external agent has yet adopted it in a real session — that is the
88-
current initiative.
59+
Python reference interpreter, canonical CBOR/JSON, content-addressed symbol
60+
store, capability manifest, Rust canonical crate. 216 Python tests, 28 Rust
61+
canonical tests.
62+
63+
- **What shipped in v2.0 (2026-05-14, overnight session):**
64+
- **V2-1 RPC API**: `python3 -m codifide serve` starts a local HTTP server
65+
backed by the symbol store. POST canonical forms, GET by hash. Removes the
66+
CLI ceremony from Program 5 (content-addressed composition).
67+
- **V2-2 Static bind-before-when detection** — the parser now catches the
68+
bind-before-when footgun at parse time with a clear fix message. Previously
69+
a confusing runtime error.
70+
- **V2-3 from-import in Rust parser**: `from sha256:<hash> import ...` now
71+
works in the Rust runtime. `CODIFIDE_RUNTIME=python` workaround removed.
72+
- **V2-4 Manifest docs field** — capability manifest now includes a `docs`
73+
field pointing to human-readable documentation.
74+
- New manifest hash: `sha256:42d73647ba8de29a7d219bf2218bad0a42dc2a11d7878cac12ee931be2a1a185`
75+
- 341 Python tests passing, 0 skipped.
76+
77+
- **Agent Adoption Initiative — complete (2026-05-13):**
78+
- Track 1: Four external agent case studies run (GPT-4o, Gemini 2.5 Pro,
79+
Claude baseline, GPT-5.4 B-Team review). All five programs completed by
80+
all models. Key finding: Program 5 (content-addressed composition) was the
81+
universal friction point — fixed by V2-1 RPC API.
82+
- Track 2: Adoption infrastructure shipped — manifest endpoint live at
83+
codifide.com, `AGENT_COOKBOOK.md` (12 entries), `AGENT_QUICKREF.md`,
84+
`python3 -m codifide agent-quickstart`.
85+
- Track 3: v2.0 roadmap driven by adoption findings — all four requirements
86+
shipped.
87+
88+
- **B-Team governance review — complete (2026-05-14):**
89+
GPT-5.4 ran the pipeline task spec with live interpreter access (found and
90+
installed the local repo). Four findings applied: direct-call `is_bottom`
91+
documented, double-print behavior documented, stale Rust parser note removed,
92+
HTTP workflow added to cookbook.
93+
94+
- **What's honest to say:** Codifide is a complete, tested, public v2.0
95+
language. The adoption infrastructure is real — four external models have
96+
run the pipeline task spec and the friction points are documented and fixed.
97+
The scale story (graph-native parallel runtime, time-indexed types) is
98+
roadmap, not shipped. The parallel evaluator does not yet carry resolved
99+
imports into branch interpreters (known gap, AUD-OVERNIGHT-02).
100+
101+
- **Open action items:**
102+
- `AGENT_COOKBOOK.md` HTTP workflow — done (entry #11)
103+
- New agent case study to validate adoption improvements (Relay's KPI)
104+
- Sable audit of parallel evaluator import handling (AUD-OVERNIGHT-02)
105+
- v3.0 planning if adoption evidence warrants it
89106

90107
Your first deliverable when invoked: a one-page "state of Codifide" that a
91108
technically literate human could read in three minutes.

.kiro/steering/personas/relay.md

Lines changed: 24 additions & 13 deletions
@@ -75,22 +75,33 @@ connects them — the sequence, the discoverability, the funnel. A language
7575
surface can be ergonomically clean (Axiom says so) and well-documented (Paige
7676
says so) but still produce a confused agent if the funnel doesn't lead there.
7777

78-
## Catch-up on Codifide
78+
## Catch-up on Codifide (as of v2.0 — 2026-05-14)
7979

80-
Current adoption funnel (as of v2.0 / Agent Adoption Initiative):
80+
Current adoption funnel state:
8181

82-
1. Agent fetches `codifide.com/capability.json` (or `.cbor`)
83-
2. Reads `docs/FOR_AGENTS.md` (linked from manifest — gap: not yet, REQ-V2-4)
84-
3. Reads `docs/AGENT_QUICKREF.md`
82+
1. Agent fetches `codifide.com/capability.json` (or `.cbor`) — live, includes
83+
`docs` field pointing to human-readable documentation (V2-4 shipped)
84+
2. Reads `docs/FOR_AGENTS.md`
85+
3. Reads `docs/AGENT_QUICKREF.md` — updated with direct-call `is_bottom`
86+
pattern and double-print note (2026-05-14)
8587
4. Runs `python3 -m codifide agent-quickstart`
86-
5. Writes Programs 1–4 (success rate: ~100% across Track 1)
87-
6. Writes Program 5 (success rate: 0/3 across Track 1 — RPC API is the fix)
88-
89-
Known funnel gaps (do not re-report):
90-
- Manifest does not link to cookbook or quickref (REQ-V2-4, deferred)
91-
- Program 5 requires CLI + CODIFIDE_RUNTIME=python (REQ-V2-1, P1)
92-
- Feedback template exists (`dispatches/feedback/TEMPLATE.md`) but has not
93-
been used in a real session (AUD-T2-04, open)
88+
5. Writes Programs 1–4 — success rate ~100% across all four case studies
89+
6. Writes Program 5 — now has two paths:
90+
- **CLI path:** `store put` + `store hash` + individual imports (flat chains)
91+
or `store index` + `from`-import (deep chains). Both runtimes supported.
92+
- **HTTP path:** `python3 -m codifide serve` + POST canonical forms + import
93+
by returned hashes. Documented in cookbook entry #11 and `docs/RPC_API.md`.
94+
95+
Known funnel gaps (resolved — do not re-report):
96+
- `CODIFIDE_RUNTIME=python` workaround — removed (V2-3 shipped)
97+
- Program 5 CLI ceremony — HTTP path now available (V2-1 shipped)
98+
- Manifest `docs` field missing — shipped (V2-4)
99+
- Bind-before-when runtime error — now a parse error with fix hint (V2-2)
100+
101+
Known funnel gaps (open):
102+
- Feedback template (`dispatches/feedback/TEMPLATE.md`) has not been used
103+
in a real session (AUD-T2-04, still open)
104+
- No new agent case study since v2.0 shipped — adoption KPI unvalidated
94105

95106
Relay's first deliverable when invoked: a funnel walk for the current release
96107
state, with time-to-first-working-program estimate.
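
The HTTP path described above (POST a canonical form, then import by the returned hash) rests on a content-addressed store: the key is derived from the bytes stored. The sketch below models only that put-then-get contract in memory. `ToyStore` is a hypothetical stand-in, not Codifide's `SymbolStore` or its RPC API, and it makes no claim about the real endpoint paths or payload encoding, which are documented in `docs/RPC_API.md`.

```python
import hashlib


class ToyStore:
    """Hypothetical in-memory content-addressed store (illustration only)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, payload: bytes) -> str:
        """Store payload and return its 'sha256:<hex>' key (the POST analogue)."""
        key = "sha256:" + hashlib.sha256(payload).hexdigest()
        self._objects[key] = payload  # Re-storing identical bytes is a no-op.
        return key

    def get(self, key: str) -> bytes:
        """Fetch payload by key (the GET analogue); raises KeyError if absent."""
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


store = ToyStore()
first = store.put(b"example canonical form")
second = store.put(b"example canonical form")  # Same bytes, same key.

print(first == second)
print(store.get(first))
```

Because the key depends only on the content, storing the same bytes twice yields the same hash, so a caller can safely retry a put and still import by the hash it was given.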

.kiro/steering/personas/sable.md

Lines changed: 25 additions & 22 deletions
@@ -94,7 +94,7 @@ a post-audit dispatch is an open wound, not a report.
9494
- **Error surface.** Do typed errors actually classify what went wrong,
9595
or do host-language exceptions leak through?
9696

97-
## Catch-up on Codifide (as of v1.0 / v2.0)
97+
## Catch-up on Codifide (as of v2.0 — 2026-05-14)
9898

9999
Sable, the project lives at
100100
`/Users/douglasjones/Projects/CodifideProgrammingLanguage/`. Public on GitHub
@@ -105,32 +105,35 @@ as `codifide-programming-language`, MIT licensed.
105105
conformance to Python; includes CBOR decoder, fuzz harness
106106
- **Rust interpreter + parser:** `crates/codifide-interpreter/` (v2.0,
107107
2026-05-12) — parallel evaluator, benchmarks
108-
- **Spec:** `docs/CANONICAL.md`: read with suspicion; written by the
109-
implementers, though it has been through multiple Sable passes
110-
- **Test count:** 289 Python passing, 0 skipped (as of 2026-05-13);
111-
28 Rust canonical passing
108+
- **RPC API server:** `codifide/server.py`: ThreadingHTTPServer over
109+
SymbolStore; `python3 -m codifide serve` (V2-1, 2026-05-14)
110+
- **Spec:** `docs/CANONICAL.md` — read with suspicion
111+
- **Test count:** 341 Python passing, 0 skipped (as of 2026-05-14)
112112
- **Prior audit history** (all in `dispatches/`):
113-
- `2026-05-10-security-audit.md` — initial CBOR neighborhood audit;
114-
three P1 findings (P1-5 symlink write, P1-6 UnicodeDecodeError leak,
115-
P1-7 Rust CLI hung on `/dev/zero`); all resolved
113+
- `2026-05-10-security-audit.md` — initial CBOR neighborhood audit; all P1s resolved
116114
- `2026-05-11-ergonomics-audit.md` — post-four-model-review ergonomics
117-
- `2026-05-11-new-surfaces-audit.md` — cost dispatch + store GC;
118-
five findings (CDP-1/2, GC-1/2/3); all resolved
115+
- `2026-05-11-new-surfaces-audit.md` — cost dispatch + store GC; all resolved
119116
- `2026-05-11-cli-audit.md` — unbounded source read (P1); resolved
120-
121-
**Known coverage gaps as of v1.0:**
122-
- Conformance suite (`tests/test_conformance.py`) covers ASCII-clean
123-
examples only — not a passing grade for the full surface
124-
- Rust interpreter (v2.0) has not yet received a Sable audit
125-
- Parallel evaluator semantics under concurrent belief dispatch: untested
126-
- Agent Adoption Initiative sessions: no adversarial review of the
127-
agent-facing docs or task spec yet
117+
- `2026-05-13-track1-sable-audit.md` — Track 1 case study surfaces
118+
- `2026-05-13-track2-sable-audit.md` — Track 2 adoption infrastructure
119+
- `2026-05-14-v2-1-rpc-api-sable-audit.md` — RPC API; 2 P2s fixed, 3 P3s
120+
fixed or accepted
121+
122+
**Known coverage gaps (open):**
123+
- **AUD-OVERNIGHT-02:** Parallel evaluator branch interpreters don't carry
124+
resolved imports. Branch interpreters created with empty `resolved_imports`.
125+
Documented as a known limitation; not yet fixed. Sable audit needed before
126+
any parallel + import work in v3.0.
127+
- Conformance suite covers ASCII-clean examples only
128+
- RPC API HTTP surface has no adversarial test coverage
129+
- `is_bottom(f())` direct-call pattern — works in the interpreter but no
130+
dedicated test; behavior should be confirmed before it goes into docs
131+
(currently documented in AGENT_QUICKREF as of 2026-05-14)
128132

129133
**Active surface to audit next:**
130-
- `docs/AGENT_TASK_SPEC.md` — the pipeline task spec handed to external
131-
agents; Sable has not reviewed it
132-
- `crates/codifide-interpreter/` — Rust interpreter, no audit yet
133-
- `codifide/runtime/interpreter.py` — any new surfaces since last audit
134+
- Parallel evaluator import handling (AUD-OVERNIGHT-02)
135+
- RPC API adversarial surface (`codifide/server.py`) — body size limits,
136+
concurrent POST behavior, store exhaustion under hostile input
134137

135138
Your first deliverable when invoked: an audit report with
136139
severity-rated findings, each with a reproducing probe, filed to
