Commit 8e44f3e

dev: capture agent-OS control plane notes (round 4 — 50 ideas)
Fourth and largest round of external strategic input from ChatGPT.
50 architectural ideas in ~2,300 lines, formalised under the thesis:

Roam is a local control plane for autonomous coding agents.
"Roam helps agents earn the right to change code."

Files:

* dev/chatgpt-paste-2026-05-11.md (raw, ~2,300 lines) — the full paste, preserved verbatim for reference.
* dev/agent-os-control-plane-2026-05-11.md (synthesis, ~340 lines):
  - The 5-layer architecture (World Model, Control Plane, Agent Runtime, Evidence System, Immune Memory)
  - ChatGPT's top-10 power moves with our codebase status
  - All 50 ideas indexed with built/partial/new flag
  - The 4 category-defining ideas: proof-carrying PRs, attention audit, invariant mining, immune memory
  - Integration table — how round 4 SHARPENS R13-R25 rather than replacing them
  - 3 new rounds added: R26 (proof-carrying PRs), R27 (law mining), R28 (World Model expansion: side-effects + causal + transactions + idempotency)
* dev/BACKLOG.md — R26-R28 queued with strategic notes + new hero-copy candidates.

Companion writes (auto-memory):

* memory/agent_os_control_plane_2026_05_11.md
* MEMORY.md — ⭐⭐⭐ entry added

The sharpest one-liner of the entire 4-round series: "Roam helps agents earn the right to change code."

The most-leveraged near-term build identified across all rounds: R26 — proof-carrying PR bundle. THE differentiator for Phase-2 Roam Review vs CodeRabbit/Greptile/Qodo.

The deepest moat identified across all rounds: codebase immune memory (#50) — Roam becomes smarter about each repo the longer agents work in it. Uncopyable.
1 parent 12f0761 commit 8e44f3e

3 files changed

Lines changed: 2636 additions & 0 deletions

dev/BACKLOG.md

Lines changed: 34 additions & 0 deletions
@@ -89,6 +89,40 @@ record → memory*.

---

## R26–R28 — control plane rounds (2026-05-11, round 4)

ChatGPT round 4 (50 ideas, full capture:
`dev/agent-os-control-plane-2026-05-11.md`, raw paste preserved at
`dev/chatgpt-paste-2026-05-11.md`). Round 4 *formalises* rounds
1-3 under the thesis:

> Roam is a local control plane for autonomous coding agents.
> *"Roam helps agents earn the right to change code."*

Most of round 4's 50 ideas **sharpen** R13-R25 rather than replace
them — see the in-repo capture for the integration table. Three
new rounds added for the genuinely category-defining ideas:

| Round | What | Strategic note |
|---|---|---|
| **R26** | **Proof-carrying PR bundle** — every PR ships `{intent, context_read, affected_symbols, risks, tests_required, tests_run, known_non_goals, roam_verdict}`. Review can BLOCK on missing proof. | **THE Roam Review differentiator** vs CodeRabbit/Greptile/Qodo. Phase-2 MVP priority. |
| **R27** | **Invariant/law mining**: `roam laws mine` discovers repo's unwritten rules from existing code + tests + git history; `git diff \| roam laws check` enforces. | Self-installing constitution. Pairs with R18 + R24. |
| **R28** | **World Model expansion** — side-effect ledger (#42), causal graph (#41), transaction boundary detector (#43), idempotency detector (#44). | One sprint of structural-graph work; unlocks 4 commands. |

Round-4 also adds new hero-copy candidates worth A/B testing:

- **"Roam helps agents earn the right to change code."** ← sharpest
- "Roam is a local control plane for autonomous coding agents."
- (Round 1, still strong) "Agents should not edit blind. Roam is their map."

The 4 category-defining ideas across all 50:
proof-carrying PRs (R26), agent attention audit (R20 + R16),
invariant mining (R27), codebase immune memory (R19 + #50).
Each is *uncopyable* without our graph substrate + MCP session
tracking.

---

## Next pickup — pick from ROADMAP

When this queue clears (it has), pull from `ROADMAP.md` in this order:
Lines changed: 261 additions & 0 deletions
@@ -0,0 +1,261 @@
# Agent-OS control plane notes — 2026-05-11 (round 4)

Fourth and largest round of external strategic input from ChatGPT.
50 architectural ideas in ~2,300 lines (raw paste preserved at
`dev/chatgpt-paste-2026-05-11.md`).

This round **formalises** what rounds 1-3 hinted at:

> Roam is a **local control plane for autonomous coding agents**.

Not a CLI. Not a dashboard. Not a PR bot. A control system.

> *"Roam does not help agents write more code. Roam helps agents
> earn the right to change code."*

That's the sharpest one-liner of the entire 4-round series.

---

## The 5-layer architecture (the framing that survives)

```
1. World Model      repo graph · effects graph · journey graph · contracts · laws
2. Control Plane    permissions · risk budget · semantic firewalls · capability escrow
3. Agent Runtime    sessions · attention audit · run ledger · stop checks · debugging
4. Evidence System  proof-carrying PRs · test obligations · provenance · review packets
5. Immune Memory    repo-specific failure patterns · rejected changes · learned laws ·
                    fragile zones
```

The five layers map cleanly onto our existing surfaces:

- **World Model** → 80% built: symbol graph, effects, taint, clusters,
  layers, clones, co-change, runtime hotspots. Missing: explicit
  causal-graph layer (#41), side-effect ledger (#42), journey graph (#14).
- **Control Plane** → 20% built: `roam rules` + soft contract checks
  exist. Permissions (`roam permit`), risk budget, semantic firewalls
  are the Phase-0 / R18 / R24 builds.
- **Agent Runtime** → 50% built: MCP session memory, `agent_contract`
  block, structured errors. Missing: attention audit (#2), run ledger
  (R20), stop-condition detector (#49).
- **Evidence System** → 60% built: `roam pr-analyze`, `roam critique`,
  CGA attestations. Missing: proof-carrying PR bundle (#1), human
  review packet (#47), test obligation generator (#18).
- **Immune Memory** → 0% built. This is round-4's BIGGEST new
  direction — *the codebase learns from each agent run and prevents
  future ones from repeating the mistake*. The compounding moat.

---

## ChatGPT's top-10 power moves (the priority shortlist)

| # | Idea | Our status | Effort |
|---|---|---|---|
| 1 | **Proof-carrying PRs** — every PR ships a bundle of intent/context-read/risks/tests-run/non-goals + Roam verdict | New | M — high payoff, ties Roam Review tightly to the audit-trail story |
| 2 | **Agent attention audit** — what did the agent LOOK AT before editing? (catches confident-but-blind agents) | New | M-S — we have file-read tracking signal in MCP session memory; needs aggregation |
| 3 | **Invariant/law mining** — discover repo's unwritten rules from existing code patterns + tests + history | New | L — research-heavy, but uniquely possible because we have the graph |
| 4 | **Negative-space detection** — "you added X, in this repo X usually requires Y" (missing auth check, missing test, missing rollback) | New | M — expectation analysis, not static analysis |
| 5 | **Risk/autonomy budget** — agents get a numeric risk allowance per mode (safe_edit=20, autonomous_pr=50). Each edit consumes from the budget | New | S-M — pairs with R16 agent modes from round 2 |
| 6 | **Semantic firewalls** — control changes across architectural boundaries (`client/**` cannot import `server/**`) | Partial | S — extends `roam rules` |
| 7 | **Test obligation generator** — "this change requires these tests; here's why" | Partial | M — extends `roam affected-tests` |
| 8 | **Codebase immune memory** — long-term learning: failures, rejections, ignored warnings, fragile zones | New | L — the moat |
| 9 | **Intent-to-diff contract** | = round 2 #11, already queued in R16 | S |
| 10 | **Human review packet** — pre-digested, evidence-bundled review surface for the human checkpoint | New | M-S — extends `roam pr-comment-render` |
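
The risk/autonomy budget in row 5 can be pictured as a simple spend ledger. The sketch below is only an illustration: the two allowances (`safe_edit=20`, `autonomous_pr=50`) come from the table, while the `RiskBudget` class, the `BudgetExceeded` exception, and the per-edit costs are hypothetical names and numbers, not existing Roam APIs.

```python
from dataclasses import dataclass, field

# Allowances from the table above. Per-edit costs are NOT defined in these
# notes; any cost passed to charge() is an assumption for illustration.
MODE_BUDGETS = {"safe_edit": 20, "autonomous_pr": 50}


class BudgetExceeded(Exception):
    """Raised when an edit would cost more than the remaining allowance."""


@dataclass
class RiskBudget:
    mode: str
    spent: int = 0
    ledger: list = field(default_factory=list)  # (edit description, cost) pairs

    @property
    def remaining(self) -> int:
        return MODE_BUDGETS[self.mode] - self.spent

    def charge(self, edit: str, cost: int) -> int:
        """Consume `cost` from the budget and return what is left."""
        if cost < 0:
            raise ValueError("cost must be non-negative")
        if cost > self.remaining:
            raise BudgetExceeded(
                f"{edit!r} costs {cost}, only {self.remaining} left in {self.mode}"
            )
        self.spent += cost
        self.ledger.append((edit, cost))
        return self.remaining


budget = RiskBudget("safe_edit")
budget.charge("rename local variable", 2)   # leaves 18
budget.charge("edit auth middleware", 15)   # leaves 3
```

A further `budget.charge("change DB schema", 5)` would raise `BudgetExceeded`, which is the point at which an agent in `safe_edit` mode would have to stop or escalate.
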

---

## The full 50-item index (condensed, with my read)

Cross-referenced against the codebase + earlier rounds. Notation:
✅ = mostly built, 🟡 = partial substrate exists, ⬜ = new.

1. **Proof-carrying PRs** — see top-10
2. **Agent attention audit** — see top-10
3. **Invariant/law mining** — see top-10
4. **Negative-space detection** — see top-10
5. **Risk/autonomy budget** — see top-10
6. 🟡 **Semantic firewalls** — extends `roam rules`
7. **Intent-to-diff contract** — = round 2 #11 (R16)
8. **Counterfactual patch planning** — "show me 3 alternative patches before committing"
9. **Patch minimality engine** — "this patch could be 40% smaller; here's the irreducible diff"
10. **Agent self-debugger** — translates failed-test stdout into "likely cause + next tool"
11. **Codebase immune system** — short-term equivalent of #50 (immune memory)
12. 🟡 **Structural regression tests**: `roam fingerprint` exists; needs gating UX
13. **Human-readable spec diff** — "what changed about how the system behaves"
14. **Behavioral journey graph** — user-journey-level model on top of call graph
15. 🟡 **Latent coupling detector**: `roam coupling` covers this for co-change; semantic latent coupling is the extension
16. **Change cones instead of file diffs** — visualise blast radius as a cone, not a list
17. **Agent suspicion score** — how likely is this agent's run problematic, before review
18. **Test obligation generator** — see top-10
19. **Repo-specific agent benchmark** — mine bugfix commits, turn them into agent eval tasks. *"Which agent is safe on YOUR codebase?"* Potential paid/cloud feature
20. **Semantic rollback planner** — surgical revert of bad hunks, keep the good
21. **Compatibility oracle** — detects public-API breakage across boundary surfaces
22. **Design pressure detector** — where the architecture is bending under repeated change
23. **Architectural entropy budget** — codebase-level complexity allowance
24. **Local world-model server** — long-running daemon (already noted in `watcher.py:148` as a revisit point per memory)
25. **Assumption tracking** — explicit "this code assumes X" annotations
26. **Question generator** — "here are 4 questions to ask before this PR is safe to merge"
27. 🟡 **Change provenance graph** — CGA attestations are the substrate
28. **AI change quarantine** — flag + isolate suspect changes during review
29. **Semantic merge for multi-agent coding** — pairs with R21 lease system
30. 🟡 **Local contract database**: `_DOC_LINKS`/error contracts exist; user-facing-contracts is the extension
31. **"What would break if this was wrong?"** — counterfactual fault propagation
32. 🟡 **"Do not touch" inference**: `_UTILITY_PATH_PATTERNS` exists; needs to be agent-facing
33. **Codebase constitution compiler** — = round 3 #15 (R24)
34. **Agent capability escrow** — pre-authorise specific edits inside a session
35. **Agent run types as first-class objects** — typed agent operations (exploration vs editing vs reviewing)
36. 🟡 **Behavior-preserving refactor checker**: `roam simulate` + `roam critique` partial substrate
37. 🟡 **Agent hallucination detector beyond imports**: `roam critique` catches clones-not-edited; this extends to "calls function that doesn't exist with this signature"
38. **Repository affordance map** — "what CAN you do here? what's idiomatic?"
39. **Nearest existing pattern finder** — "before adding new code, here are 3 similar existing patterns"
40. **Patch as hypothesis model** — every patch is a falsifiable claim; Roam runs the falsification
41. **Causal graph, not just call graph** — see "World Model" gap above
42. **Side-effect ledger** — see "World Model" gap above
43. **Transaction boundary detector** — find atomicity violations across DB + email + cache writes
44. **Idempotency detector** — flag endpoints that should be idempotent but aren't
45. 🟡 **Security-context propagation** — taint engine is the substrate
46. 🟡 **Data lineage and privacy flow** — bridges layer can be extended for this
47. **Human review packet** — see top-10
48. **Agent ethics: user-intent protection** — guard against scope drift
49. **Stop-condition detector** — when should the agent STOP and ask
50. **Codebase immune memory** — see top-10. **The moat.**
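
To make item 6 (semantic firewalls) concrete, here is a minimal sketch of checking the example rule from the top-10 table, `client/**` cannot import `server/**`, against a list of import edges. The rule format, the `firewall_violations` function, and the edge list are assumptions for illustration; how `roam rules` actually represents or evaluates such rules is not described in these notes.

```python
from fnmatch import fnmatchcase

# Hypothetical rule format: (importer glob, forbidden import glob).
# The single rule is the example given in the top-10 table.
FIREWALLS = [("client/**", "server/**")]


def firewall_violations(import_edges, firewalls=FIREWALLS):
    """Return (importer, imported, rule) for every edge that crosses a firewall.

    import_edges is an iterable of (importer_path, imported_path) pairs,
    e.g. as they might be read out of a repo import graph.
    """
    violations = []
    for importer, imported in import_edges:
        for src_glob, dst_glob in firewalls:
            # fnmatchcase lets "*" match "/" too, so "client/**" covers
            # every path under client/ at any depth.
            if fnmatchcase(importer, src_glob) and fnmatchcase(imported, dst_glob):
                violations.append(
                    (importer, imported, f"{src_glob} cannot import {dst_glob}")
                )
    return violations


edges = [
    ("client/ui/app.ts", "server/db/pool.ts"),   # crosses the boundary
    ("client/ui/app.ts", "client/lib/fmt.ts"),   # stays inside client/
    ("server/api.ts", "server/db/pool.ts"),      # stays inside server/
]
violations = firewall_violations(edges)
```

Only the first edge is reported. Running the same check on just the edges added by a diff would turn the rule into a gate on changes rather than a whole-repo lint.
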

---

## The 4 truly category-defining ideas (my filter)

Most of the 50 are interesting; these 4 would change the product
shape:

### A. Proof-carrying PRs (#1)
> A PR shouldn't just contain code. It should carry a proof bundle:
> intent, context-read, affected symbols, risks, tests required,
> tests run, known non-goals, Roam verdict.

This becomes the **type system for AI-generated PRs**. Roam Review
stops being a commenter and becomes a *gatekeeper*. Pairs perfectly
with the Phase-2 Roam Review GitHub App MVP — this is the
*differentiating feature* that's not a copy of CodeRabbit.
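
As a rough sketch of what "gatekeeper" could mean in practice, the bundle can be modelled as a typed record and the gate as a function that blocks when proof is missing. The field names follow the bundle listed above; the `ProofBundle` class, the `review_gate` function, and the choice of which fields must be non-empty are assumptions, since the R26 schema has not been designed yet.

```python
from dataclasses import dataclass, field

# Assumption: these four fields must be non-empty for a bundle to count as
# proof. risks, tests_required, and known_non_goals may legitimately be empty.
REQUIRED_NONEMPTY = ("intent", "context_read", "affected_symbols", "roam_verdict")


@dataclass
class ProofBundle:
    intent: str = ""
    context_read: list = field(default_factory=list)
    affected_symbols: list = field(default_factory=list)
    risks: list = field(default_factory=list)
    tests_required: list = field(default_factory=list)
    tests_run: list = field(default_factory=list)
    known_non_goals: list = field(default_factory=list)
    roam_verdict: str = ""


def review_gate(bundle):
    """Return ("PASS" | "BLOCK", reasons) for a proof bundle."""
    reasons = [
        f"missing proof: {name}"
        for name in REQUIRED_NONEMPTY
        if not getattr(bundle, name)
    ]
    # Every required test must appear in tests_run.
    unrun = sorted(set(bundle.tests_required) - set(bundle.tests_run))
    if unrun:
        reasons.append("required tests not run: " + ", ".join(unrun))
    return ("BLOCK" if reasons else "PASS", reasons)


bundle = ProofBundle(
    intent="fix token refresh race",
    context_read=["auth/session.py"],
    affected_symbols=["refresh_token"],
    tests_required=["test_refresh"],
    roam_verdict="ok",
)
decision, reasons = review_gate(bundle)  # BLOCK: test_refresh was never run
```

Once `test_refresh` is appended to `bundle.tests_run`, the same gate returns `PASS`, which is the difference between commenting on a PR and gating it.
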

### B. Agent attention audit (#2)
> What did the agent LOOK AT before editing? Catches the
> "confident but blind" failure mode.

A human reviewer cannot easily know what the agent saw — but Roam
can, because the MCP session already tracks tool calls. This is
an **uncopyable advantage** for tools that go through Roam's MCP
surface vs raw LLM agents.
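
A minimal sketch of the aggregation step, assuming the session log can be reduced to an ordered list of `(action, path)` events. That event shape and the `attention_audit` function are hypothetical; the actual structure of the MCP session memory is not specified in these notes.

```python
def attention_audit(events):
    """Flag files that were edited before they were ever read in the session.

    events is an ordered list of (action, path) pairs, with action being
    "read" or "edit". Returns the blind edits plus a coverage ratio:
    the share of edited files that were read before their first edit.
    """
    seen = set()
    blind = []
    edited = set()
    for action, path in events:
        if action == "read":
            seen.add(path)
        elif action == "edit":
            edited.add(path)
            if path not in seen and path not in blind:
                blind.append(path)
    coverage = 1.0 if not edited else 1 - len(blind) / len(edited)
    return {"blind_edits": blind, "coverage": coverage}


session = [
    ("read", "auth/session.py"),
    ("edit", "auth/session.py"),      # read first: fine
    ("edit", "billing/invoice.py"),   # edited before any read: blind
    ("read", "billing/invoice.py"),   # reading afterwards does not count
]
report = attention_audit(session)
```

Here `report["blind_edits"]` contains `billing/invoice.py` and coverage is 0.5. A fuller audit would presumably also count reads of callers and tests of the edited symbols, which is where the graph would come in.
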

### C. Invariant/law mining (#3)
> Mine the repo's unwritten laws from code patterns + tests + git
> history + naming + boundaries.

Combined with the round-3 graph-aware policy DSL, this is the
**self-installing constitution**. Most repos won't write
`.roam/constitution.yml` by hand — but `roam laws mine` populates
80% of it from existing code.
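
One very simple way to picture law mining is co-occurrence counting over past changes: if changes tagged X almost always also carry tag Y, then "X requires Y" is a candidate law, and a new change with X but without Y is a negative-space hit (#4). The sketch below is an assumption-heavy illustration of that idea only; the tags, thresholds, and the `mine_laws` / `check_laws` functions are hypothetical, and R27 itself is flagged as research-then-build.

```python
from collections import Counter
from itertools import permutations


def mine_laws(history, min_support=3, min_confidence=0.9):
    """Mine "X requires Y" candidates from past changes.

    history is a list of tag sets, one per past change (for example
    {"migration", "rollback"}). A law (x, y) is kept when x and y co-occur
    at least min_support times and y accompanies x in at least
    min_confidence of the changes that contain x.
    """
    single = Counter()
    pair = Counter()
    for tags in history:
        single.update(tags)
        pair.update(permutations(sorted(tags), 2))
    return {
        (x, y): n / single[x]
        for (x, y), n in pair.items()
        if n >= min_support and n / single[x] >= min_confidence
    }


def check_laws(change_tags, laws):
    """Return the laws a new change violates: x present, y missing."""
    return [(x, y) for (x, y) in laws if x in change_tags and y not in change_tags]


history = (
    [{"migration", "rollback"}] * 4
    + [{"rollback"}]
    + [{"endpoint", "auth_check"}] * 3
    + [{"endpoint"}]
)
laws = mine_laws(history)
missing = check_laws({"migration"}, laws)
```

With this toy history, "migration requires rollback" is mined with confidence 1.0, while "endpoint requires auth_check" falls short at 0.75, so only a migration without a rollback is flagged. Real mining would work from the graph, tests, and git history rather than hand-assigned tags.
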

### D. Codebase immune memory (#50)
> Long-term: Roam remembers what went wrong, what agents tend to
> break, what humans rejected, what warnings were ignored, what
> fixes worked.

This is the compounding loop. Every agent run makes Roam smarter
about *this specific repo* — and that knowledge is portable across
agent vendors. **The deepest moat in the series.**

These four, plus the existing R18 (graph-aware policy), form the
defensible-by-construction product:

```
attention audit + proof bundle  ──→  agent earns the right to change code
graph policy + mined laws       ──→  the law of the repo is machine-readable
immune memory                   ──→  the law compounds with every run
```

---

## Integration with existing roadmap (R13-R25)

R13-R17 (rounds 1-2) and R18-R25 (round 3) are already in BACKLOG.
Round 4 mostly *sharpens* those existing rounds rather than opening
a parallel track, and elevates 4 priorities:

| Existing | Sharpened by round 4 |
|---|---|
| **R13** agent-OS metadata pass | + `phase` / `risk_cost` per @_tool (for budget #5) |
| **R14** hero copy | + new one-liner: *"Roam helps agents earn the right to change code."* |
| **R15** decision engine + agents-md + prompt snippets | + #38 affordance map + #39 nearest-pattern finder + #26 question generator |
| **R16** agent modes + intent-check + agent-score | + #2 attention audit (the missing piece) + #5 risk budget contract |
| **R17** Cloud as agent governance | + #19 repo-specific benchmarks ("which agent is safe on YOUR codebase") + #11 immune-system dashboard |
| **R18** graph-aware policy DSL | + #6 semantic firewalls + #45 security-context clauses |
| **R19** repo-local memory | + #50 IMMUNE MEMORY (the moat) — the memory store IS the immune-memory substrate |
| **R20** agent run ledger | + #2 attention audit data + #1 proof bundle assembly |
| **R21** multi-agent lease | + #29 semantic merge |
| **R22** confidence contract | + #17 agent suspicion score |
| **R23** graph versioning | + #22 design-pressure detector + #23 entropy budget |
| **R24** constitution | + #33 constitution compiler + #25 assumption tracking |
| **R25+** plugin protocol | + #38 affordance map per framework |

Plus **3 new rounds** for the genuinely novel directions:

| Round | What | Strategic note |
|---|---|---|
| **R26** | **Proof-carrying PRs** (#1) — bundle generator + Review-gate logic + JSON schema | THE Roam Review differentiator. Phase-2 MVP feature, not deferrable. |
| **R27** | **Invariant/law mining** (#3) — research-then-build. `roam laws mine` + `git diff \| roam laws check`. | Pairs with R18 graph-aware policy. Self-installing constitution. |
| **R28** | **Side-effect ledger + causal graph + transaction boundaries + idempotency** (#41, #42, #43, #44) — World Model layer expansion | One sprint of structural-graph work; unlocks 4 commands. |

---

## What this round adds vs rounds 1-3

Round 4 doesn't replace earlier guidance — it **layers**. The
hierarchy is now:

```
Round 1 — positioning             "agents need a map"
Round 2 — tactical features       4 missing primitives
Round 3 — architectural framing   agent decision substrate
Round 4 — control plane / moat    earn-the-right-to-change-code
```

The most important shift from round 3 to round 4: the moat
discussion moves from *"graph-aware policy is the differentiator"*
(round 3) to *"immune memory + proof bundles are the
compounding moat"* (round 4). Both are true; round 4 is
longer-horizon.

---

## What to ship next, in priority order

If I had to pick a 3-round sprint that captures the round-4
upgrade:

1. **R26 — Proof-carrying PR bundle** (Roam Review wedge)
2. **R19 + #50 — Repo-local memory + immune memory** (the moat)
3. **R18 — Graph-aware policy DSL** (already on track)

R26 is the most-leveraged near-term build because it's the
differentiator vs CodeRabbit/Greptile/Qodo for the Phase-2 Roam
Review GitHub App. R19 + #50 is the longer-horizon moat. R18
remains the structural foundation.

---

## Hero-copy candidates (round 4 additions)

In priority order for A/B testing:

1. > **"Roam helps agents earn the right to change code."**
2. > Roam is a local control plane for autonomous coding agents.
3. > Agents are actuators. Roam is the nervous system, immune
   > system, memory, and law of the codebase.
4. > Proof-carrying PRs: the agent changed code AND produced evidence
   > that it understood the blast radius.
5. > (Round 1, still strong) *Agents should not edit blind. Roam is
   > their map.*

The #1 line is the sharpest. Worth testing on /pricing and the
home hero.
