|
| 1 | +# Agent-OS control plane notes — 2026-05-11 (round 4) |
| 2 | + |
| 3 | +Fourth and largest round of external strategic input from ChatGPT. |
| 4 | +50 architectural ideas in ~2,300 lines (raw paste preserved at |
| 5 | +`dev/chatgpt-paste-2026-05-11.md`). |
| 6 | + |
| 7 | +This round **formalises** what rounds 1-3 hinted at: |
| 8 | + |
| 9 | +> Roam is a **local control plane for autonomous coding agents**. |
| 10 | +
|
| 11 | +Not a CLI. Not a dashboard. Not a PR bot. A control system. |
| 12 | + |
| 13 | +> *"Roam does not help agents write more code. Roam helps agents |
| 14 | +> earn the right to change code."* |
| 15 | +
|
| 16 | +That's the sharpest one-liner of the entire 4-round series. |
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +## The 5-layer architecture (the framing that survives) |
| 21 | + |
| 22 | +``` |
| 23 | +1. World Model      repo graph · effects graph · journey graph · contracts · laws |
| 24 | +2. Control Plane    permissions · risk budget · semantic firewalls · capability escrow |
| 25 | +3. Agent Runtime    sessions · attention audit · run ledger · stop checks · debugging |
| 26 | +4. Evidence System  proof-carrying PRs · test obligations · provenance · review packets |
| 27 | +5. Immune Memory    repo-specific failure patterns · rejected changes · learned laws · |
| 28 | +                    fragile zones |
| 29 | +``` |
| 30 | + |
| 31 | +The five layers map cleanly onto our existing surfaces: |
| 32 | + |
| 33 | +- **World Model** → 80% built: symbol graph, effects, taint, clusters, |
| 34 | + layers, clones, co-change, runtime hotspots. Missing: explicit |
| 35 | + causal-graph layer (#41), side-effect ledger (#42), journey graph (#14). |
| 36 | +- **Control Plane** → 20% built: `roam rules` + soft contract checks |
| 37 | + exist. Permissions (`roam permit`), risk budget, semantic firewalls |
| 38 | + are the Phase-0 / R18 / R24 builds. |
| 39 | +- **Agent Runtime** → 50% built: MCP session memory, `agent_contract` |
| 40 | + block, structured errors. Missing: attention audit (#2), run ledger |
| 41 | + (R20), stop-condition detector (#49). |
| 42 | +- **Evidence System** → 60% built: `roam pr-analyze`, `roam critique`, |
| 43 | + CGA attestations. Missing: proof-carrying PR bundle (#1), human |
| 44 | + review packet (#47), test obligation generator (#18). |
| 45 | +- **Immune Memory** → 0% built. This is round-4's BIGGEST new |
| 46 | + direction — *the codebase learns from each agent run and prevents |
| 47 | + future ones from repeating the mistake*. The compounding moat. |
| 48 | + |
| 49 | +--- |
| 50 | + |
| 51 | +## ChatGPT's top-10 power moves (the priority shortlist) |
| 52 | + |
| 53 | +| # | Idea | Our status | Effort | |
| 54 | +|---|---|---|---| |
| 55 | +| 1 | **Proof-carrying PRs** — every PR ships a bundle of intent/context-read/risks/tests-run/non-goals + Roam verdict | New | M — high payoff, ties Roam Review tightly to the audit-trail story | |
| 56 | +| 2 | **Agent attention audit** — what did the agent LOOK AT before editing? (catches confident-but-blind agents) | New | M-S — we have file-read tracking signal in MCP session memory; needs aggregation | |
| 57 | +| 3 | **Invariant/law mining** — discover repo's unwritten rules from existing code patterns + tests + history | New | L — research-heavy, but uniquely possible because we have the graph | |
| 58 | +| 4 | **Negative-space detection** — "you added X, in this repo X usually requires Y" (missing auth check, missing test, missing rollback) | New | M — expectation analysis, not static analysis | |
| 59 | +| 5 | **Risk/autonomy budget** — agents get a numeric risk allowance per mode (safe_edit=20, autonomous_pr=50). Each edit consumes from the budget | New | S-M — pairs with R16 agent modes from round 2 | |
| 60 | +| 6 | **Semantic firewalls** — control changes across architectural boundaries (`client/**` cannot import `server/**`) | Partial | S — extends `roam rules` | |
| 61 | +| 7 | **Test obligation generator** — "this change requires these tests; here's why" | Partial | M — extends `roam affected-tests` | |
| 62 | +| 8 | **Codebase immune memory** — long-term learning: failures, rejections, ignored warnings, fragile zones | New | L — the moat | |
| 63 | +| 9 | **Intent-to-diff contract** | = round 2 #11, already queued in R16 | S | |
| 64 | +| 10 | **Human review packet** — pre-digested, evidence-bundled review surface for the human checkpoint | New | M-S — extends `roam pr-comment-render` | |
| 65 | + |
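As a rough illustration of the risk/autonomy budget in row 5, the sketch below charges each action against a per-mode allowance. Only the two allowances (`safe_edit=20`, `autonomous_pr=50`) come from the notes; the action names, their costs, and the `RiskBudget` class are hypothetical and not part of Roam today.

```python
from dataclasses import dataclass, field

# Allowances quoted in the top-10 table.
MODE_BUDGETS = {"safe_edit": 20, "autonomous_pr": 50}
# Hypothetical per-action costs, invented purely for illustration.
ACTION_COSTS = {"edit_test": 1, "edit_source": 5, "edit_public_api": 15}


class BudgetExceeded(Exception):
    """Raised when an action would overdraw the session's risk budget."""


@dataclass
class RiskBudget:
    mode: str
    spent: int = 0
    log: list = field(default_factory=list)

    @property
    def remaining(self) -> int:
        return MODE_BUDGETS[self.mode] - self.spent

    def charge(self, action: str) -> int:
        """Deduct the action's cost, or refuse if it does not fit."""
        cost = ACTION_COSTS[action]
        if cost > self.remaining:
            raise BudgetExceeded(f"{action} costs {cost}, only {self.remaining} left")
        self.spent += cost
        self.log.append((action, cost))
        return self.remaining


budget = RiskBudget("safe_edit")
budget.charge("edit_source")      # 20 - 5 = 15 left
budget.charge("edit_public_api")  # 15 - 15 = 0 left
```

Once the allowance is used up, any further `charge` call raises `BudgetExceeded`, which is the point where an agent would have to stop and ask for a higher-autonomy mode.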
| 66 | +--- |
| 67 | + |
| 68 | +## The full 50-item index (condensed, with my read) |
| 69 | + |
| 70 | +Cross-referenced against the codebase + earlier rounds. Notation: |
| 71 | +✅ = mostly built, 🟡 = partial substrate exists, ⬜ = new. |
| 72 | + |
| 73 | +1. ⬜ **Proof-carrying PRs** — see top-10 |
| 74 | +2. ⬜ **Agent attention audit** — see top-10 |
| 75 | +3. ⬜ **Invariant/law mining** — see top-10 |
| 76 | +4. ⬜ **Negative-space detection** — see top-10 |
| 77 | +5. ⬜ **Risk/autonomy budget** — see top-10 |
| 78 | +6. 🟡 **Semantic firewalls** — extends `roam rules` |
| 79 | +7. ⬜ **Intent-to-diff contract** — = round 2 #11 (R16) |
| 80 | +8. ⬜ **Counterfactual patch planning** — "show me 3 alternative patches before committing" |
| 81 | +9. ⬜ **Patch minimality engine** — "this patch could be 40% smaller; here's the irreducible diff" |
| 82 | +10. ⬜ **Agent self-debugger** — translates failed-test stdout into "likely cause + next tool" |
| 83 | +11. ⬜ **Codebase immune system** — short-term equivalent of #50 (immune memory) |
| 84 | +12. 🟡 **Structural regression tests** — `roam fingerprint` exists; needs gating UX |
| 85 | +13. ⬜ **Human-readable spec diff** — "what changed about how the system behaves" |
| 86 | +14. ⬜ **Behavioral journey graph** — user-journey-level model on top of call graph |
| 87 | +15. 🟡 **Latent coupling detector** — `roam coupling` covers this for co-change; semantic latent coupling is the extension |
| 88 | +16. ⬜ **Change cones instead of file diffs** — visualise blast radius as a cone, not a list |
| 89 | +17. ⬜ **Agent suspicion score** — how likely is this agent's run problematic, before review |
| 90 | +18. 🟡 **Test obligation generator** — see top-10 (extends `roam affected-tests`) |
| 91 | +19. ⬜ **Repo-specific agent benchmark** — mine bugfix commits, turn them into agent eval tasks. *"Which agent is safe on YOUR codebase?"* Potential paid/cloud feature |
| 92 | +20. ⬜ **Semantic rollback planner** — surgical revert of bad hunks, keep the good |
| 93 | +21. ⬜ **Compatibility oracle** — detects public-API breakage across boundary surfaces |
| 94 | +22. ⬜ **Design pressure detector** — where the architecture is bending under repeated change |
| 95 | +23. ⬜ **Architectural entropy budget** — codebase-level complexity allowance |
| 96 | +24. ⬜ **Local world-model server** — long-running daemon (already noted in `watcher.py:148` as a revisit point per memory) |
| 97 | +25. ⬜ **Assumption tracking** — explicit "this code assumes X" annotations |
| 98 | +26. ⬜ **Question generator** — "here are 4 questions to ask before this PR is safe to merge" |
| 99 | +27. 🟡 **Change provenance graph** — CGA attestations are the substrate |
| 100 | +28. ⬜ **AI change quarantine** — flag + isolate suspect changes during review |
| 101 | +29. ⬜ **Semantic merge for multi-agent coding** — pairs with R21 lease system |
| 102 | +30. 🟡 **Local contract database** — `_DOC_LINKS`/error contracts exist; user-facing-contracts is the extension |
| 103 | +31. ⬜ **"What would break if this was wrong?"** — counterfactual fault propagation |
| 104 | +32. 🟡 **"Do not touch" inference** — `_UTILITY_PATH_PATTERNS` exists; needs to be agent-facing |
| 105 | +33. ⬜ **Codebase constitution compiler** — = round 3 #15 (R24) |
| 106 | +34. ⬜ **Agent capability escrow** — pre-authorise specific edits inside a session |
| 107 | +35. ⬜ **Agent run types as first-class objects** — typed agent operations (exploration vs editing vs reviewing) |
| 108 | +36. 🟡 **Behavior-preserving refactor checker** — `roam simulate` + `roam critique` partial substrate |
| 109 | +37. 🟡 **Agent hallucination detector beyond imports** — `roam critique` catches clones-not-edited; this extends to "calls function that doesn't exist with this signature" |
| 110 | +38. ⬜ **Repository affordance map** — "what CAN you do here? what's idiomatic?" |
| 111 | +39. ⬜ **Nearest existing pattern finder** — "before adding new code, here are 3 similar existing patterns" |
| 112 | +40. ⬜ **Patch as hypothesis model** — every patch is a falsifiable claim; Roam runs the falsification |
| 113 | +41. ⬜ **Causal graph, not just call graph** — see "World Model" gap above |
| 114 | +42. ⬜ **Side-effect ledger** — see "World Model" gap above |
| 115 | +43. ⬜ **Transaction boundary detector** — find atomicity violations across DB + email + cache writes |
| 116 | +44. ⬜ **Idempotency detector** — flag endpoints that should be idempotent but aren't |
| 117 | +45. 🟡 **Security-context propagation** — taint engine is the substrate |
| 118 | +46. 🟡 **Data lineage and privacy flow** — bridges layer can be extended for this |
| 119 | +47. ⬜ **Human review packet** — see top-10 |
| 120 | +48. ⬜ **Agent ethics: user-intent protection** — guard against scope drift |
| 121 | +49. ⬜ **Stop-condition detector** — when should the agent STOP and ask |
| 122 | +50. ⬜ **Codebase immune memory** — see top-10. **The moat.** |
| 123 | + |
| 124 | +--- |
| 125 | + |
| 126 | +## The 4 truly category-defining ideas (my filter) |
| 127 | + |
| 128 | +Most of the 50 are interesting; these 4 would change the product |
| 129 | +shape: |
| 130 | + |
| 131 | +### A. Proof-carrying PRs (#1) |
| 132 | +> A PR shouldn't just contain code. It should carry a proof bundle: |
| 133 | +> intent, context-read, affected symbols, risks, tests required, |
| 134 | +> tests run, known non-goals, Roam verdict. |
| 135 | +
|
| 136 | +This becomes the **type system for AI-generated PRs**. Roam Review |
| 137 | +stops being a commenter and becomes a *gatekeeper*. Pairs perfectly |
| 138 | +with the Phase-2 Roam Review GitHub App MVP — this is the |
| 139 | +*differentiating feature* that's not a copy of CodeRabbit. |
| 140 | + |
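One possible shape for the proof bundle, assuming it is a plain record plus a gate function. The field names follow the list quoted above; the types, the `ProofBundle` class, and the `gate` rules are assumptions for illustration, not an existing Roam schema. The Roam verdict would be derived from the gate result rather than stored as a field.

```python
from dataclasses import dataclass, field


@dataclass
class ProofBundle:
    # Field names mirror the quoted list; the types are assumptions.
    intent: str
    context_read: list[str] = field(default_factory=list)
    affected_symbols: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    tests_required: list[str] = field(default_factory=list)
    tests_run: list[str] = field(default_factory=list)
    non_goals: list[str] = field(default_factory=list)


def gate(bundle: ProofBundle) -> list[str]:
    """Return reasons to block the PR; an empty list means it may pass."""
    problems = []
    if not bundle.intent.strip():
        problems.append("missing intent")
    if not bundle.context_read:
        problems.append("no context was read before editing")
    missing = sorted(set(bundle.tests_required) - set(bundle.tests_run))
    if missing:
        problems.append("required tests not run: " + ", ".join(missing))
    return problems


bundle = ProofBundle(
    intent="Rename helper",
    context_read=["src/util.py"],
    tests_required=["tests/test_util.py", "tests/test_api.py"],
    tests_run=["tests/test_util.py"],
)
print(gate(bundle))  # ['required tests not run: tests/test_api.py']
```

Returning reasons instead of a bare boolean keeps the gate's output usable as review-comment text.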
| 141 | +### B. Agent attention audit (#2) |
| 142 | +> What did the agent LOOK AT before editing? Catches the |
| 143 | +> "confident but blind" failure mode. |
| 144 | +
|
| 145 | +A human reviewer cannot easily know what the agent saw — but Roam |
| 146 | +can, because the MCP session already tracks tool calls. This is |
| 147 | +an **uncopyable advantage** for tools that go through Roam's MCP |
| 148 | +surface vs raw LLM agents. |
| 149 | + |
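A minimal sketch of the attention audit, assuming the session's tool calls can be exported as an ordered list of `(tool, path)` pairs. The tool names `"read"` and `"edit"` and the `blind_edits` helper are placeholders, not Roam's actual MCP tool names or API.

```python
def blind_edits(calls):
    """Return edited paths that were not read earlier in the same session.

    `calls` is an ordered list of (tool, path) pairs. The tool names
    "read" and "edit" are placeholders for whatever the session records.
    """
    seen = set()
    blind = []
    for tool, path in calls:
        if tool == "read":
            seen.add(path)
        elif tool == "edit" and path not in seen and path not in blind:
            blind.append(path)
    return blind


session = [
    ("read", "src/auth.py"),
    ("edit", "src/auth.py"),
    ("edit", "src/billing.py"),  # edited without having been read first
    ("read", "src/billing.py"),  # reading afterwards does not count
]
print(blind_edits(session))  # ['src/billing.py']
```

Order matters here: a read that happens after the edit does not clear the flag, which is what separates "looked first" from "looked eventually".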
| 150 | +### C. Invariant/law mining (#3) |
| 151 | +> Mine the repo's unwritten laws from code patterns + tests + git |
| 152 | +> history + naming + boundaries. |
| 153 | +
|
| 154 | +Combined with the round-3 graph-aware policy DSL, this is the |
| 155 | +**self-installing constitution**. Most repos won't write |
| 156 | +`.roam/constitution.yml` by hand — but `roam laws mine` populates |
| 157 | +80% of it from existing code. |
| 158 | + |
| 159 | +### D. Codebase immune memory (#50) |
| 160 | +> Long-term: Roam remembers what went wrong, what agents tend to |
| 161 | +> break, what humans rejected, what warnings were ignored, what |
| 162 | +> fixes worked. |
| 163 | +
|
| 164 | +This is the compounding loop. Every agent run makes Roam smarter |
| 165 | +about *this specific repo* — and that knowledge is portable across |
| 166 | +agent vendors. **The deepest moat in the series.** |
| 167 | + |
| 168 | +These four + the existing R18 (graph-aware policy) form the |
| 169 | +defensible-by-construction product: |
| 170 | + |
| 171 | +``` |
| 172 | +attention audit + proof bundle ──→ agent earns the right to change code |
| 173 | +graph policy + mined laws ──→ the law of the repo is machine-readable |
| 174 | +immune memory ──→ the law compounds with every run |
| 175 | +``` |
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +## Integration with existing roadmap (R13-R25) |
| 180 | + |
| 181 | +R13-R17 (rounds 1-2) and R18-R25 (round 3) are already in BACKLOG. |
| 182 | +Round 4 does not open a broad parallel track. It mostly *sharpens* existing rounds and adds 4 elevated |
| 183 | +priorities, with new rounds (R26-R28) reserved for the genuinely novel directions: |
| 184 | + |
| 185 | +| Existing | Sharpened by round 4 | |
| 186 | +|---|---| |
| 187 | +| **R13** agent-OS metadata pass | + `phase` / `risk_cost` per @_tool (for budget #5) | |
| 188 | +| **R14** hero copy | + new one-liner: *"Roam helps agents earn the right to change code."* | |
| 189 | +| **R15** decision engine + agents-md + prompt snippets | + #38 affordance map + #39 nearest-pattern finder + #26 question generator | |
| 190 | +| **R16** agent modes + intent-check + agent-score | + #2 attention audit (the missing piece) + #5 risk budget contract | |
| 191 | +| **R17** Cloud as agent governance | + #19 repo-specific benchmarks ("which agent is safe on YOUR codebase") + #11 immune-system dashboard | |
| 192 | +| **R18** graph-aware policy DSL | + #6 semantic firewalls + #45 security-context clauses | |
| 193 | +| **R19** repo-local memory | + #50 IMMUNE MEMORY (the moat) — the memory store IS the immune-memory substrate | |
| 194 | +| **R20** agent run ledger | + #2 attention audit data + #1 proof bundle assembly | |
| 195 | +| **R21** multi-agent lease | + #29 semantic merge | |
| 196 | +| **R22** confidence contract | + #17 agent suspicion score | |
| 197 | +| **R23** graph versioning | + #22 design-pressure detector + #23 entropy budget | |
| 198 | +| **R24** constitution | + #33 constitution compiler + #25 assumption tracking | |
| 199 | +| **R25+** plugin protocol | + #38 affordance map per framework | |
| 200 | + |
| 201 | +Plus **3 new rounds** for the genuinely novel directions: |
| 202 | + |
| 203 | +| Round | What | Strategic note | |
| 204 | +|---|---|---| |
| 205 | +| **R26** | **Proof-carrying PRs** (#1) — bundle generator + Review-gate logic + JSON schema | THE Roam Review differentiator. Phase-2 MVP feature, not deferrable. | |
| 206 | +| **R27** | **Invariant/law mining** (#3) — research-then-build. `roam laws mine` + `git diff \| roam laws check`. | Pairs with R18 graph-aware policy. Self-installing constitution. | |
| 207 | +| **R28** | **Side-effect ledger + causal graph + transaction boundaries** (#41, #42, #43, #44) — World Model layer expansion | One sprint of structural-graph work; unlocks 4 commands. | |
| 208 | + |
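The semantic-firewall rule that R18 absorbs (`client/**` cannot import `server/**`) could be checked against a list of import edges roughly as follows. This is a sketch under assumptions: the `FIREWALLS` rule format and the `violations` helper are hypothetical, and it uses Python's `fnmatchcase`, in which `*` also matches `/`, rather than whatever matcher `roam rules` actually uses.

```python
from fnmatch import fnmatchcase

# Hypothetical rule format: (importer glob, forbidden import glob).
FIREWALLS = [("client/**", "server/**")]


def violations(import_edges, firewalls=FIREWALLS):
    """Return the (importer, imported) edges that cross a firewall."""
    return [
        (src, dst)
        for src, dst in import_edges
        for from_glob, to_glob in firewalls
        if fnmatchcase(src, from_glob) and fnmatchcase(dst, to_glob)
    ]


edges = [
    ("client/ui/app.py", "client/ui/widgets.py"),
    ("client/ui/app.py", "server/db/models.py"),
    ("server/api.py", "server/db/models.py"),
]
print(violations(edges))  # [('client/ui/app.py', 'server/db/models.py')]
```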
| 209 | +--- |
| 210 | + |
| 211 | +## What this round adds vs rounds 1-3 |
| 212 | + |
| 213 | +Round 4 doesn't replace earlier guidance — it **layers**. The |
| 214 | +hierarchy is now: |
| 215 | + |
| 216 | +``` |
| 217 | +Round 1 — positioning            "agents need a map" |
| 218 | +Round 2 — tactical features      4 missing primitives |
| 219 | +Round 3 — architectural framing  agent decision substrate |
| 220 | +Round 4 — control plane / moat   earn-the-right-to-change-code |
| 221 | +``` |
| 222 | + |
| 223 | +The most important shift from round 3 to round 4: the moat |
| 224 | +discussion moves from *"graph-aware policy is the differentiator"* |
| 225 | +(round 3) to *"immune memory + proof bundles are the |
| 226 | +compounding moat"* (round 4). Both are true; round 4 is |
| 227 | +longer-horizon. |
| 228 | + |
| 229 | +--- |
| 230 | + |
| 231 | +## What to ship next, in priority order |
| 232 | + |
| 233 | +If I had to pick a 3-round sprint that captures the round-4 |
| 234 | +upgrade: |
| 235 | + |
| 236 | +1. **R26 — Proof-carrying PR bundle** (Roam Review wedge) |
| 237 | +2. **R19 + #50 — Repo-local memory + immune memory** (the moat) |
| 238 | +3. **R18 — Graph-aware policy DSL** (already on track) |
| 239 | + |
| 240 | +R26 is the most-leveraged near-term build because it's the |
| 241 | +differentiator vs CodeRabbit/Greptile/Qodo for the Phase-2 Roam |
| 242 | +Review GitHub App. R19 + #50 is the longer-horizon moat. R18 |
| 243 | +remains the structural foundation. |
| 244 | + |
| 245 | +--- |
| 246 | + |
| 247 | +## Hero-copy candidates (round 4 additions) |
| 248 | + |
| 249 | +In priority order for A/B testing: |
| 250 | + |
| 251 | +1. > **"Roam helps agents earn the right to change code."** |
| 252 | +2. > Roam is a local control plane for autonomous coding agents. |
| 253 | +3. > Agents are actuators. Roam is the nervous system, immune |
| 254 | +> system, memory, and law of the codebase. |
| 255 | +4. > Proof-carrying PRs: the agent changed code AND produced evidence |
| 256 | +> that it understood the blast radius. |
| 257 | +5. > (Round 1, still strong) *Agents should not edit blind. Roam is |
| 258 | +> their map.* |
| 259 | +
|
| 260 | +The #1 line is the sharpest. Worth testing on /pricing and the |
| 261 | +home hero. |