|
| 1 | +# Agent-OS architecture notes — 2026-05-11 |
| 2 | + |
| 3 | +Third round of external strategic input from ChatGPT. Where rounds 1+2 |
| 4 | +(`dev/agent-os-positioning-2026-05-11.md`) focused on positioning + |
| 5 | +tactical feature gaps, **this round is architectural** — the |
| 6 | +"agent decision substrate" framing and 15 architectural directions. |
| 7 | + |
| 8 | +Read this together with the round-1/2 capture and `internal/strategy/`. |
| 9 | + |
| 10 | +--- |
| 11 | + |
| 12 | +## Unifying thesis |
| 13 | + |
| 14 | +> Roam should become **the agent's engineering brain** — |
| 15 | +> a local operating layer that lets coding agents navigate, act, and |
| 16 | +> self-check inside a codebase. |
| 17 | +
|
| 18 | +The killer architectural loop ChatGPT proposes: |
| 19 | + |
| 20 | +``` |
| 21 | +Agent gets task |
| 22 | +↓ |
| 23 | +Roam retrieves context |
| 24 | +↓ |
| 25 | +Roam plans safe path |
| 26 | +↓ |
| 27 | +Roam grants/denies permissions |
| 28 | +↓ |
| 29 | +Agent edits |
| 30 | +↓ |
| 31 | +Roam critiques diff |
| 32 | +↓ |
| 33 | +Roam records run |
| 34 | +↓ |
| 35 | +Roam updates repo memory |
| 36 | +``` |
| 37 | + |
| 38 | +This is the **demo / video / hero-story** to script. Every loop step |
| 39 | +maps to a Roam tool. The pitch becomes: *"Roam is the only thing the |
| 40 | +agent talks to between getting the task and shipping the PR."* |
| 41 | + |
| 42 | +--- |
| 43 | + |
| 44 | +## The 15 architectural directions (with my read against the codebase) |
| 45 | + |
| 46 | +### 1. Agent Decision Engine — **highest leverage** |
| 47 | +> Given task + repo state + risk level, what should the agent do next? |
| 48 | +
|
| 49 | +Central commands: `roam next`, `roam plan`, `roam permit`. Already |
| 50 | +seeded in `dev/agent-os-positioning-2026-05-11.md` as R15 work. |
| 51 | + |
| 52 | +**What we have**: `roam ask` (TF-IDF intent dispatcher), |
| 53 | +`roam_for_<situation>` family (R8.E4) — partial. |
| 54 | +**What's missing**: the *machine-readable* next-action envelope |
| 55 | +(`{recommended_tools, reason, avoid, required_order}`) that a planner |
| 56 | +can actually consume. |
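A minimal sketch of what such an envelope could look like, assuming a plain dataclass serialised to JSON. Only the four field names come from the notes above; the `NextAction` class, its validation rules, and the sample tool names are hypothetical, not an existing Roam API.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class NextAction:
    """Hypothetical next-action envelope; only the four field names come from the notes."""
    recommended_tools: list[str]
    reason: str
    avoid: list[str] = field(default_factory=list)
    required_order: list[str] = field(default_factory=list)

    def validate(self) -> None:
        # Every tool named in required_order must also be recommended,
        # otherwise a planner is told to sequence a tool it was never offered.
        unknown = [t for t in self.required_order if t not in self.recommended_tools]
        if unknown:
            raise ValueError(f"required_order names unrecommended tools: {unknown}")
        # A tool cannot be both recommended and avoided.
        clash = sorted(set(self.recommended_tools) & set(self.avoid))
        if clash:
            raise ValueError(f"tools both recommended and avoided: {clash}")

    def to_json(self) -> str:
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


envelope = NextAction(
    recommended_tools=["retrieve", "preflight"],
    reason="task touches auth; gather context before editing",
    avoid=["run-tests"],
    required_order=["retrieve", "preflight"],
)
print(envelope.to_json())
```

Validating before serialising means a planner never receives an envelope that contradicts itself, which is arguably the main point of making the shape machine-readable.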
| 57 | + |
| 58 | +### 2. Policy Engine — **strategic differentiator** |
| 59 | +Graph-aware policies, NOT just path-aware. This is the key insight: |
| 60 | + |
| 61 | +> Block changes to any function reachable from `payment_settlement()` |
| 62 | +> unless the task is explicitly payment-related. |
| 63 | +
|
| 64 | +That's stronger than CODEOWNERS or any pattern-based rule because it |
| 65 | +uses the call graph as the policy substrate. Our `roam impact` |
| 66 | +already computes that reachability — the missing piece is a policy |
| 67 | +DSL that consumes it. |
| 68 | + |
| 69 | +**What we have**: `roam rules` with YAML rule packs (taint detectors, |
| 70 | +gate presets). Path-aware. |
| 71 | +**What's missing**: graph-aware rule clauses |
| 72 | +(`reachable_from`, `imports_from`, `clones_with`, `tested_by`). |
| 73 | + |
| 74 | +Pairs with `roam permit` (Phase 0 monetisation freebie). |
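The `reachable_from` idea can be illustrated with a toy call graph and a breadth-first search. This is a sketch under stated assumptions: the graph is a plain dict rather than Roam's index, every function name except `payment_settlement` is invented, and `check_policy` with its task tags is a hypothetical stand-in for whatever the policy DSL ends up looking like.

```python
from collections import deque

# Toy call graph: caller -> callees. Names other than payment_settlement
# are made up for illustration.
CALL_GRAPH = {
    "payment_settlement": ["charge_card", "write_ledger"],
    "charge_card": ["format_amount"],
    "write_ledger": [],
    "format_amount": [],
    "render_profile": ["format_name"],
    "format_name": [],
}


def reachable_from(graph, root):
    """Return every function reachable from root (including root) via BFS."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def check_policy(graph, changed, root, task_tags, allow_tag):
    """Return the changed functions the policy blocks (empty list = allowed)."""
    if allow_tag in task_tags:
        return []
    protected = reachable_from(graph, root)
    return sorted(f for f in changed if f in protected)


blocked = check_policy(
    CALL_GRAPH,
    changed=["format_amount", "format_name"],
    root="payment_settlement",
    task_tags={"ui"},
    allow_tag="payment",
)
print(blocked)  # ['format_amount']
```

A path-based rule would not catch `format_amount` unless it happened to live under a protected directory; the graph-based rule catches it because of how it is called, not where it sits.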
| 75 | + |
| 76 | +### 3. Context Pack Architecture |
| 77 | +> `roam context-pack "implement password reset" --budget 12000` |
| 78 | +
|
| 79 | +**What we have**: `roam retrieve` does graph-aware FTS5 + structural |
| 80 | +rerank + token budget. ~90% there. Plus `_apply_budget` for hard cap. |
| 81 | +**What's missing**: explicit "context pack" framing in the response |
| 82 | +shape — essential-files + relevant-symbols + likely-tests + |
| 83 | +**excluded-noise** as named sections. Plus a `--budget` flag that |
| 84 | +agents use deliberately. |
| 85 | + |
| 86 | +Cheapest win in the round-3 list. Could be `roam retrieve` with a |
| 87 | +new flag, or a thin alias `roam context-pack` for marketing. |
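One way to picture the named-sections response is a greedy fill under a token budget. The sketch below assumes candidates arrive already scored and sized; the `build_context_pack` function, the section keys, and the sample paths are hypothetical and do not describe how `roam retrieve` works today.

```python
def build_context_pack(candidates, budget):
    """Greedily fill named sections by score until the token budget is spent.

    candidates: list of dicts with keys path, section, score, tokens.
    Anything that does not fit lands in excluded_noise with a reason.
    """
    pack = {
        "essential_files": [],
        "relevant_symbols": [],
        "likely_tests": [],
        "excluded_noise": [],
        "tokens_used": 0,
    }
    for item in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if pack["tokens_used"] + item["tokens"] <= budget:
            pack[item["section"]].append(item["path"])
            pack["tokens_used"] += item["tokens"]
        else:
            pack["excluded_noise"].append(
                {"path": item["path"], "reason": "over budget"}
            )
    return pack


# Hypothetical candidates for the "implement password reset" task.
candidates = [
    {"path": "src/auth/reset.py", "section": "essential_files", "score": 0.9, "tokens": 7000},
    {"path": "AuthService.send_reset", "section": "relevant_symbols", "score": 0.8, "tokens": 3000},
    {"path": "tests/test_reset.py", "section": "likely_tests", "score": 0.7, "tokens": 1500},
    {"path": "docs/changelog.md", "section": "essential_files", "score": 0.1, "tokens": 4000},
]
pack = build_context_pack(candidates, budget=12000)
print(pack["tokens_used"], pack["excluded_noise"])
```

Reporting what was left out, and why, is what distinguishes a context pack from a plain ranked list: the agent can see that the changelog was dropped for budget reasons rather than silently missing.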
| 88 | + |
| 89 | +### 4. Agent Run Ledger — **Phase 4 monetisation product** |
| 90 | +Audit-grade trail for agent behaviour. Maps to the Audit Trail |
| 91 | +product in `monetization_v2_subscription_pivot.md`. |
| 92 | + |
| 93 | +**What we have**: in-toto v1 CGA attestations + cosign signing chain |
| 94 | +(R7/R10.2) — proves what was *indexed*, not what the *agent did*. |
| 95 | +**What's missing**: per-agent-run event stream: |
| 96 | + |
| 97 | +``` |
| 98 | +Task given |
| 99 | +Tools called |
| 100 | +Files inspected |
| 101 | +Warnings ignored ← this is the audit-grade signal |
| 102 | +Files changed |
| 103 | +Tests suggested vs run |
| 104 | +Critique result |
| 105 | +Final risk score |
| 106 | +``` |
| 107 | + |
| 108 | +Commands ChatGPT names: `roam agent-score`, `roam replay`, |
| 109 | +`roam audit-trail`, `roam explain-run`. The CGA chain is the |
| 110 | +substrate; the ledger is the UX. |
| 111 | + |
| 112 | +### 5. Repo-Local Agent Memory — **distinct from LLM memory** |
| 113 | +Engineering facts the *repo* knows, not the *model*: |
| 114 | + |
| 115 | +``` |
| 116 | +Auth tokens expire after 15 minutes. |
| 117 | +Never call Stripe directly outside billing/. |
| 118 | +Generated Prisma files should not be edited. |
| 119 | +``` |
| 120 | + |
| 121 | +**What we have**: MCP session memory (`mcp_extras/session.py`) for |
| 122 | +per-conversation context. Not repo-persistent. |
| 123 | +**What's missing**: a `.roam/memory.jsonl` (or similar) store + |
| 124 | +`roam memory add/list/relevant`. |
| 125 | + |
| 126 | +Strategic insight: this is the engineering counterpart to model |
| 127 | +memory and *portable across agents*. "LLM memory = model-specific. |
| 128 | +Roam memory = repo-specific." Strong differentiator. |
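A rough sketch of the store, assuming an append-only JSONL file and naive word-overlap ranking. The function names mirror the proposed `add/list/relevant` subcommands but are hypothetical, as are the `fact`/`scope` record fields; a real implementation would likely rank with something better than word overlap.

```python
import json
import re
import tempfile
from pathlib import Path


def memory_add(store: Path, fact: str, scope: str = "") -> None:
    """Append one fact as a JSON line; append-only keeps diffs readable."""
    store.parent.mkdir(parents=True, exist_ok=True)
    with store.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"fact": fact, "scope": scope}) + "\n")


def memory_list(store: Path) -> list[dict]:
    """Return every stored record, or an empty list if the store is absent."""
    if not store.exists():
        return []
    with store.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def memory_relevant(store: Path, task: str) -> list[str]:
    """Rank facts by naive word overlap with the task; drop zero-overlap facts."""
    task_words = _words(task)
    scored = []
    for entry in memory_list(store):
        overlap = len(task_words & _words(entry["fact"] + " " + entry["scope"]))
        if overlap:
            scored.append((overlap, entry["fact"]))
    return [fact for _, fact in sorted(scored, key=lambda s: -s[0])]


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / ".roam" / "memory.jsonl"
    memory_add(store, "Auth tokens expire after 15 minutes.", scope="auth")
    memory_add(store, "Never call Stripe directly outside billing/.", scope="billing")
    memory_add(store, "Generated Prisma files should not be edited.")
    all_facts = memory_list(store)
    hits = memory_relevant(store, "refresh auth tokens on login")
    print(hits)
```

Because the file lives in the repository rather than in a model's context, any agent that can read JSONL can use it, which is the portability argument made above.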
| 129 | + |
| 130 | +### 6. Multi-Agent Lease / Territory System — **novel** |
| 131 | +> Multiple agents working in one repo need boundaries. |
| 132 | +> `roam lease request "frontend checkout form"` |
| 133 | +
|
| 134 | +**What we have**: `roam partition`, `roam orchestrate`, `roam fleet` |
| 135 | +(graph-based work-splitting for multi-agent). Static analysis layer. |
| 136 | +**What's missing**: a stateful *lease* layer — claim a graph |
| 137 | +territory, detect conflicts, release on completion. |
| 138 | + |
| 139 | +Pairs naturally with what we built. Concrete demo: "Roam assigns |
| 140 | +graph-aware work territories so agents are less likely to collide." |
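The claim/conflict/release cycle can be sketched with set intersection over claimed symbols. This assumes territories are plain sets of symbol names and ignores persistence, expiry, and crash recovery; the `LeaseTable` class and the agent and symbol names are hypothetical.

```python
class LeaseTable:
    """In-memory lease registry; a real one would need persistence and expiry."""

    def __init__(self):
        self._leases = {}  # agent id -> set of claimed symbols

    def request(self, agent, territory):
        """Grant the lease, or return the conflicting holders without granting."""
        territory = set(territory)
        conflicts = {
            holder: sorted(claimed & territory)
            for holder, claimed in self._leases.items()
            if holder != agent and claimed & territory
        }
        if conflicts:
            return {"granted": False, "conflicts": conflicts}
        self._leases[agent] = self._leases.get(agent, set()) | territory
        return {"granted": True, "conflicts": {}}

    def release(self, agent):
        """Drop every symbol the agent holds; a no-op if it holds none."""
        self._leases.pop(agent, None)


table = LeaseTable()
first = table.request("agent-a", {"CheckoutForm", "validate_cart"})
second = table.request("agent-b", {"validate_cart", "PaymentButton"})
table.release("agent-a")
third = table.request("agent-b", {"validate_cart", "PaymentButton"})
print(first["granted"], second["granted"], third["granted"])  # True False True
```

In practice the territory would presumably come from the existing partition output rather than a hand-written set, so that a lease covers a connected region of the graph rather than an arbitrary list of names.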
| 141 | + |
| 142 | +### 7. Intent-to-Diff Architecture — (= round 2 #11) |
| 143 | +Already captured. `roam intent-check "<task>"` compares stated intent |
| 144 | +to actual diff and flags drift. |
| 145 | + |
| 146 | +### 8. Confidence / Uncertainty Architecture |
| 147 | +Every result should expose confidence: |
| 148 | + |
| 149 | +```json |
| 150 | +{ |
| 151 | + "affected_tests": [ |
| 152 | + {"file": "tests/auth.test.ts", "confidence": 0.91, |
| 153 | + "reason": "directly imports modified AuthService"}, |
| 154 | + {"file": "tests/session.test.ts", "confidence": 0.63, |
| 155 | + "reason": "shares token validation path"} |
| 156 | + ] |
| 157 | +} |
| 158 | +``` |
| 159 | + |
| 160 | +**What we have**: severity / verdict / partial_success on every |
| 161 | +envelope. Per-finding `confidence` exists on some surfaces (taint |
| 162 | +findings, clone matches) but not uniformly. |
| 163 | +**What's missing**: a contract that every list-of-findings tool |
| 164 | +returns `{value, confidence, reason}` triples. |
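A contract like this is easiest to enforce with a small checker that tools (or tests) can run over their output. The sketch below assumes findings are plain dicts; the `check_findings` helper and the rule that confidence lies in [0, 1] are assumptions, not an existing Roam validator.

```python
def check_findings(findings):
    """Return a list of contract violations; an empty list means compliant."""
    errors = []
    for i, item in enumerate(findings):
        missing = {"value", "confidence", "reason"} - item.keys()
        if missing:
            errors.append(f"finding {i}: missing {sorted(missing)}")
            continue
        conf = item["confidence"]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
        if not is_number or not 0.0 <= conf <= 1.0:
            errors.append(f"finding {i}: confidence must be a number in [0, 1]")
        if not str(item["reason"]).strip():
            errors.append(f"finding {i}: reason must be non-empty")
    return errors


findings = [
    {"value": "tests/auth.test.ts", "confidence": 0.91,
     "reason": "directly imports modified AuthService"},
    {"value": "tests/session.test.ts", "confidence": 0.63,
     "reason": "shares token validation path"},
]
print(check_findings(findings))  # []
```

Running a checker like this across every list-of-findings tool is one way to turn the "mechanical sweep" into something a test suite can guard.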
| 165 | + |
| 166 | +### 9. Pluggable Analyzer Architecture — **big undertaking** |
| 167 | +> `roam-plugin-nextjs`, `roam-plugin-laravel`, `roam-plugin-prisma`, … |
| 168 | +
|
| 169 | +**What we have**: `bridges/` module (Salesforce, protobuf, REST API, |
| 170 | +template, config). The substrate for plugin-style framework |
| 171 | +intelligence. Per-language extractors registered via |
| 172 | +`languages/registry.py`. |
| 173 | +**What's missing**: a public plugin protocol with stable hooks + |
| 174 | +manifest format + discovery mechanism. |
| 175 | + |
| 176 | +This is a multi-quarter direction, not a quick win. But it's the |
| 177 | +right architecture for the long term: it keeps the core small while |
| 178 | +the ecosystem grows. |
| 179 | + |
| 180 | +### 10. Agent Capability Registry — (= round 2 #1) |
| 181 | +First-class metadata per @_tool. Already queued as R13. |
| 182 | + |
| 183 | +### 11. Local-First Sync Architecture |
| 184 | +> Code stays local. Derived intelligence can sync. |
| 185 | +
|
| 186 | +**What we have**: local-first CLI (no telemetry). Roam Cloud planned |
| 187 | +in Phase 3 but architecture wasn't crisp. |
| 188 | +**What's missing**: a clear sync model — what *can* sync (risk |
| 189 | +scores, agent-run summaries, warnings, trend metrics, policy |
| 190 | +violations, review outcomes) vs what *cannot* (code, diffs, secrets). |
| 191 | + |
| 192 | +This sharpens the Roam Cloud story without weakening the privacy |
| 193 | +posture. Should be the load-bearing principle for the Cloud product. |
| 194 | + |
| 195 | +### 12. Evidence-Based PR Review Architecture |
| 196 | +Every PR comment points to specific graph evidence, not generic prose: |
| 197 | + |
| 198 | +``` |
| 199 | +Risk: high |
| 200 | +Evidence: |
| 201 | +- changed function has 14 callers |
| 202 | +- 3 callers are in auth-critical paths |
| 203 | +- no affected tests changed |
| 204 | +- similar clone exists in legacyAuth.ts |
| 205 | +- complexity increased from 8 → 13 |
| 206 | +``` |
| 207 | + |
| 208 | +**What we have**: `roam critique`, `roam pr-risk`, `roam diff` all |
| 209 | +produce structured findings. The PR-comment renderer |
| 210 | +(`roam pr-comment-render`) needs to surface this with explicit |
| 211 | +evidence-citations. |
| 212 | +**What's missing**: a `pr-comment` format that LEADS WITH evidence |
| 213 | +and includes per-finding graph citations. |
| 214 | + |
| 215 | +This is the answer to "how does Roam differentiate from CodeRabbit/ |
| 216 | +Greptile/Qodo?" — different *category* of evidence. |
| 217 | + |
| 218 | +### 13. Sandbox Execution Architecture |
| 219 | +> Roam runs tests / checks for the agent and translates output into |
| 220 | +> agent-readable next steps. |
| 221 | +
|
| 222 | +**What we have**: `roam affected-tests` suggests tests but does not run them. |
| 223 | +**What's missing**: a runner (`roam run-tests --affected`) + |
| 224 | +output-parsing layer that returns |
| 225 | +`{failure_type, likely_file, suggested_next_tool, suggested_query}`. |
| 226 | + |
| 227 | +Risky area — running untrusted commands needs careful sandboxing. |
| 228 | +Defer until the rest is in place. |
| 229 | + |
| 230 | +### 14. Graph Versioning Architecture |
| 231 | +> Graph at commit A vs commit B → structural diff |
| 232 | +
|
| 233 | +**What we have**: incremental indexing (R9.B7 FTS5 incremental, R10.3 |
| 234 | +cluster cache) — we already maintain the graph efficiently. |
| 235 | +**What's missing**: an explicit *graph snapshot* per commit and a |
| 236 | +`roam graph-diff main..HEAD` / `roam architecture-drift` interface. |
| 237 | + |
| 238 | +Pairs naturally with `roam health --baseline auto` we already shipped. |
| 239 | +Marketing line: *"Not just what changed in code, but what changed in |
| 240 | +the system structure."* |
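At its simplest, a structural diff between two snapshots is set arithmetic over edges. The sketch below assumes each snapshot is a set of `(caller, callee)` pairs; the `graph_diff` function and the sample edges are hypothetical and say nothing about how Roam stores its graph.

```python
def graph_diff(before, after):
    """Diff two snapshots given as sets of (caller, callee) edges."""
    nodes_before = {n for edge in before for n in edge}
    nodes_after = {n for edge in after for n in edge}
    return {
        "added_edges": sorted(after - before),
        "removed_edges": sorted(before - after),
        "added_nodes": sorted(nodes_after - nodes_before),
        "removed_nodes": sorted(nodes_before - nodes_after),
    }


# Hypothetical snapshots for `main` and `HEAD`.
main_graph = {("login", "check_password"), ("login", "issue_token")}
head_graph = {("login", "check_password"), ("login", "audit_log"),
              ("reset", "issue_token")}

drift = graph_diff(main_graph, head_graph)
print(drift["added_nodes"], drift["removed_edges"])
```

Here `issue_token` still exists at `HEAD` but is no longer called from `login`, a change that a line-based diff shows only indirectly and an edge-based diff reports outright.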
| 241 | + |
| 242 | +### 15. Agent Constitution Architecture |
| 243 | +Combines AGENTS.md + policy rules + repo memory into one declarative |
| 244 | +file (`.roam/constitution.yml`): |
| 245 | + |
| 246 | +```yaml |
| 247 | +principles: |
| 248 | + - Prefer small diffs. |
| 249 | + - Never edit generated files. |
| 250 | +critical_paths: |
| 251 | + - src/auth/** |
| 252 | + - src/billing/** |
| 253 | +required_roam_checks: |
| 254 | + before_edit: [retrieve, preflight] |
| 255 | + after_edit: [affected-tests, critique] |
| 256 | + before_pr: [pr-risk, intent-check] |
| 257 | +``` |
| 258 | +
|
| 259 | +This is the unifying primitive. AGENTS.md is the human-readable |
| 260 | +prose; constitution.yml is the machine-readable contract. |
| 261 | +
|
| 262 | +--- |
| 263 | +
|
| 264 | +## Top-5 priorities per ChatGPT, with my read |
| 265 | +
|
| 266 | +ChatGPT recommends these five and says they compound beautifully: |
| 267 | +
|
| 268 | +1. **Agent Decision Engine** — `roam next` + decision substrate |
| 269 | +2. **Policy Engine** — graph-aware rules |
| 270 | +3. **Context Pack Architecture** — `roam context-pack --budget N` |
| 271 | +4. **Agent Run Ledger** — the Phase 4 monetisation product |
| 272 | +5. **Repo-Local Agent Memory** — `.roam/memory` |
| 273 | + |
| 274 | +My take: this ordering is right. **#1 + #2 are the highest-leverage |
| 275 | +near-term**. #3 is the cheapest win (extend `roam retrieve`). |
| 276 | +#4 is the audit-trail product we have already committed to building |
| 277 | +in Phase 4. #5 is small but strategic: it makes Roam *portable |
| 278 | +across agent vendors*, which is the moat. |
| 279 | + |
| 280 | +--- |
| 281 | + |
| 282 | +## Mapping to the roadmap (R18+) |
| 283 | + |
| 284 | +R13-R17 (queued in BACKLOG) already cover rounds 1+2. Round 3 adds |
| 285 | +these directions, in proposed order: |
| 286 | + |
| 287 | +| Round | What | Round-3 ref | Notes | |
| 288 | +|---|---|---|---| |
| 289 | +| **R18** | Graph-aware policy DSL (`roam rules` extension) — `reachable_from`, `imports_from`, `clones_with`, `tested_by` clauses | #2 | Pairs with `roam permit` (Phase 0) | |
| 290 | +| **R19** | Repo-local memory store (`.roam/memory.jsonl` + `roam memory add/list/relevant`) | #5 | Small commit, big strategic win | |
| 291 | +| **R20** | Agent Run Ledger — per-agent-run event stream stored locally, signed via existing CGA chain. Powers `roam replay`, `roam agent-score`, `roam audit-trail` | #4 | Substrate exists (CGA); ledger is the UX | |
| 292 | +| **R21** | Multi-Agent Lease System — stateful claim/release over the existing graph-partition substrate | #6 | Pairs with `roam orchestrate` / `roam fleet` | |
| 293 | +| **R22** | Confidence/Uncertainty contract — every list-of-findings tool returns `{value, confidence, reason}` | #8 | Mechanical sweep | |
| 294 | +| **R23** | Graph Versioning — `roam graph-diff main..HEAD`, `roam architecture-drift` | #14 | Pairs with `roam trends` | |
| 295 | +| **R24** | Agent Constitution (`.roam/constitution.yml`) — unifies AGENTS.md + policy + required checks | #15 | Capstone primitive | |
| 296 | +| **R25+** | Pluggable Analyzer protocol (`roam-plugin-*`) | #9 | Multi-quarter direction | |
| 297 | + |
| 298 | +Plus the smaller refinements (already on positioning roadmap): |
| 299 | + |
| 300 | +- **R13** (agent-OS metadata pass) absorbs ChatGPT round-3 #10 |
| 301 | +- **R15** (`roam next`, `roam agents-md`, prompt snippets) absorbs round-3 #1 (Decision Engine) |
| 302 | +- **R16** (agent modes, intent-check, agent-score) absorbs round-3 #7 |
| 303 | +- **R17** (Cloud governance reframe) absorbs round-3 #4 + #11 (Local-First Sync architecture) |
| 304 | + |
| 305 | +--- |
| 306 | + |
| 307 | +## What this means for the monetisation pivot |
| 308 | + |
| 309 | +The round-3 capture sharpens the *story* of each paid product: |
| 310 | + |
| 311 | +- **Roam Review** = round-3 #12 (Evidence-Based PR Review) + |
| 312 | + round-3 #4 (Agent Run Ledger). Sells against CodeRabbit/Greptile/ |
| 313 | +  Qodo on the basis of a *different category of evidence*. |
| 314 | +- **Roam Cloud** = round-3 #11 (Local-First Sync) + a dashboard that surfaces |
| 315 | + derived intelligence (risk scores, run summaries, ignored-warning |
| 316 | + trails). *"Code stays local; derived intelligence can sync."* |
| 317 | +- **Roam Self-Hosted** = round-3 #2 (Policy Engine) + round-3 #15 |
| 318 | + (Agent Constitution) — the governance story for regulated |
| 319 | + industries. |
| 320 | + |
| 321 | +This means: **none of the monetisation pricing needs to change.** The |
| 322 | +positioning copy + product descriptions get sharper, but the SKU |
| 323 | +matrix from `pricing_v3_flat_launch.md` stays intact. |
| 324 | + |
| 325 | +--- |
| 326 | + |
| 327 | +## The single most-important insight from this round |
| 328 | + |
| 329 | +> Graph-aware policy is the move. |
| 330 | + |
| 331 | +Path-aware rules are commodified (every tool from CODEOWNERS to |
| 332 | +SonarQube does that). The graph-reachability primitive |
| 333 | +("block changes to anything reachable from `payment_settlement`") |
| 334 | +is something **we can do today** because we already have the graph |
| 335 | +substrate, and **competitors cannot** without rebuilding our |
| 336 | +indexing layer. |
| 337 | + |
| 338 | +This is the foundation R18 should be built on. Worth elevating in |
| 339 | +the ROADMAP as a strategic priority. |