Commit 12f0761

dev: capture agent-OS architecture notes (round 3 — decision substrate)
Third round of external strategic input. Rounds 1+2 were positioning + tactical features; round 3 is architectural: the "agent decision substrate" framing.

* dev/agent-os-architecture-2026-05-11.md - full capture (280 lines):
  - Unifying thesis: Roam as the agent's engineering brain
  - Killer loop: task -> context -> plan -> permit -> edit -> critique -> record -> memory
  - 15 architectural directions, each mapped against current codebase
  - ChatGPT's top-5 priorities, validated
  - R18-R25 queued in dependency order
  - Strategic implications for the Review / Cloud / Self-Hosted SKUs
* dev/BACKLOG.md - R18-R25 queued with strategic notes.

Companion writes (auto-memory):
* memory/agent_os_architecture_2026_05_11.md - pointer + TL;DR
* MEMORY.md - ⭐⭐⭐ entry added

The single most-important insight across all three rounds: Graph-aware policy is the moat. Path-aware policy is commodified (CODEOWNERS, SonarQube, etc). Graph-reachability clauses ("block changes to anything reachable from payment_settlement()") are something we can do today because we have the graph substrate, and competitors can't without rebuilding our indexing layer. R18 is built on this foundation.
1 parent bc09161 commit 12f0761

2 files changed

Lines changed: 364 additions & 0 deletions

dev/BACKLOG.md

Lines changed: 25 additions & 0 deletions
@@ -64,6 +64,31 @@ monetisation Phase-0 work.
6464

6565
---
6666

67+
## R18–R25 — agent-OS architecture rounds (2026-05-11, round 3)
68+
69+
ChatGPT architectural round (full capture:
70+
`dev/agent-os-architecture-2026-05-11.md`). These are larger, more
71+
strategic builds than R13-R17.
72+
73+
| Round | What | Strategic note |
|---|---|---|
| **R18** | Graph-aware policy DSL: `roam rules` clauses `reachable_from`, `imports_from`, `clones_with`, `tested_by`. Pairs with `roam permit`. | **The moat.** Path-aware policy is commodified; graph-reachability is something we can do today because we have the graph substrate. |
| **R19** | Repo-local agent memory: `.roam/memory.jsonl` + `roam memory add/list/relevant`. Distinct from LLM/Cursor/Claude memory. | Makes Roam *portable across agent vendors*. Strategic moat. |
| **R20** | Agent Run Ledger: per-agent-run event stream signed via existing CGA chain. Powers `roam replay`, `roam agent-score`, `roam audit-trail`. | The UX layer on top of the Phase-4 Audit Trail product. |
| **R21** | Multi-Agent Lease System: stateful claim/release over the existing graph-partition substrate. | Pairs with `roam orchestrate` / `roam fleet`. |
| **R22** | Confidence/Uncertainty contract: every list-of-findings tool returns `{value, confidence, reason}` triples. | Mechanical sweep. |
| **R23** | Graph Versioning: `roam graph-diff main..HEAD`, `roam architecture-drift`. | Pairs with `roam trends`. Marketing: *"not just what changed in code, but what changed in the system structure."* |
| **R24** | Agent Constitution (`.roam/constitution.yml`): unifies AGENTS.md + policy rules + memory + required checks. | Capstone primitive: the single declarative file an agent reads. |
| **R25+** | Pluggable Analyzer protocol (`roam-plugin-*` for nextjs/laravel/prisma/django/…). | Multi-quarter direction. Bridge architecture is the substrate. |
83+
84+
**Top-5 priorities (per ChatGPT, validated against our codebase)**:
85+
R15 Decision Engine, R18 Policy Engine, R3-context-pack extension,
86+
R20 Run Ledger, R19 Repo Memory. These five compound into the
87+
killer loop: *task → context → plan → permit → edit → critique →
88+
record → memory*.
89+
90+
---
91+
6792
## Next pickup — pick from ROADMAP
6893

6994
When this queue clears (it has), pull from `ROADMAP.md` in this order:
dev/agent-os-architecture-2026-05-11.md

Lines changed: 339 additions & 0 deletions
@@ -0,0 +1,339 @@
1+
# Agent-OS architecture notes — 2026-05-11
2+
3+
Third round of external strategic input from ChatGPT. Where rounds 1+2
4+
(`dev/agent-os-positioning-2026-05-11.md`) focused on positioning +
5+
tactical feature gaps, **this round is architectural** — the
6+
"agent decision substrate" framing and 15 architectural directions.
7+
8+
Read this together with the round-1/2 capture and `internal/strategy/`.
9+
10+
---
11+
12+
## Unifying thesis
13+
14+
> Roam should become **the agent's engineering brain**:
15+
> a local operating layer that lets coding agents navigate, act, and
16+
> self-check inside a codebase.
17+
18+
The killer architectural loop ChatGPT proposes:
19+
20+
```
Agent gets task

Roam retrieves context

Roam plans safe path

Roam grants/denies permissions

Agent edits

Roam critiques diff

Roam records run

Roam updates repo memory
```
37+
38+
This is the **demo / video / hero-story** to script. Every loop step
39+
maps to a Roam tool. The pitch becomes: *"Roam is the only thing the
40+
agent talks to between getting the task and shipping the PR."*
41+
42+
---
43+
44+
## The 15 architectural directions (with my read against the codebase)
45+
46+
### 1. Agent Decision Engine — **highest leverage**
47+
> Given task + repo state + risk level, what should the agent do next?
48+
49+
Central commands: `roam next`, `roam plan`, `roam permit`. Already
50+
seeded in `dev/agent-os-positioning-2026-05-11.md` as R15 work.
51+
52+
**What we have**: `roam ask` (TF-IDF intent dispatcher),
53+
`roam_for_<situation>` family (R8.E4) — partial.
54+
**What's missing**: the *machine-readable* next-action envelope
55+
(`{recommended_tools, reason, avoid, required_order}`) that a planner
56+
can actually consume.
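
To make the missing envelope concrete, here is a minimal sketch of what a planner-facing payload might look like. Only the four field names come from the notes; the `NextAction` class, its validation rules, and the example values are assumptions for illustration, not an existing Roam API.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class NextAction:
    """Hypothetical next-action envelope; only the field names come from the notes."""

    recommended_tools: list[str]
    reason: str
    avoid: list[str] = field(default_factory=list)
    required_order: list[str] = field(default_factory=list)

    def validate(self) -> None:
        # Anything sequenced in required_order must also have been recommended.
        unknown = [t for t in self.required_order if t not in self.recommended_tools]
        if unknown:
            raise ValueError(f"required_order names unrecommended tools: {unknown}")
        # A tool cannot be both recommended and avoided.
        clash = set(self.recommended_tools) & set(self.avoid)
        if clash:
            raise ValueError(f"tools both recommended and avoided: {sorted(clash)}")

    def to_json(self) -> str:
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only.
envelope = NextAction(
    recommended_tools=["retrieve", "preflight"],
    reason="task touches auth; gather context before editing",
    avoid=["run-tests"],
    required_order=["retrieve", "preflight"],
)
payload = json.loads(envelope.to_json())
```

Validating before serialising is a design choice in this sketch: a planner that consumes the envelope mechanically should never see a self-contradictory one.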
57+
58+
### 2. Policy Engine — **strategic differentiator**
59+
Graph-aware policies, NOT just path-aware. This is the key insight:
60+
61+
> Block changes to any function reachable from `payment_settlement()`
62+
> unless the task is explicitly payment-related.
63+
64+
That's stronger than CODEOWNERS or any pattern-based rule because it
65+
uses the call graph as the policy substrate. Our `roam impact`
66+
already computes that reachability — the missing piece is a policy
67+
DSL that consumes it.
68+
69+
**What we have**: `roam rules` with YAML rule packs (taint detectors,
70+
gate presets). Path-aware.
71+
**What's missing**: graph-aware rule clauses
72+
(`reachable_from`, `imports_from`, `clones_with`, `tested_by`).
73+
74+
Pairs with `roam permit` (Phase 0 monetisation freebie).
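
A rough sketch of how a `reachable_from` clause could be evaluated, assuming the call graph is available as a plain caller-to-callees mapping. The toy graph, the `check_edit` helper, and the task-tag mechanism are all hypothetical; the real `roam impact` output and rule-pack schema may look quite different.

```python
from collections import deque

# Toy call graph (caller -> callees). Names are illustrative only.
CALL_GRAPH = {
    "payment_settlement": ["charge_card", "write_ledger"],
    "charge_card": ["http_post"],
    "write_ledger": [],
    "render_profile": ["http_post"],
    "http_post": [],
}


def reachable_from(graph, root):
    """Return every function reachable from root (root included), via BFS."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def check_edit(graph, changed, protected_root, task_tags, allow_tag):
    """Return the changed functions the rule blocks.

    The rule: block anything reachable from protected_root unless the
    task carries allow_tag (an assumed way of saying "explicitly related").
    """
    if allow_tag in task_tags:
        return set()
    return set(changed) & reachable_from(graph, protected_root)


blocked = check_edit(
    CALL_GRAPH,
    changed=["http_post", "render_profile"],
    protected_root="payment_settlement",
    task_tags={"ui"},
    allow_tag="payment",
)
```

Here `http_post` is blocked even though it lives nowhere near a payments directory, which is the point a path-based rule would miss; `render_profile` is not reachable from `payment_settlement` and passes.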
75+
76+
### 3. Context Pack Architecture
77+
> `roam context-pack "implement password reset" --budget 12000`
78+
79+
**What we have**: `roam retrieve` does graph-aware FTS5 + structural
80+
rerank + token budget. ~90% there. Plus `_apply_budget` for hard cap.
81+
**What's missing**: explicit "context pack" framing in the response
82+
shape — essential-files + relevant-symbols + likely-tests +
83+
**excluded-noise** as named sections. Plus a `--budget` flag that
84+
agents use deliberately.
85+
86+
Cheapest win in the round-3 list. Could be `roam retrieve` with a
87+
new flag, or a thin alias `roam context-pack` for marketing.
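
One possible response shape, sketched as a greedy packer over pre-scored candidates. The section names follow the notes; the candidate format, the scores, and the decision to file over-budget items under `excluded_noise` are assumptions made here, and `roam retrieve`'s real ranking is not modelled.

```python
def build_context_pack(candidates, budget):
    """Greedily pack the highest-scoring candidates into named sections.

    candidates: dicts with "section", "path", "score", "tokens" (assumed shape).
    Items that do not fit in the remaining budget land in "excluded_noise".
    """
    pack = {
        "essential_files": [],
        "relevant_symbols": [],
        "likely_tests": [],
        "excluded_noise": [],
    }
    used = 0
    for item in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if used + item["tokens"] <= budget:
            pack[item["section"]].append(item["path"])
            used += item["tokens"]
        else:
            pack["excluded_noise"].append(item["path"])
    pack["tokens_used"] = used
    return pack


# Illustrative candidates for "implement password reset".
candidates = [
    {"section": "essential_files", "path": "src/auth/reset.py", "score": 0.9, "tokens": 6000},
    {"section": "likely_tests", "path": "tests/test_reset.py", "score": 0.8, "tokens": 4000},
    {"section": "relevant_symbols", "path": "src/auth/tokens.py::make_token", "score": 0.7, "tokens": 3000},
    {"section": "relevant_symbols", "path": "src/util/strings.py::slug", "score": 0.2, "tokens": 1500},
]
pack = build_context_pack(candidates, budget=12000)
```

Note the greedy behaviour: the 3000-token symbol is dropped because it would overflow the budget, while the smaller, lower-scored item still fits. Whether that trade-off is desirable is an open design question.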
88+
89+
### 4. Agent Run Ledger — **Phase 4 monetisation product**
90+
Audit-grade trail for agent behaviour. Maps to the Audit Trail
91+
product in `monetization_v2_subscription_pivot.md`.
92+
93+
**What we have**: in-toto v1 CGA attestations + cosign signing chain
94+
(R7/R10.2) — proves what was *indexed*, not what the *agent did*.
95+
**What's missing**: per-agent-run event stream:
96+
97+
```
Task given
Tools called
Files inspected
Warnings ignored ← this is the audit-grade signal
Files changed
Tests suggested vs run
Critique result
Final risk score
```
107+
108+
Commands ChatGPT names: `roam agent-score`, `roam replay`,
109+
`roam audit-trail`, `roam explain-run`. The CGA chain is the
110+
substrate; the ledger is the UX.
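
As a sketch of the event-stream idea, the following keeps an append-only list in which each event commits to the previous one by hash, so after-the-fact edits are detectable. This is a plain SHA-256 hash chain chosen for illustration; it is not the in-toto/cosign CGA chain, and the event kinds and field names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def _digest(body):
    # Canonical JSON so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(ledger, kind, detail):
    """Append an event that commits to the hash of the previous event."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"seq": len(ledger), "kind": kind, "detail": detail, "prev": prev}
    ledger.append({**body, "hash": _digest(body)})
    return ledger[-1]


def verify(ledger):
    """Recompute every hash and link; False if anything was altered."""
    prev = GENESIS
    for i, event in enumerate(ledger):
        body = {k: event[k] for k in ("seq", "kind", "detail", "prev")}
        if event["seq"] != i or event["prev"] != prev or event["hash"] != _digest(body):
            return False
        prev = event["hash"]
    return True


ledger = []
append_event(ledger, "task_given", {"task": "implement password reset"})
append_event(ledger, "warning_ignored", {"tool": "preflight", "warning": "touches auth-critical path"})
append_event(ledger, "files_changed", {"paths": ["src/auth/reset.py"]})
intact = verify(ledger)

# Quietly rewriting the ignored warning breaks the chain.
ledger[1]["detail"]["warning"] = "none"
tampered_ok = verify(ledger)
```

A hash chain only gives tamper evidence; signing the head of the chain (which the existing attestation machinery would presumably handle) is what would make it audit-grade.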
111+
112+
### 5. Repo-Local Agent Memory — **distinct from LLM memory**
113+
Engineering facts the *repo* knows, not the *model*:
114+
115+
```
Auth tokens expire after 15 minutes.
Never call Stripe directly outside billing/.
Generated Prisma files should not be edited.
```
120+
121+
**What we have**: MCP session memory (`mcp_extras/session.py`) for
122+
per-conversation context. Not repo-persistent.
123+
**What's missing**: a `.roam/memory.jsonl` (or similar) store +
124+
`roam memory add/list/relevant`.
125+
126+
Strategic insight: this is the engineering counterpart to model
127+
memory and *portable across agents*. "LLM memory = model-specific.
128+
Roam memory = repo-specific." Strong differentiator.
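
A minimal sketch of the store, assuming one JSON object per line and a naive word-overlap score for `relevant`. The record shape (`text`, `tags`) and the scoring are placeholders invented here; a real implementation would more likely reuse the existing retrieval stack.

```python
import json
import re
import tempfile
from pathlib import Path


def add_fact(store, text, tags=()):
    """Append one fact as a JSON line, creating .roam/ if needed."""
    store.parent.mkdir(parents=True, exist_ok=True)
    with store.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"text": text, "tags": list(tags)}) + "\n")


def list_facts(store):
    if not store.exists():
        return []
    with store.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def _words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevant(store, query, limit=3):
    """Rank facts by how many query words they share (text or tags)."""
    query_words = _words(query)
    scored = []
    for fact in list_facts(store):
        overlap = len(query_words & (_words(fact["text"]) | set(fact["tags"])))
        if overlap:
            scored.append((overlap, fact["text"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:limit]]


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / ".roam" / "memory.jsonl"
    add_fact(store, "Auth tokens expire after 15 minutes.", tags=["auth"])
    add_fact(store, "Never call Stripe directly outside billing/.", tags=["billing"])
    add_fact(store, "Generated Prisma files should not be edited.", tags=["prisma"])
    hits = relevant(store, "refactor stripe billing webhook")
    count = len(list_facts(store))
```

JSONL keeps the store append-only and diff-friendly, which suits a file that is meant to be committed alongside the code.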
129+
130+
### 6. Multi-Agent Lease / Territory System — **novel**
131+
> Multiple agents working in one repo need boundaries.
132+
> `roam lease request "frontend checkout form"`
133+
134+
**What we have**: `roam partition`, `roam orchestrate`, `roam fleet`
135+
(graph-based work-splitting for multi-agent). Static analysis layer.
136+
**What's missing**: a stateful *lease* layer — claim a graph
137+
territory, detect conflicts, release on completion.
138+
139+
Pairs naturally with what we built. Concrete demo: "Roam assigns
140+
graph-aware work territories so agents are less likely to collide."
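
The claim/conflict/release cycle could be sketched as below, treating a territory as a set of graph node ids. This is an in-memory toy with hypothetical names (`LeaseTable`, the node ids); a real lease layer would need persistence, expiry, and locking across processes.

```python
class LeaseTable:
    """In-memory lease table mapping graph nodes to the agent holding them."""

    def __init__(self):
        self._owner_by_node = {}

    def request(self, agent, territory):
        """Claim every node in territory, or nothing if any node is taken.

        Returns (granted, conflicts) where conflicts maps node -> current owner.
        """
        territory = set(territory)
        conflicts = {
            node: owner
            for node, owner in self._owner_by_node.items()
            if node in territory and owner != agent
        }
        if conflicts:
            return False, conflicts
        for node in territory:
            self._owner_by_node[node] = agent
        return True, {}

    def release(self, agent):
        """Free everything the agent holds; return the freed nodes, sorted."""
        freed = [n for n, owner in self._owner_by_node.items() if owner == agent]
        for node in freed:
            del self._owner_by_node[node]
        return sorted(freed)


leases = LeaseTable()
ok_a, _ = leases.request("agent-a", {"checkout.form", "checkout.validate"})
ok_b, conflicts = leases.request("agent-b", {"checkout.validate", "cart.total"})
freed = leases.release("agent-a")
ok_b_retry, _ = leases.request("agent-b", {"checkout.validate", "cart.total"})
```

Requests are all-or-nothing in this sketch, so a denied agent never ends up holding half a territory.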
141+
142+
### 7. Intent-to-Diff Architecture — (= round 2 #11)
143+
Already captured. `roam intent-check "<task>"` compares stated intent
144+
to actual diff and flags drift.
145+
146+
### 8. Confidence / Uncertainty Architecture
147+
Every result should expose confidence:
148+
149+
```json
{
  "affected_tests": [
    {"file": "tests/auth.test.ts", "confidence": 0.91,
     "reason": "directly imports modified AuthService"},
    {"file": "tests/session.test.ts", "confidence": 0.63,
     "reason": "shares token validation path"}
  ]
}
```
159+
160+
**What we have**: severity / verdict / partial_success on every
161+
envelope. Per-finding `confidence` exists on some surfaces (taint
162+
findings, clone matches) but not uniformly.
163+
**What's missing**: a contract that every list-of-findings tool
164+
returns `{value, confidence, reason}` triples.
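
One way to pin the contract down is a small value type that refuses malformed findings. The triple's field names come from the notes; the `Finding` class, the 0 to 1 range check, and the sort-by-confidence envelope helper are assumptions for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical {value, confidence, reason} triple."""

    value: str
    confidence: float
    reason: str

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")
        if not self.reason:
            raise ValueError("reason must not be empty")


def to_envelope(key, findings):
    """Serialise findings under key, most confident first."""
    ordered = sorted(findings, key=lambda f: f.confidence, reverse=True)
    return {key: [asdict(f) for f in ordered]}


envelope = to_envelope(
    "affected_tests",
    [
        Finding("tests/session.test.ts", 0.63, "shares token validation path"),
        Finding("tests/auth.test.ts", 0.91, "directly imports modified AuthService"),
    ],
)
```

Requiring a non-empty `reason` is the part that keeps the sweep from degenerating into attaching a bare number to every result.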
165+
166+
### 9. Pluggable Analyzer Architecture — **big undertaking**
167+
> `roam-plugin-nextjs`, `roam-plugin-laravel`, `roam-plugin-prisma`, …
168+
169+
**What we have**: `bridges/` module (Salesforce, protobuf, REST API,
170+
template, config). The substrate for plugin-style framework
171+
intelligence. Per-language extractors registered via
172+
`languages/registry.py`.
173+
**What's missing**: a public plugin protocol with stable hooks +
174+
manifest format + discovery mechanism.
175+
176+
This is a multi-quarter direction, not a quick win. But it's the
177+
right architecture for the long term — keeps the core small while
178+
ecosystem grows.
179+
180+
### 10. Agent Capability Registry — (= round 2 #1)
181+
First-class metadata per `@_tool`. Already queued as R13.
182+
183+
### 11. Local-First Sync Architecture
184+
> Code stays local. Derived intelligence can sync.
185+
186+
**What we have**: local-first CLI (no telemetry). Roam Cloud planned
187+
in Phase 3 but architecture wasn't crisp.
188+
**What's missing**: a clear sync model — what *can* sync (risk
189+
scores, agent-run summaries, warnings, trend metrics, policy
190+
violations, review outcomes) vs what *cannot* (code, diffs, secrets).
191+
192+
This sharpens the Roam Cloud story without weakening the privacy
193+
posture. Should be the load-bearing principle for the Cloud product.
194+
195+
### 12. Evidence-Based PR Review Architecture
196+
Every PR comment points to specific graph evidence, not generic prose:
197+
198+
```
Risk: high
Evidence:
- changed function has 14 callers
- 3 callers are in auth-critical paths
- no affected tests changed
- similar clone exists in legacyAuth.ts
- complexity increased from 8 → 13
```
207+
208+
**What we have**: `roam critique`, `roam pr-risk`, `roam diff` all
209+
produce structured findings. The PR-comment renderer
210+
(`roam pr-comment-render`) needs to surface this with explicit
211+
evidence-citations.
212+
**What's missing**: a `pr-comment` format that LEADS WITH evidence
213+
and includes per-finding graph citations.
214+
215+
This is the answer to "how does Roam differentiate from CodeRabbit/
216+
Greptile/Qodo?" — different *category* of evidence.
217+
218+
### 13. Sandbox Execution Architecture
219+
> Roam runs tests / checks for the agent and translates output into
220+
> agent-readable next steps.
221+
222+
**What we have**: `roam affected-tests` suggests tests, doesn't run.
223+
**What's missing**: a runner (`roam run-tests --affected`) +
224+
output-parsing layer that returns
225+
`{failure_type, likely_file, suggested_next_tool, suggested_query}`.
226+
227+
Risky area — running untrusted commands needs careful sandboxing.
228+
Defer until the rest is in place.
229+
230+
### 14. Graph Versioning Architecture
231+
> Graph at commit A vs commit B → structural diff
232+
233+
**What we have**: incremental indexing (R9.B7 FTS5 incremental, R10.3
234+
cluster cache) — we already maintain the graph efficiently.
235+
**What's missing**: an explicit *graph snapshot* per commit and a
236+
`roam graph-diff main..HEAD` / `roam architecture-drift` interface.
237+
238+
Pairs naturally with `roam health --baseline auto` we already shipped.
239+
Marketing line: *"Not just what changed in code, but what changed in
240+
the system structure."*
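
At its simplest, a structural diff is set arithmetic over two edge snapshots. The sketch below assumes each snapshot is a set of `(caller, callee)` pairs, which is a simplification made here; the actual snapshot format and what `roam graph-diff` would report are not defined in these notes.

```python
def graph_diff(before, after):
    """Diff two graph snapshots given as sets of (caller, callee) edges."""
    nodes_before = {node for edge in before for node in edge}
    nodes_after = {node for edge in after for node in edge}
    return {
        "added_nodes": sorted(nodes_after - nodes_before),
        "removed_nodes": sorted(nodes_before - nodes_after),
        "added_edges": sorted(after - before),
        "removed_edges": sorted(before - after),
    }


# Illustrative snapshots for main and HEAD.
main = {("api", "auth"), ("auth", "db")}
head = {("api", "auth"), ("auth", "db"), ("api", "db"), ("auth", "cache")}
drift = graph_diff(main, head)
```

In this toy example the new `("api", "db")` edge is the interesting signal: no file need look suspicious in a text diff, yet the API layer now bypasses `auth` to reach the database.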
241+
242+
### 15. Agent Constitution Architecture
243+
Combines AGENTS.md + policy rules + repo memory into one declarative
244+
file (`.roam/constitution.yml`):
245+
246+
```yaml
principles:
  - Prefer small diffs.
  - Never edit generated files.
critical_paths:
  - src/auth/**
  - src/billing/**
required_roam_checks:
  before_edit: [retrieve, preflight]
  after_edit: [affected-tests, critique]
  before_pr: [pr-risk, intent-check]
```
258+
259+
This is the unifying primitive. AGENTS.md is the human-readable
260+
prose; constitution.yml is the machine-readable contract.
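
To show what "machine-readable contract" could mean in practice, this sketch checks a run against an already-parsed constitution. The dict mirrors the YAML above (parsing is left out to stay within the standard library); `missing_checks` and `is_critical` are hypothetical helpers, and `fnmatchcase` treats `**` like `*`, which happens to work for these patterns but is not full glob semantics.

```python
from fnmatch import fnmatchcase

# Parsed form of the example constitution (loading the YAML is out of scope here).
CONSTITUTION = {
    "critical_paths": ["src/auth/**", "src/billing/**"],
    "required_roam_checks": {
        "before_edit": ["retrieve", "preflight"],
        "after_edit": ["affected-tests", "critique"],
        "before_pr": ["pr-risk", "intent-check"],
    },
}


def missing_checks(constitution, phase, tools_run):
    """Return the required checks for phase that have not been run yet."""
    required = constitution["required_roam_checks"].get(phase, [])
    done = set(tools_run)
    return [check for check in required if check not in done]


def is_critical(constitution, path):
    """True if path matches any critical_paths pattern."""
    return any(fnmatchcase(path, pattern) for pattern in constitution["critical_paths"])


gaps = missing_checks(CONSTITUTION, "before_pr", ["pr-risk"])
critical = is_critical(CONSTITUTION, "src/auth/login.py")
not_critical = is_critical(CONSTITUTION, "src/ui/button.py")
```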
261+
262+
---
263+
264+
## Top-5 priorities per ChatGPT, with my read
265+
266+
ChatGPT recommends these five because they compound:
267+
268+
1. **Agent Decision Engine** — `roam next` + decision substrate
269+
2. **Policy Engine** — graph-aware rules
270+
3. **Context Pack Architecture** — `roam context-pack --budget N`
271+
4. **Agent Run Ledger** — the Phase 4 monetisation product
272+
5. **Repo-Local Agent Memory** — `.roam/memory`
273+
274+
My take: this ordering is right. **#1 + #2 are the highest-leverage
275+
near-term**. #3 is the cheapest win (extend `roam retrieve`).
276+
#4 is the audit-trail product we already have a Phase 4 commitment
277+
to building. #5 is small-but-strategic — it makes Roam *portable
278+
across agent vendors*, which is the moat.
279+
280+
---
281+
282+
## Mapping to the roadmap (R18+)
283+
284+
R13-R17 (queued in BACKLOG) already cover rounds 1+2. Round 3 adds
285+
these directions, in proposed order:
286+
287+
| Round | What | Round-3 ref | Notes |
|---|---|---|---|
| **R18** | Graph-aware policy DSL (`roam rules` extension): `reachable_from`, `imports_from`, `clones_with`, `tested_by` clauses | #2 | Pairs with `roam permit` (Phase 0) |
| **R19** | Repo-local memory store (`.roam/memory.jsonl` + `roam memory add/list/relevant`) | #5 | Small commit, big strategic win |
| **R20** | Agent Run Ledger: per-agent-run event stream stored locally, signed via existing CGA chain. Powers `roam replay`, `roam agent-score`, `roam audit-trail` | #4 | Substrate exists (CGA); ledger is the UX |
| **R21** | Multi-Agent Lease System: stateful claim/release over the existing graph-partition substrate | #6 | Pairs with `roam orchestrate` / `roam fleet` |
| **R22** | Confidence/Uncertainty contract: every list-of-findings tool returns `{value, confidence, reason}` | #8 | Mechanical sweep |
| **R23** | Graph Versioning: `roam graph-diff main..HEAD`, `roam architecture-drift` | #14 | Pairs with `roam trends` |
| **R24** | Agent Constitution (`.roam/constitution.yml`): unifies AGENTS.md + policy + required checks | #15 | Capstone primitive |
| **R25+** | Pluggable Analyzer protocol (`roam-plugin-*`) | #9 | Multi-quarter direction |
297+
298+
Plus the smaller refinements (already on positioning roadmap):
299+
300+
- **R13** (agent-OS metadata pass) absorbs ChatGPT round-3 #10
301+
- **R15** (`roam next`, `roam agents-md`, prompt snippets) absorbs round-3 #1 (Decision Engine)
302+
- **R16** (agent modes, intent-check, agent-score) absorbs round-3 #7
303+
- **R17** (Cloud governance reframe) absorbs round-3 #4 + #11 (Local-First Sync architecture)
304+
305+
---
306+
307+
## What this means for the monetisation pivot
308+
309+
The round-3 capture sharpens the *story* of each paid product:
310+
311+
- **Roam Review** = round-3 #12 (Evidence-Based PR Review) +
312+
round-3 #4 (Agent Run Ledger). Sells against CodeRabbit/Greptile/
313+
Qodo on the basis of *different category of evidence*.
314+
- **Roam Cloud** = round-3 #11 (Local-First Sync) + dashboard surfaces
315+
derived intelligence (risk scores, run summaries, ignored-warning
316+
trails). *"Code stays local; derived intelligence can sync."*
317+
- **Roam Self-Hosted** = round-3 #2 (Policy Engine) + round-3 #15
318+
(Agent Constitution) — the governance story for regulated
319+
industries.
320+
321+
This means: **none of the monetisation pricing needs to change.** The
322+
positioning copy + product descriptions get sharper, but the SKU
323+
matrix from `pricing_v3_flat_launch.md` stays intact.
324+
325+
---
326+
327+
## The single most-important insight from this round
328+
329+
> Graph-aware policy is the moat.
330+
331+
Path-aware rules are commodified (every tool from CODEOWNERS to
332+
SonarQube does that). The graph-reachability primitive
333+
("block changes to anything reachable from `payment_settlement`")
334+
is something **we can do today** because we already have the graph
335+
substrate, and **competitors cannot** without rebuilding our
336+
indexing layer.
337+
338+
This is the foundation R18 should be built on. Worth elevating in
339+
the ROADMAP as a strategic priority.
