dev: capture agent-OS architecture notes (round 3 — decision substrate)

Cranot · Cranot · commit 12f07619da53 · 2026-05-11T01:08:30.000+03:00
Third round of external strategic input. Rounds 1+2 were positioning
+ tactical features; round 3 is architectural — the "agent decision
substrate" framing.

* dev/agent-os-architecture-2026-05-11.md — full capture (280 lines):
  - Unifying thesis: Roam as the agent's engineering brain
  - Killer loop: task -> context -> plan -> permit -> edit ->
    critique -> record -> memory
  - 15 architectural directions, each mapped against current codebase
  - ChatGPT's top-5 priorities, validated
  - R18-R25 queued in dependency order
  - Strategic implications for the Review / Cloud / Self-Hosted SKUs
* dev/BACKLOG.md — R18-R25 queued with strategic notes.

Companion writes (auto-memory):
* memory/agent_os_architecture_2026_05_11.md — pointer + TL;DR
* MEMORY.md — ⭐⭐⭐ entry added

The single most-important insight across all three rounds:

  Graph-aware policy is the moat.

Path-aware policy is commodified (CODEOWNERS, SonarQube, etc.).
Graph-reachability clauses ("block changes to anything reachable
from payment_settlement()") are something we can do today because
we have the graph substrate, and competitors can't without
rebuilding our indexing layer. R18 is built on this foundation.
diff --git a/dev/BACKLOG.md b/dev/BACKLOG.md
@@ -64,6 +64,31 @@ monetisation Phase-0 work.
 
 ---
 
+## R18–R25 — agent-OS architecture rounds (2026-05-11, round 3)
+
+ChatGPT architectural round (full capture:
+`dev/agent-os-architecture-2026-05-11.md`). These are larger, more
+strategic builds than R13-R17.
+
+| Round | What | Strategic note |
+|---|---|---|
+| **R18** | Graph-aware policy DSL — `roam rules` clauses `reachable_from`, `imports_from`, `clones_with`, `tested_by`. Pairs with `roam permit`. | **The moat.** Path-aware policy is commodified; graph-reachability is something we can do today because we have the graph substrate. |
+| **R19** | Repo-local agent memory — `.roam/memory.jsonl` + `roam memory add/list/relevant`. Distinct from LLM/Cursor/Claude memory. | Makes Roam *portable across agent vendors*. Strategic moat. |
+| **R20** | Agent Run Ledger — per-agent-run event stream signed via existing CGA chain. Powers `roam replay`, `roam agent-score`, `roam audit-trail`. | The UX layer on top of the Phase-4 Audit Trail product. |
+| **R21** | Multi-Agent Lease System — stateful claim/release over the existing graph-partition substrate. | Pairs with `roam orchestrate` / `roam fleet`. |
+| **R22** | Confidence/Uncertainty contract — every list-of-findings tool returns `{value, confidence, reason}` triples. | Mechanical sweep. |
+| **R23** | Graph Versioning — `roam graph-diff main..HEAD`, `roam architecture-drift`. | Pairs with `roam trends`. Marketing: *"not just what changed in code, but what changed in the system structure."* |
+| **R24** | Agent Constitution (`.roam/constitution.yml`) — unifies AGENTS.md + policy rules + memory + required checks. | Capstone primitive — the single declarative file an agent reads. |
+| **R25+** | Pluggable Analyzer protocol (`roam-plugin-*` for nextjs/laravel/prisma/django/…). | Multi-quarter direction. Bridge architecture is the substrate. |
+
+**Top-5 priorities (per ChatGPT, validated against our codebase)**:
+R15 Decision Engine, R18 Policy Engine, the round-3 #3 context-pack
+extension to `roam retrieve`, R20 Run Ledger, R19 Repo Memory. These
+five compound into the killer loop: *task → context → plan → permit →
+edit → critique → record → memory*.
+
+---
+
 ## Next pickup — pick from ROADMAP
 
 When this queue clears (it has), pull from `ROADMAP.md` in this order:
diff --git a/dev/agent-os-architecture-2026-05-11.md b/dev/agent-os-architecture-2026-05-11.md
@@ -0,0 +1,339 @@
+# Agent-OS architecture notes — 2026-05-11
+
+Third round of external strategic input from ChatGPT. Where rounds 1+2
+(`dev/agent-os-positioning-2026-05-11.md`) focused on positioning +
+tactical feature gaps, **this round is architectural** — the
+"agent decision substrate" framing and 15 architectural directions.
+
+Read this together with the round-1/2 capture and `internal/strategy/`.
+
+---
+
+## Unifying thesis
+
+> Roam should become **the agent's engineering brain** —
+> a local operating layer that lets coding agents navigate, act, and
+> self-check inside a codebase.
+
+The killer architectural loop ChatGPT proposes:
+
+```
+Agent gets task
+↓
+Roam retrieves context
+↓
+Roam plans safe path
+↓
+Roam grants/denies permissions
+↓
+Agent edits
+↓
+Roam critiques diff
+↓
+Roam records run
+↓
+Roam updates repo memory
+```
+
+This is the **demo / video / hero-story** to script. Every loop step
+maps to a Roam tool. The pitch becomes: *"Roam is the only thing the
+agent talks to between getting the task and shipping the PR."*
+
+---
+
+## The 15 architectural directions (with my read against the codebase)
+
+### 1. Agent Decision Engine — **highest leverage**
+> Given task + repo state + risk level, what should the agent do next?
+
+Central commands: `roam next`, `roam plan`, `roam permit`. Already
+seeded in `dev/agent-os-positioning-2026-05-11.md` as R15 work.
+
+**What we have**: `roam ask` (TF-IDF intent dispatcher),
+`roam_for_<situation>` family (R8.E4) — partial.
+**What's missing**: the *machine-readable* next-action envelope
+(`{recommended_tools, reason, avoid, required_order}`) that a planner
+can actually consume.
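+
+For illustration, such an envelope might look like the sketch below,
+rendered as YAML for readability. The four top-level keys come from the
+list above; the values are invented and do not show real `roam next`
+output.
+
+```yaml
+# Hypothetical next-action envelope; all values are illustrative assumptions.
+recommended_tools:
+  - retrieve
+  - preflight
+reason: "Task touches auth code; gather context and check impact before editing."
+avoid:
+  - "editing files before preflight has run"
+required_order:
+  - retrieve
+  - preflight
+```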
+
+### 2. Policy Engine — **strategic differentiator**
+Graph-aware policies, NOT just path-aware. This is the key insight:
+
+> Block changes to any function reachable from `payment_settlement()`
+> unless the task is explicitly payment-related.
+
+That's stronger than CODEOWNERS or any pattern-based rule because it
+uses the call graph as the policy substrate. Our `roam impact`
+already computes that reachability — the missing piece is a policy
+DSL that consumes it.
+
+**What we have**: `roam rules` with YAML rule packs (taint detectors,
+gate presets). Path-aware.
+**What's missing**: graph-aware rule clauses
+(`reachable_from`, `imports_from`, `clones_with`, `tested_by`).
+
+Pairs with `roam permit` (Phase 0 monetisation freebie).
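+
+A graph-aware rule might look like the following. This is only a
+sketch: the clause name `reachable_from` comes from the list above, but
+every other key (`id`, `when`, `unless`, `task_tags`, `action`,
+`message`) is a hypothetical schema, not the current `roam rules`
+format.
+
+```yaml
+# Hypothetical rule-pack entry; key names are illustrative assumptions.
+rules:
+  - id: protect-payment-settlement
+    when:
+      # Matches any symbol in the call-graph closure of this function.
+      reachable_from: payment_settlement
+    unless:
+      # Escape hatch: the task is explicitly declared payment-related.
+      task_tags: [payment]
+    action: block
+    message: "Change touches code reachable from payment_settlement()."
+```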
+
+### 3. Context Pack Architecture
+> `roam context-pack "implement password reset" --budget 12000`
+
+**What we have**: `roam retrieve` does graph-aware FTS5 + structural
+rerank + token budget. ~90% there. Plus `_apply_budget` for hard cap.
+**What's missing**: explicit "context pack" framing in the response
+shape — essential-files + relevant-symbols + likely-tests +
+**excluded-noise** as named sections. Plus a `--budget` flag that
+agents use deliberately.
+
+Cheapest win in the round-3 list. Could be `roam retrieve` with a
+new flag, or a thin alias `roam context-pack` for marketing.
+
+### 4. Agent Run Ledger — **Phase 4 monetisation product**
+Audit-grade trail for agent behaviour. Maps to the Audit Trail
+product in `monetization_v2_subscription_pivot.md`.
+
+**What we have**: in-toto v1 CGA attestations + cosign signing chain
+(R7/R10.2) — proves what was *indexed*, not what the *agent did*.
+**What's missing**: per-agent-run event stream:
+
+```
+Task given
+Tools called
+Files inspected
+Warnings ignored          ← this is the audit-grade signal
+Files changed
+Tests suggested vs run
+Critique result
+Final risk score
+```
+
+Commands ChatGPT names: `roam agent-score`, `roam replay`,
+`roam audit-trail`, `roam explain-run`. The CGA chain is the
+substrate; the ledger is the UX.
+
+### 5. Repo-Local Agent Memory — **distinct from LLM memory**
+Engineering facts the *repo* knows, not the *model*:
+
+```
+Auth tokens expire after 15 minutes.
+Never call Stripe directly outside billing/.
+Generated Prisma files should not be edited.
+```
+
+**What we have**: MCP session memory (`mcp_extras/session.py`) for
+per-conversation context. Not repo-persistent.
+**What's missing**: a `.roam/memory.jsonl` (or similar) store +
+`roam memory add/list/relevant`.
+
+Strategic insight: this is the engineering counterpart to model
+memory, and it is *portable across agents*. "LLM memory = model-specific.
+Roam memory = repo-specific." Strong differentiator.
+
+### 6. Multi-Agent Lease / Territory System — **novel**
+> Multiple agents working in one repo need boundaries.
+> `roam lease request "frontend checkout form"`
+
+**What we have**: `roam partition`, `roam orchestrate`, `roam fleet`
+(graph-based work-splitting for multi-agent). Static analysis layer.
+**What's missing**: a stateful *lease* layer — claim a graph
+territory, detect conflicts, release on completion.
+
+Pairs naturally with the partition/orchestrate/fleet layer we already
+built. Concrete demo: "Roam assigns graph-aware work territories so
+agents are less likely to collide."
+
+### 7. Intent-to-Diff Architecture — (= round 2 #11)
+Already captured. `roam intent-check "<task>"` compares stated intent
+to actual diff and flags drift.
+
+### 8. Confidence / Uncertainty Architecture
+Every result should expose confidence:
+
+```json
+{
+  "affected_tests": [
+    {"file": "tests/auth.test.ts", "confidence": 0.91,
+     "reason": "directly imports modified AuthService"},
+    {"file": "tests/session.test.ts", "confidence": 0.63,
+     "reason": "shares token validation path"}
+  ]
+}
+```
+
+**What we have**: severity / verdict / partial_success on every
+envelope. Per-finding `confidence` exists on some surfaces (taint
+findings, clone matches) but not uniformly.
+**What's missing**: a contract that every list-of-findings tool
+returns `{value, confidence, reason}` triples.
+
+### 9. Pluggable Analyzer Architecture — **big undertaking**
+> `roam-plugin-nextjs`, `roam-plugin-laravel`, `roam-plugin-prisma`, …
+
+**What we have**: `bridges/` module (Salesforce, protobuf, REST API,
+template, config). The substrate for plugin-style framework
+intelligence. Per-language extractors registered via
+`languages/registry.py`.
+**What's missing**: a public plugin protocol with stable hooks +
+manifest format + discovery mechanism.
+
+This is a multi-quarter direction, not a quick win. But it's the
+right architecture for the long term: it keeps the core small while
+the ecosystem grows.
+
+### 10. Agent Capability Registry — (= round 2 #1)
+First-class metadata per `@_tool`. Already queued as R13.
+
+### 11. Local-First Sync Architecture
+> Code stays local. Derived intelligence can sync.
+
+**What we have**: local-first CLI (no telemetry). Roam Cloud is planned
+for Phase 3, but its architecture wasn't crisp.
+**What's missing**: a clear sync model — what *can* sync (risk
+scores, agent-run summaries, warnings, trend metrics, policy
+violations, review outcomes) vs what *cannot* (code, diffs, secrets).
+
+This sharpens the Roam Cloud story without weakening the privacy
+posture. Should be the load-bearing principle for the Cloud product.
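+
+One way to make the sync boundary explicit is a declarative allowlist.
+In the sketch below, the `sync`, `allow`, and `never` keys are
+hypothetical; only the listed categories come from the notes above.
+
+```yaml
+# Hypothetical sync policy; key names are illustrative assumptions.
+sync:
+  allow:            # derived intelligence that may leave the machine
+    - risk_scores
+    - agent_run_summaries
+    - warnings
+    - trend_metrics
+    - policy_violations
+    - review_outcomes
+  never:            # stays local under all circumstances
+    - code
+    - diffs
+    - secrets
+```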
+
+### 12. Evidence-Based PR Review Architecture
+Every PR comment points to specific graph evidence, not generic prose:
+
+```
+Risk: high
+Evidence:
+- changed function has 14 callers
+- 3 callers are in auth-critical paths
+- no affected tests changed
+- similar clone exists in legacyAuth.ts
+- complexity increased from 8 → 13
+```
+
+**What we have**: `roam critique`, `roam pr-risk`, `roam diff` all
+produce structured findings. The PR-comment renderer
+(`roam pr-comment-render`) needs to surface this with explicit
+evidence-citations.
+**What's missing**: a `pr-comment` format that LEADS WITH evidence
+and includes per-finding graph citations.
+
+This is the answer to "how does Roam differentiate from CodeRabbit/
+Greptile/Qodo?" — different *category* of evidence.
+
+### 13. Sandbox Execution Architecture
+> Roam runs tests / checks for the agent and translates output into
+> agent-readable next steps.
+
+**What we have**: `roam affected-tests` suggests tests but doesn't run them.
+**What's missing**: a runner (`roam run-tests --affected`) +
+output-parsing layer that returns
+`{failure_type, likely_file, suggested_next_tool, suggested_query}`.
+
+Risky area — running untrusted commands needs careful sandboxing.
+Defer until the rest is in place.
+
+### 14. Graph Versioning Architecture
+> Graph at commit A vs commit B → structural diff
+
+**What we have**: incremental indexing (R9.B7 FTS5 incremental, R10.3
+cluster cache) — we already maintain the graph efficiently.
+**What's missing**: an explicit *graph snapshot* per commit and a
+`roam graph-diff main..HEAD` / `roam architecture-drift` interface.
+
+Pairs naturally with `roam health --baseline auto`, which we already shipped.
+Marketing line: *"Not just what changed in code, but what changed in
+the system structure."*
+
+### 15. Agent Constitution Architecture
+Combines AGENTS.md + policy rules + repo memory into one declarative
+file (`.roam/constitution.yml`):
+
+```yaml
+principles:
+  - Prefer small diffs.
+  - Never edit generated files.
+critical_paths:
+  - src/auth/**
+  - src/billing/**
+required_roam_checks:
+  before_edit: [retrieve, preflight]
+  after_edit:  [affected-tests, critique]
+  before_pr:   [pr-risk, intent-check]
+```
+
+This is the unifying primitive. AGENTS.md is the human-readable
+prose; constitution.yml is the machine-readable contract.
+
+---
+
+## Top-5 priorities per ChatGPT, with my read
+
+ChatGPT recommends these five because they compound beautifully:
+
+1. **Agent Decision Engine** — `roam next` + decision substrate
+2. **Policy Engine** — graph-aware rules
+3. **Context Pack Architecture** — `roam context-pack --budget N`
+4. **Agent Run Ledger** — the Phase 4 monetisation product
+5. **Repo-Local Agent Memory** — `.roam/memory`
+
+My take: this ordering is right. **#1 + #2 are the highest-leverage
+near-term**. #3 is the cheapest win (extend `roam retrieve`).
+#4 is the audit-trail product we have already committed to building
+in Phase 4. #5 is small but strategic: it makes Roam *portable
+across agent vendors*, which is itself a strategic moat.
+
+---
+
+## Mapping to the roadmap (R18+)
+
+R13-R17 (queued in BACKLOG) already cover rounds 1+2. Round 3 adds
+these directions, in proposed order:
+
+| Round | What | Round-3 ref | Notes |
+|---|---|---|---|
+| **R18** | Graph-aware policy DSL (`roam rules` extension) — `reachable_from`, `imports_from`, `clones_with`, `tested_by` clauses | #2 | Pairs with `roam permit` (Phase 0) |
+| **R19** | Repo-local memory store (`.roam/memory.jsonl` + `roam memory add/list/relevant`) | #5 | Small commit, big strategic win |
+| **R20** | Agent Run Ledger — per-agent-run event stream stored locally, signed via existing CGA chain. Powers `roam replay`, `roam agent-score`, `roam audit-trail` | #4 | Substrate exists (CGA); ledger is the UX |
+| **R21** | Multi-Agent Lease System — stateful claim/release over the existing graph-partition substrate | #6 | Pairs with `roam orchestrate` / `roam fleet` |
+| **R22** | Confidence/Uncertainty contract — every list-of-findings tool returns `{value, confidence, reason}` | #8 | Mechanical sweep |
+| **R23** | Graph Versioning — `roam graph-diff main..HEAD`, `roam architecture-drift` | #14 | Pairs with `roam trends` |
+| **R24** | Agent Constitution (`.roam/constitution.yml`) — unifies AGENTS.md + policy + required checks | #15 | Capstone primitive |
+| **R25+** | Pluggable Analyzer protocol (`roam-plugin-*`) | #9 | Multi-quarter direction |
+
+Plus the smaller refinements (already on positioning roadmap):
+
+- **R13** (agent-OS metadata pass) absorbs ChatGPT round-3 #10
+- **R15** (`roam next`, `roam agents-md`, prompt snippets) absorbs round-3 #1 (Decision Engine)
+- **R16** (agent modes, intent-check, agent-score) absorbs round-3 #7
+- **R17** (Cloud governance reframe) absorbs round-3 #4 + #11 (Local-First Sync architecture)
+
+---
+
+## What this means for the monetisation pivot
+
+The round-3 capture sharpens the *story* of each paid product:
+
+- **Roam Review** = round-3 #12 (Evidence-Based PR Review) +
+  round-3 #4 (Agent Run Ledger). Sells against CodeRabbit/Greptile/
+  Qodo on the basis of *a different category of evidence*.
+- **Roam Cloud** = round-3 #11 (Local-First Sync) + a dashboard that
+  surfaces derived intelligence (risk scores, run summaries,
+  ignored-warning trails). *"Code stays local; derived intelligence
+  can sync."*
+- **Roam Self-Hosted** = round-3 #2 (Policy Engine) + round-3 #15
+  (Agent Constitution) — the governance story for regulated
+  industries.
+
+This means: **none of the monetisation pricing needs to change.** The
+positioning copy + product descriptions get sharper, but the SKU
+matrix from `pricing_v3_flat_launch.md` stays intact.
+
+---
+
+## The single most-important insight from this round
+
+> Graph-aware policy is the moat.
+
+Path-aware rules are commodified (every tool from CODEOWNERS to
+SonarQube does that). The graph-reachability primitive
+("block changes to anything reachable from `payment_settlement`")
+is something **we can do today** because we already have the graph
+substrate, and **competitors cannot** without rebuilding our
+indexing layer.
+
+This is the foundation R18 should be built on. Worth elevating in
+the ROADMAP as a strategic priority.