dev: capture agent-OS positioning notes (external strategic input)

Cranot · Cranot · commit bc0916125d44 · 2026-05-11T00:49:24.000+03:00
Two-round positioning audit from ChatGPT validates the v2 monetisation
thesis ("agent OS for codebases") and surfaces 4 high-leverage missing
primitives plus a hero-copy candidate.

* dev/agent-os-positioning-2026-05-11.md (full capture, 322 lines):
  - Executive thesis from both rounds
  - Inventory of what we already have (~80% of the wishlist)
  - The 4 actual gaps: roam next, roam plan (task planner),
    roam permit (already on roadmap), roam intent-check
  - 14 secondary recommendations mapped against codebase
  - Hero copy candidate: "Agents should not edit blind. Roam is
    their map."
  - 5 proposed roadmap rounds (R13-R17)
* dev/BACKLOG.md — R13-R17 queued with risk levels.

Companion writes (auto-memory):
* memory/agent_os_positioning_2026_05_11.md — pointer + TL;DR
* MEMORY.md — index updated with new ⭐⭐⭐ entry
diff --git a/dev/BACKLOG.md b/dev/BACKLOG.md
@@ -44,6 +44,26 @@ parallelisable) per the user's stated direction.
 
 ---
 
+## R13–R17 — agent-OS positioning rounds (2026-05-11)
+
+External strategic input from a ChatGPT positioning audit (full
+capture: `dev/agent-os-positioning-2026-05-11.md`). Five rounds queued
+in dependency order:
+
+| Round | What | Risk |
+|---|---|---|
+| **R13** | Agent-OS metadata pass — add `phase`, `recommended_next_tools`, `avoid_when`, `confidence_fields` to `_TOOL_METADATA` for every @_tool. Surface in `roam_catalog`. | LOW — pure substrate |
+| **R14** | Hero-copy A/B ("Agents should not edit blind. Roam is their map.") + capability-coverage reframe ("145 capabilities across 9 categories") + rename playful commands' aliases (vibe-check → intent-check, weather → churn, dark-matter → hidden-complexity) | LOW — website only |
+| **R15** | `roam agents-md` + `roam next` (agent router) + prompt snippets product surface. Free-OSS, viral. | MED — new commands |
+| **R16** | Agent modes (read_only/safe_edit/migration/autonomous_pr) + `roam intent-check` + `roam agent-score`. Pairs with `roam permit`. | MED |
+| **R17** | Reposition Roam Cloud as "governance for agent-written code" — Cloud dashboard cards for which-agents-changed-what, blast-radius distribution, ignored-warning trail. | MED — copy + dashboard |
+
+R13 is the highest-leverage low-risk item. R14 + R15 are
+parallelisable with R13. R16/R17 are mid-term, paired with the
+monetisation Phase-0 work.
+
+---
+
 ## Next pickup — pick from ROADMAP
 
 When this queue clears (it has), pull from `ROADMAP.md` in this order:
diff --git a/dev/agent-os-positioning-2026-05-11.md b/dev/agent-os-positioning-2026-05-11.md
@@ -0,0 +1,322 @@
+# Agent-OS positioning notes — 2026-05-11
+
+External strategic input from ChatGPT, in two rounds. Captured here
+because the synthesis is high-signal and crosses several roadmap lanes
+at once. Read alongside `internal/strategy/` and the
+`monetization_v2_subscription_pivot.md` memory file.
+
+---
+
+## Executive thesis (both rounds combined)
+
+> "Roam should become the thing an agent consults before every meaningful coding decision."
+
+Not "another PR reviewer." Not "AI coding IDE." A new category:
+
+- **Agent-side framing**: a *runtime / context protocol for codebase-aware agents*
+- **Buyer-side framing**: *governance for agent-written code*
+
+This is more interesting than competing with CodeRabbit / Greptile / Qodo
+on semantic-review territory — those reviewers stay in the
+human-readable diff-prose lane. Roam stays in the structural-graph lane
+and runs *alongside*, before/during/after, not instead.
+
+The 211 commands + 145 MCP tools surface is **not bloat** if the
+capability metadata is good enough that an agent never has to guess
+which tool to call. ChatGPT's framing:
+
+> Expose every meaningful capability, but make the capability graph
+> legible, typed, composable, and safe.
+
+---
+
+## Hero copy candidates
+
+The standout line from ChatGPT round 1:
+
+> **"Agents should not edit blind. Roam is their map."**
+
+That's sharper than the current home-page hero
+("Coding agents can write code. Roam is the structural intelligence
+they don't have."). Worth testing as a copy A/B.
+
+Other candidates from round 2:
+
+- *"Roam is the structural context layer for AI coding agents."*
+- *"Code agents need more than prompts. They need repo intelligence."*
+
+The framing-softener for the "no human reads code anymore" line, which
+is emotionally true for vibe-coding but alienates enterprise buyers:
+
+> *"As agents write more code, humans increasingly review intent and
+> outcomes, not every line. Roam gives agents the structural context
+> they need to operate safely between those human checkpoints."*
+
+(Already aligned with the existing positioning per
+`product_positioning_2026_05_09.md` memory.)
+
+---
+
+## What we already have (the ~80% baseline)
+
+Cross-checked against the codebase as of commit `feeafac`:
+
+### Capability discovery
+- `roam surface` — canonical capability registry as JSON or text
+- `roam_catalog` MCP tool — agent-readable surface
+- `roam ask` — TF-IDF intent dispatcher over a 24-recipe registry
+- `roam_complete` — FTS5 prefix completion (cheaper than search)
+- `roam --help` Start-here panel: 6 verbs (init / understand / context / preflight / critique / ask)
+
+### Capability metadata (per @_tool)
+- `_TOOL_METADATA[name]`: `core`, `read_only`, `destructive`, `version`, `when_to_use`, `examples`
+- MCP annotations: `readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`
+- `taskSupport` hint (optional / required / forbidden)
+- `deferLoading` annotation for context-reduction via Tool Search
+- `_DESTRUCTIVE_TOOLS`, `_NON_READ_ONLY_TOOLS`, `_CACHEABLE_COMMANDS` sets
+
+### Workflow recipes (the "canonical loops" ChatGPT recommends)
+- `roam_for_new_feature` — understand + search + context + complexity
+- `roam_for_bug_fix` — diagnose + tests + diff + context
+- `roam_for_refactor` — preflight + impact + complexity + clones
+- `roam_for_security_review` — taint + vuln + critique + adversarial
+- Plus `roam_explore`, `roam_diagnose_issue` compounds with `_COMPOUND_WORKFLOW_RECIPES` metadata
+
+### Safety contracts
+- Soft contract enforcement on `roam_mutate` (R7) — `contract_check`
+  checks for prior `roam_simulate` and surfaces `contract_compliance` advice
+- Stale-index affordance on every read tool
+- Structured errors with `retryable`, `doc_link`, `suggested_action`,
+  `severity` (R9 fix kept these in trim mode too)
+- Exit codes: `EXIT_USAGE`, `EXIT_INDEX_MISSING`, `EXIT_INDEX_STALE`,
+  `EXIT_GATE_FAILURE`, `EXIT_PARTIAL`
+- `agent_contract` block on every JSON envelope — derived
+  `{facts, risks, next_commands, confidence}` (R7.S14)
+
+### Per-agent install
+- `roam mcp-setup --write` for 6 platforms: claude-code, cursor,
+  windsurf, vscode (Copilot Agent Mode), gemini-cli, codex-cli
+- Integration tutorials at `templates/distribution/landing-page/docs/integration-tutorials.html`
+
+### Output efficiency
+- `--json` mode across all commands
+- Token-budget cap via `_apply_budget` + `--budget` flag
+- Reference-based handles for >50KB envelopes (R8.E8) —
+  content-addressed, agent fetches via `roam_fetch_handle`
+- `summarize=True` default for compound tools when `ROAM_AI_ENABLED=1`
+  (uses MCP sampling to compress to ~1-2KB prose)
+
+### Eval substrate
+- `roam eval-retrieve` with `--emit-format coderag|beir` JSONL emit
+- Bench harness at `eval/harness.py`
+
+---
+
+## The actual gaps (what's missing)
+
+ChatGPT's "build these four next" priority, with my read:
+
+### 1. `roam next` — agent router / front door (NEW)
+> Even agents can get tool-choice fatigue. Add a high-level command
+> like `roam next "I need to refactor auth middleware"` returning
+> `{recommended_tools, reason, avoid, required_order}`.
+
+We have `roam ask`, which does TF-IDF intent → recipe dispatch, but it
+doesn't expose `avoid` or `required_order` as machine-readable
+fields. The closest existing primitive is the `roam_for_<situation>`
+family, which is keyed by recipe rather than by free-form task.
+
+**Estimated work**: extend `cmd_ask` to emit a machine-readable
+`recommended_tools` block, OR ship `cmd_next` as a thinner front
+door over the same registry. Plus per-tool `phase` / `avoid_when` /
+`recommended_next_tools` metadata in `_TOOL_METADATA`. ~1-2 days.
+
+### 2. `roam plan` — task-to-execution-path generator (EXTEND)
+> Output: likely files, risks, suggested sequence (1-8 steps).
+> Not "generate code" — give the agent a safe execution path.
+
+`roam plan` exists already (in the curated-32 help-template list)
+but is more refactor-flavoured. ChatGPT's `roam plan "Add password
+reset flow"` is a *task* planner, not a refactor planner. Adjacent
+but distinct.
+
+**Estimated work**: new sub-mode or new command (`roam task-plan`?).
+Needs a small LLM-callable scaffolder OR rule-based heuristics over
+the structural graph. ~3-5 days.
+
+### 3. `roam permit` — permission model (NEW, on Phase-0 monetisation roadmap)
+> `{allowed, safe_zones, forbidden_zones, requires_preflight, required_checks}`
+
+Already queued in `build_priorities.md` as a Phase-0 freebie. ChatGPT's
+proposal matches what we'd build. **High priority, high payoff** —
+this is the wedge into "governance for agent-written code."
+
+**Estimated work**: 2-3 days for MVP. YAML policy file +
+`roam permit "<task>"` CLI + MCP tool. Pairs with the agent-modes
+concept (recommendation #6 below).
+
+### 4. `roam intent-check` — diff vs stated task (NEW or EXTEND)
+> Compares the stated task to the diff. "Diff partially matches.
+> Expected: X. Unexpected: Y." Very valuable for vibe-coding.
+
+`roam intent` exists in the curated-32 list, but what it does today
+still needs verifying; it may overlap with this. If it doesn't, this
+is a clear gap. The existing `roam critique` does *clone-not-edited*
+checks but not stated-intent-vs-diff comparison.
+
+**Estimated work**: ~2-3 days. Needs a small LLM call (or a
+rule-based heuristic over diff vs task description). Could be a
+mode of `roam critique`.
+
+---
+
+## Secondary recommendations from round 2 (14 items)
+
+In rough priority order, with my reads:
+
+| # | Recommendation | Status | Notes |
+|---|---|---|---|
+| 3 | `roam agents-md` (generate AGENTS.md) | NEW | Project-level agent manual. ~1-2 days. Easy + viral. |
+| 4 | Repo-local agent memory (`.roam/memory`) | NEW | Distinct from MCP session memory. Small concept, big lever. |
+| 5 | Permission model | (= gap #3 above) | Already on Phase 0 |
+| 6 | Agent modes (read_only / safe_edit / migration / autonomous_pr) | NEW | Pairs with `roam permit`. Strong differentiator. |
+| 7 | `--format compact-json` | EXTEND | We have `--json` + budget; add a `compact-json` mode that strips prose |
+| 8 | `roam compress-context "<task>" --budget N` | EXTEND | `roam retrieve` is partially this; add explicit budget framing + compression |
+| 9 | Benchmark / eval harness with agent-vs-no-agent comparisons | EXTEND | `roam eval-retrieve` exists; agent-eval is the marketing-gold version |
+| 10 | `roam agent-score` (`git diff \| roam agent-score`) | NEW | Scorecard for agent runs. Extends `critique`. |
+| 11 | `roam intent-check` | (= gap #4 above) | |
+| 12 | Per-agent install pages (Claude Code / Cursor / Codex / Gemini / Roo / Continue / Windsurf) | EXTEND | We have integration-tutorials.html as one page; split into 7 landing pages |
+| 13 | Prompt snippets as product surface | NEW | "Before changing code, call roam retrieve. After, pipe diff to critique." Easy + sticky |
+| 14 | Project policy rules (YAML) | EXTEND | `roam rules` exists with rule packs; extend to agent-policy domain |
+| 15 | Roam Cloud = "governance for agent-written code" reframe | POSITIONING | Site copy + product description. No code change. |
+| 16 | Killer demo (agent navigates, doesn't just code) | MARKETING | "The agent did not just code. It navigated." |
+| 17 | Rename playful commands (vibe-check → intent-check, weather → churn, dark-matter → hidden-complexity) | POSITIONING | Keep aliases; serious names in docs |
+
+---
+
+## Proposed integration into the roadmap
+
+Mapping all of this against `dev/BACKLOG.md` + `dev/ROADMAP.md` +
+the monetisation phasing in `monetization_v2_subscription_pivot.md` /
+`build_priorities.md`:
+
+### Phase 0 (free-OSS wedges) — UNCHANGED, validated
+Already on the roadmap, ChatGPT independently agrees:
+- `roam permit` (= recommendation #5/#15)
+- `roam postmortem`
+- `roam ai-governance-check` (renamed from article-12-check)
+
+Worth adding as a fourth Phase-0 freebie:
+
+- `roam next` (= ChatGPT's #1 recommendation) — agent router. Cheap
+  to ship, high signal, extends `roam ask`. Strong "this tool is
+  agent-first" positioning hook.
+
+### R13 — Agent-OS metadata pass (new round, parallelisable)
+Mechanical sweep across the @_tool registry to add the metadata
+ChatGPT round 1 called out as the actual fix:
+- `phase` per command (before_edit / during_edit / after_edit /
+  before_pr / debug / refactor / etc.)
+- `recommended_next_tools` per command
+- `avoid_when` per command
+- `confidence_fields` flag where applicable
+
+Surface in `roam_catalog`. Update `roam_for_<situation>` compounds
+to consume the new metadata. ~2-4 hours of agent work. **Pure
+substrate: the metadata is additive, so regression risk is low.**
+
+### R14 — Hero-copy A/B + capability-coverage reframe (website only)
+- Hero: "Agents should not edit blind. Roam is their map."
+- Compare-table cell: "Yes (145 capabilities across [9 categories])"
+  instead of "Yes (145 tools)"
+- Meta description / og:description: "Agents should not edit blind"
+  framing
+- Use the serious command names in user-facing docs, keeping the
+  playful names as aliases (vibe-check → intent-check,
+  weather → churn, dark-matter → hidden-complexity)
+
+### R15 — `roam agents-md` + prompt snippets (free-OSS, viral)
+Ship as Phase 0.5. The AGENTS.md generator is the kind of thing
+people share screenshots of. Prompt snippets land on a /agent-prompts
+page as canonical instructions.
+
+### R16 — Agent modes + `roam intent-check` + `roam agent-score` (mid-tier)
+Pairs with `roam permit` to form the "governance for agent-written
+code" enterprise wedge. This is the layer that lets a buyer say
+"Claude Code may operate in safe_edit mode, not migration mode."
+
+### R17 — Roam Cloud as agent-governance dashboard (commercial pivot)
+Reposition Cloud away from "code health dashboard" toward:
+
+- Which agents changed what
+- Which changes ignored Roam warnings
+- PR blast-radius distribution
+- Where agents are repeatedly creating complexity
+- Risky-files-most-modified ranking
+
+Per-team and per-repo trend charts stay; the *headline framing*
+shifts. This is mostly a copy + dashboard-card refresh, not a
+ground-up rebuild.
+
+---
+
+## The "killer demo" storyline
+
+Per ChatGPT round 2 #16 — worth scripting and recording:
+
+```text
+1. Agent receives task ("add password reset")
+2. Agent calls roam_for_new_feature
+   -> Roam returns: relevant files, complexity report, similar code
+3. Agent calls roam_permit "password reset"
+   -> Roam returns: allowed=true, safe_zones=[...], required_checks=[...]
+4. Agent edits only the safe-zone files
+5. Agent pipes diff into roam_critique
+   -> Roam catches: missed an affected test, branch-complexity warning
+6. Agent fixes, re-runs critique
+7. Agent generates PR summary via roam_pr_analyze
+
+The agent did not just code. It navigated.
+```
+
+This is the demo that should land on /demos and be shareable as a
+~60-second video.
+
+---
+
+## Inventory of what to capture from this round in source-of-truth
+
+To prevent this strategic input from getting lost when the user
+migrates / the next session starts:
+
+1. **This file** — `dev/agent-os-positioning-2026-05-11.md` (committed)
+2. **Memory pointer** — `~/.claude/projects/.../memory/` entry
+   pointing here
+3. **MEMORY.md index** — top-of-file entry under "Read first every session"
+4. **dev/BACKLOG.md** — add R13/R14/R15/R16/R17 as queued rounds
+
+---
+
+## My honest take
+
+ChatGPT's framing genuinely advances our positioning. The core
+insight that crosses both rounds:
+
+> The 211 commands + 145 MCP tools is not bloat — it's a wide
+> capability surface. The fix is per-tool metadata + workflow
+> recipes + agent modes, not pruning.
+
+We've already built ~80% of what was recommended. The remaining
+20% is high-leverage:
+
+- **`roam next`** + per-tool `phase`/`avoid_when`/`recommended_next_tools`
+  metadata is the single most-impactful gap. It turns the
+  "agents-need-to-pick-tools" problem into a one-call lookup.
+- **`roam permit` + agent modes** is the enterprise wedge for
+  governance.
+- **"Agents should not edit blind. Roam is their map."** is hero copy
+  worth shipping on the next website pass.
+
+The strategic direction is consistent with the existing v2
+monetisation thesis — these are *tactical refinements* of a
+direction we're already going. None of it requires re-pivoting.
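
A minimal sketch of the one-call lookup that the notes above attribute to `roam next` plus per-tool metadata. Everything here is an assumption for illustration: the registry contents, the tool names `roam_preflight` and `roam_impact`, the `START_BY_KEYWORD` table, and the `next_tools` function are hypothetical, and plain keyword matching stands in for the TF-IDF dispatch that `roam ask` uses. Only the metadata field names (`phase`, `recommended_next_tools`, `avoid_when`) and the response keys (`recommended_tools`, `reason`, `avoid`, `required_order`) come from the notes.

```python
import json
from typing import Dict, List, TypedDict


class ToolMeta(TypedDict):
    phase: str                         # e.g. "before_edit", "after_edit"
    recommended_next_tools: List[str]  # tools that usually follow this one
    avoid_when: List[str]              # conditions under which to skip it


# Hypothetical registry entries; the real `_TOOL_METADATA` may differ.
TOOL_METADATA: Dict[str, ToolMeta] = {
    "roam_preflight": {
        "phase": "before_edit",
        "recommended_next_tools": ["roam_impact"],
        "avoid_when": ["index is stale"],
    },
    "roam_impact": {
        "phase": "before_edit",
        "recommended_next_tools": ["roam_critique"],
        "avoid_when": [],
    },
    "roam_critique": {
        "phase": "after_edit",
        "recommended_next_tools": [],
        "avoid_when": ["no diff yet"],
    },
}

# Hypothetical keyword -> entry-point table (a stand-in for TF-IDF dispatch).
START_BY_KEYWORD: Dict[str, str] = {
    "refactor": "roam_preflight",
    "review": "roam_critique",
}


def next_tools(task: str) -> dict:
    """Return a machine-readable tool plan for a free-form task string."""
    lowered = task.lower()
    start = next(
        (tool for kw, tool in START_BY_KEYWORD.items() if kw in lowered), None
    )
    order: List[str] = []
    seen = set()
    current = start
    # Follow the first recommended_next_tools link until the chain ends,
    # leaves the registry, or loops back on itself.
    while current is not None and current in TOOL_METADATA and current not in seen:
        order.append(current)
        seen.add(current)
        following = TOOL_METADATA[current]["recommended_next_tools"]
        current = following[0] if following else None
    avoid = {
        name: TOOL_METADATA[name]["avoid_when"]
        for name in order
        if TOOL_METADATA[name]["avoid_when"]
    }
    reason = f"keyword match -> {start}" if start else "no keyword matched"
    return {
        "recommended_tools": sorted(order),
        "reason": reason,
        "avoid": avoid,
        "required_order": order,
    }


if __name__ == "__main__":
    print(json.dumps(next_tools("I need to refactor auth middleware"), indent=2))
```

The design point is that the router itself stays thin: ordering and skip conditions live in the per-tool metadata, so adding a tool means adding a registry entry rather than touching the dispatch logic.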