| 1 | +# Agent-OS positioning notes — 2026-05-11 |
| 2 | + |
| 3 | +External strategic input from ChatGPT, in two rounds. Captured here |
| 4 | +because the synthesis is high-signal and crosses several roadmap lanes |
| 5 | +at once. Read alongside `internal/strategy/` and the |
| 6 | +`monetization_v2_subscription_pivot.md` memory file. |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## Executive thesis (both rounds combined) |
| 11 | + |
| 12 | +> "Roam should become the thing an agent consults before every meaningful coding decision." |
| 13 | + |
| 14 | +Not "another PR reviewer." Not "AI coding IDE." A new category: |
| 15 | + |
| 16 | +- **Agent-side framing**: a *runtime / context protocol for codebase-aware agents* |
| 17 | +- **Buyer-side framing**: *governance for agent-written code* |
| 18 | + |
| 19 | +This is more interesting than competing with CodeRabbit / Greptile / Qodo |
| 20 | +on semantic-review territory: those reviewers stay in the |
| 21 | +human-readable diff-prose lane. Roam stays in the structural-graph lane |
| 22 | +and runs *alongside* them (before, during, and after review), not instead of them. |
| 23 | + |
| 24 | +The 211 commands + 145 MCP tools surface is **not bloat** if the |
| 25 | +capability metadata is good enough that an agent never has to guess |
| 26 | +which tool to call. ChatGPT's framing: |
| 27 | + |
| 28 | +> Expose every meaningful capability, but make the capability graph |
| 29 | +> legible, typed, composable, and safe. |
| 30 | + |
| 31 | +--- |
| 32 | + |
| 33 | +## Hero copy candidates |
| 34 | + |
| 35 | +The standout line from ChatGPT round 1: |
| 36 | + |
| 37 | +> **"Agents should not edit blind. Roam is their map."** |
| 38 | + |
| 39 | +That's sharper than the current home-page hero |
| 40 | +("Coding agents can write code. Roam is the structural intelligence |
| 41 | +they don't have."). Worth testing as a copy A/B. |
| 42 | + |
| 43 | +Other candidates from round 2: |
| 44 | + |
| 45 | +- *"Roam is the structural context layer for AI coding agents."* |
| 46 | +- *"Code agents need more than prompts. They need repo intelligence."* |
| 47 | + |
| 48 | +The framing-softener for the "no human reads code anymore" line, which |
| 49 | +is emotionally true for vibe-coding but alienates enterprise buyers: |
| 50 | + |
| 51 | +> *"As agents write more code, humans increasingly review intent and |
| 52 | +> outcomes, not every line. Roam gives agents the structural context |
| 53 | +> they need to operate safely between those human checkpoints."* |
| 54 | + |
| 55 | +(Already aligned with the existing positioning per |
| 56 | +`product_positioning_2026_05_09.md` memory.) |
| 57 | + |
| 58 | +--- |
| 59 | + |
| 60 | +## What we already have (the ~80% baseline) |
| 61 | + |
| 62 | +Cross-checked against the codebase as of commit `feeafac`: |
| 63 | + |
| 64 | +### Capability discovery |
| 65 | +- `roam surface` — canonical capability registry as JSON or text |
| 66 | +- `roam_catalog` MCP tool — agent-readable surface |
| 67 | +- `roam ask` — TF-IDF intent dispatcher over a 24-recipe registry |
| 68 | +- `roam_complete` — FTS5 prefix completion (cheaper than search) |
| 69 | +- `roam --help` Start-here panel: 6 verbs (init / understand / context / preflight / critique / ask) |
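For reference, TF-IDF intent dispatch of the kind `roam ask` does can be sketched in a few dozen lines of standard-library Python. This is a minimal sketch, not the real `cmd_ask` code. The four recipe descriptions, the smoothed IDF (`log(n/df) + 1`) and the names `build_index` / `dispatch` are all illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical recipe descriptions; the real registry has 24 recipes.
RECIPES: Dict[str, str] = {
    "for_new_feature": "add new feature implement endpoint understand search context",
    "for_bug_fix": "fix bug crash error failing test diagnose",
    "for_refactor": "refactor rename move extract restructure impact clones",
    "for_security_review": "security review taint vulnerability injection audit",
}


def _tokens(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def build_index(recipes: Dict[str, str]) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Return (idf per term, tf-idf vector per recipe)."""
    counts = {name: Counter(_tokens(text)) for name, text in recipes.items()}
    # Document frequency: in how many recipes each distinct term appears.
    df = Counter(term for c in counts.values() for term in c)
    n = len(recipes)
    idf = {term: math.log(n / d) + 1.0 for term, d in df.items()}  # +1 keeps shared terms non-zero
    vectors = {name: {t: tf * idf[t] for t, tf in c.items()} for name, c in counts.items()}
    return idf, vectors


def _cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def dispatch(query: str, recipes: Dict[str, str] = RECIPES) -> Optional[str]:
    """Pick the recipe whose description is most similar to the query, or None."""
    idf, vectors = build_index(recipes)
    # Query terms unknown to every recipe carry no signal, so they are dropped.
    q = {t: tf * idf[t] for t, tf in Counter(_tokens(query)).items() if t in idf}
    best_score, best_name = max((_cosine(q, v), name) for name, v in vectors.items())
    return best_name if best_score > 0 else None
```

With this toy registry, `dispatch("fix a crash in the login handler")` selects `for_bug_fix`. A query with no known terms returns `None`, which is exactly where a front door like the proposed `roam next` needs a fallback.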
| 70 | + |
| 71 | +### Capability metadata (per @_tool) |
| 72 | +- `_TOOL_METADATA[name]`: `core`, `read_only`, `destructive`, `version`, `when_to_use`, `examples` |
| 73 | +- MCP annotations: `readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint` |
| 74 | +- `taskSupport` hint (optional / required / forbidden) |
| 75 | +- `deferLoading` annotation for context reduction via Tool Search |
| 76 | +- `_DESTRUCTIVE_TOOLS`, `_NON_READ_ONLY_TOOLS`, `_CACHEABLE_COMMANDS` sets |
| 77 | + |
| 78 | +### Workflow recipes (the "canonical loops" ChatGPT recommends) |
| 79 | +- `roam_for_new_feature` — understand + search + context + complexity |
| 80 | +- `roam_for_bug_fix` — diagnose + tests + diff + context |
| 81 | +- `roam_for_refactor` — preflight + impact + complexity + clones |
| 82 | +- `roam_for_security_review` — taint + vuln + critique + adversarial |
| 83 | +- Plus the `roam_explore` and `roam_diagnose_issue` compounds, each carrying `_COMPOUND_WORKFLOW_RECIPES` metadata |
| 84 | + |
| 85 | +### Safety contracts |
| 86 | +- Soft contract enforcement on `roam_mutate` (R7): `contract_check` |
| 87 | +  checks for a prior `roam_simulate` call and surfaces `contract_compliance` advice |
| 88 | +- Stale-index affordance on every read tool |
| 89 | +- Structured errors with `retryable`, `doc_link`, `suggested_action`, |
| 90 | + `severity` (R9 fix kept these in trim mode too) |
| 91 | +- Exit codes: `EXIT_USAGE`, `EXIT_INDEX_MISSING`, `EXIT_INDEX_STALE`, |
| 92 | + `EXIT_GATE_FAILURE`, `EXIT_PARTIAL` |
| 93 | +- `agent_contract` block on every JSON envelope: a derived |
| 94 | +  `{facts, risks, next_commands, confidence}` summary (R7.S14) |
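The "soft" in soft contract enforcement is the interesting design choice: the check advises rather than blocks. This is a minimal sketch that assumes the MCP session keeps an ordered log of tool names. `check_mutate_contract` and its advice text are hypothetical; only `roam_mutate`, `roam_simulate` and `contract_compliance` come from these notes.

```python
from typing import Dict, List, Optional


def check_mutate_contract(call_log: List[str]) -> Dict[str, Optional[str]]:
    """Soft check: was roam_simulate called since the previous roam_mutate?

    Hypothetical helper. It never raises or refuses the call; it only returns
    advice that could be attached to the envelope as `contract_compliance`.
    """
    mutate_at = [i for i, name in enumerate(call_log) if name == "roam_mutate"]
    if not mutate_at:
        return {"contract_compliance": "n/a", "advice": None}
    # Only a simulate between the previous mutate and the latest one counts,
    # so a single early simulate cannot "cover" a whole session of mutations.
    start = mutate_at[-2] + 1 if len(mutate_at) > 1 else 0
    if "roam_simulate" in call_log[start:mutate_at[-1]]:
        return {"contract_compliance": "ok", "advice": None}
    return {
        "contract_compliance": "warn",
        "advice": "Run roam_simulate before roam_mutate to preview the change.",
    }
```

A hard contract would raise at the `warn` branch instead. Keeping it advisory means agents that skip the step still get work done, but the envelope records that they skipped it.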
| 95 | + |
| 96 | +### Per-agent install |
| 97 | +- `roam mcp-setup --write` for 6 platforms: claude-code, cursor, |
| 98 | + windsurf, vscode (Copilot Agent Mode), gemini-cli, codex-cli |
| 99 | +- Integration tutorials at `templates/distribution/landing-page/docs/integration-tutorials.html` |
| 100 | + |
| 101 | +### Output efficiency |
| 102 | +- `--json` mode across all commands |
| 103 | +- Token-budget cap via `_apply_budget` + `--budget` flag |
| 104 | +- Reference-based handles for >50KB envelopes (R8.E8): |
| 105 | +  content-addressed; the agent fetches the full payload via `roam_fetch_handle` |
| 106 | +- `summarize=True` default for compound tools when `ROAM_AI_ENABLED=1` |
| 107 | + (uses MCP sampling to compress to ~1-2KB prose) |
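To make "content-addressed" concrete: the handle can simply be a hash of the serialized payload. Identical envelopes then map to the same handle, and a fetch can re-verify integrity. This sketch is not the R8.E8 implementation. It assumes an in-memory store and reads ">50KB" as 50 × 1024 bytes; `HandleStore`, `maybe_handle` and the `sha256:` prefix are illustrative.

```python
import hashlib
import json
from typing import Any, Dict

THRESHOLD_BYTES = 50 * 1024  # assumed reading of the ">50KB" cut-off


class HandleStore:
    """Hypothetical in-memory store keyed by the SHA-256 of each payload."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def maybe_handle(self, envelope: Dict[str, Any]) -> Dict[str, Any]:
        """Return the envelope inline if small, else a reference to it."""
        # sort_keys makes serialization deterministic, so equal dicts hash equally.
        raw = json.dumps(envelope, sort_keys=True).encode("utf-8")
        if len(raw) <= THRESHOLD_BYTES:
            return envelope
        handle = "sha256:" + hashlib.sha256(raw).hexdigest()
        self._blobs[handle] = raw  # same content -> same key, so storage dedupes
        return {"handle": handle, "bytes": len(raw)}

    def fetch(self, handle: str) -> Dict[str, Any]:
        """Resolve a handle, as a `roam_fetch_handle`-style tool would."""
        raw = self._blobs[handle]
        # Re-hash on read so a corrupted blob is detected rather than returned.
        if "sha256:" + hashlib.sha256(raw).hexdigest() != handle:
            raise ValueError("handle does not match stored content")
        return json.loads(raw)
```

Because the key is derived from content, an agent that receives the same large envelope twice gets the same handle. It can skip the second fetch entirely.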
| 108 | + |
| 109 | +### Eval substrate |
| 110 | +- `roam eval-retrieve` with `--emit-format coderag|beir` JSONL emit |
| 111 | +- Bench harness at `eval/harness.py` |
| 112 | + |
| 113 | +--- |
| 114 | + |
| 115 | +## The actual gaps (what's missing) |
| 116 | + |
| 117 | +ChatGPT's "build these four next" priority, with my read: |
| 118 | + |
| 119 | +### 1. `roam next` — agent router / front door (NEW) |
| 120 | +> Even agents can get tool-choice fatigue. Add a high-level command |
| 121 | +> like `roam next "I need to refactor auth middleware"` returning |
| 122 | +> `{recommended_tools, reason, avoid, required_order}`. |
| 123 | + |
| 124 | +We have `roam ask`, which does TF-IDF intent → recipe dispatch, but it |
| 125 | +doesn't expose `avoid` or `required_order` as machine-readable |
| 126 | +fields. The closest existing primitive is the `roam_for_<situation>` family, |
| 127 | +which is keyed by recipe rather than by free-form task. |
| 128 | + |
| 129 | +**Estimated work**: extend `cmd_ask` to emit a machine-readable |
| 130 | +`recommended_tools` block, OR ship `cmd_next` as a thinner front |
| 131 | +door over the same registry. Plus per-tool `phase` / `avoid_when` / |
| 132 | +`recommended_next_tools` metadata in `_TOOL_METADATA`. ~1-2 days. |
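This sketch shows how the proposed per-tool metadata could drive the `{recommended_tools, reason, avoid, required_order}` shape from ChatGPT's proposal. Everything here is hypothetical:

- The short tool names and `ToolMeta` are illustrative.
- `triggers` stands in for whatever matcher `roam ask` actually uses.
- `avoid_when` is modelled as a set of blocking keywords, which is only one possible reading of that field.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

PHASE_ORDER = ["before_edit", "during_edit", "after_edit", "before_pr"]


@dataclass(frozen=True)
class ToolMeta:
    phase: str                                 # proposed per-tool `phase`
    triggers: FrozenSet[str]                   # stand-in for the real intent matcher
    avoid_when: FrozenSet[str] = frozenset()   # proposed `avoid_when`, as blocking keywords


# Hypothetical registry, deliberately listed out of phase order.
TOOLS: Dict[str, ToolMeta] = {
    "critique": ToolMeta("after_edit", frozenset({"refactor", "fix", "add"})),
    "context": ToolMeta("before_edit", frozenset({"refactor", "fix", "add"})),
    "preflight": ToolMeta("before_edit", frozenset({"refactor", "rename", "move"}), frozenset({"docs", "typo"})),
    "impact": ToolMeta("before_edit", frozenset({"refactor", "rename"}), frozenset({"docs", "typo"})),
    "diagnose": ToolMeta("before_edit", frozenset({"bug", "crash", "fix"})),
}


def roam_next(task: str) -> Dict[str, object]:
    words = set(re.findall(r"[a-z]+", task.lower()))
    recommended: List[str] = []
    reason: Dict[str, str] = {}
    avoid: Dict[str, str] = {}
    for name, meta in TOOLS.items():
        blocked = words & meta.avoid_when
        hits = words & meta.triggers
        if blocked:
            avoid[name] = "task mentions " + ", ".join(sorted(blocked))
        elif hits:
            recommended.append(name)
            reason[name] = "task mentions " + ", ".join(sorted(hits))
    # sorted() is stable, so tools in the same phase keep registry order.
    required_order = sorted(recommended, key=lambda n: PHASE_ORDER.index(TOOLS[n].phase))
    return {
        "recommended_tools": recommended,
        "reason": reason,
        "avoid": avoid,
        "required_order": required_order,
    }
```

For `"I need to refactor auth middleware"` this recommends `critique`, `context`, `preflight` and `impact`. It orders them with `critique` last because its phase is `after_edit`. Keeping relevance (`recommended_tools`) separate from sequencing (`required_order`) is the point: the agent learns both *what* to call and *when*.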
| 133 | + |
| 134 | +### 2. `roam plan` — task-to-execution-path generator (EXTEND) |
| 135 | +> Output: likely files, risks, suggested sequence (1-8 steps). |
| 136 | +> Not "generate code" — give the agent a safe execution path. |
| 137 | + |
| 138 | +`roam plan` exists already (in the curated-32 help-template list) |
| 139 | +but is more refactor-flavoured. ChatGPT's `roam plan "Add password |
| 140 | +reset flow"` is a *task* planner, not a refactor planner. Adjacent |
| 141 | +but distinct. |
| 142 | + |
| 143 | +**Estimated work**: new sub-mode or new command (`roam task-plan`?). |
| 144 | +Needs a small LLM-callable scaffolder OR rule-based heuristics over |
| 145 | +the structural graph. ~3-5 days. |
| 146 | + |
| 147 | +### 3. `roam permit` — permission model (NEW, on Phase-0 monetisation roadmap) |
| 148 | +> `{allowed, safe_zones, forbidden_zones, requires_preflight, required_checks}` |
| 149 | + |
| 150 | +Already queued in `build_priorities.md` as a Phase-0 freebie. ChatGPT's |
| 151 | +proposal matches what we'd build. **High priority, high payoff** — |
| 152 | +this is the wedge into "governance for agent-written code." |
| 153 | + |
| 154 | +**Estimated work**: 2-3 days for an MVP. YAML policy file + |
| 155 | +`roam permit "<task>"` CLI + MCP tool. Pairs with the agent-modes |
| 156 | +concept (#6 below). |
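This is a minimal sketch of the MVP's evaluation step, under three assumptions:

- The YAML policy has already been parsed into a dict (with PyYAML that would be `yaml.safe_load`).
- The caller passes the paths the agent intends to touch. The real `roam permit "<task>"` would have to infer those from the task.
- The policy keys and globs shown are hypothetical.

Here the `safe_zones` / `forbidden_zones` outputs list concrete paths rather than echoing the globs. Note also that `fnmatch`'s `*` matches `/`, so `src/auth/*` covers nested files too.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Hypothetical policy; in practice this would be parsed from the YAML policy file.
POLICY: Dict[str, List[str]] = {
    "safe_zones": ["src/auth/*", "tests/auth/*"],
    "forbidden_zones": ["migrations/*", "src/billing/*"],
    "preflight_zones": ["src/auth/session*"],
    "required_checks": ["roam critique", "pytest tests/auth"],
}


def permit(paths: List[str], policy: Dict[str, List[str]] = POLICY) -> Dict[str, object]:
    """Evaluate proposed edit paths against the policy (default-deny)."""

    def in_zone(path: str, zone: str) -> bool:
        return any(fnmatch(path, pattern) for pattern in policy[zone])

    forbidden = [p for p in paths if in_zone(p, "forbidden_zones")]
    safe = [p for p in paths if not in_zone(p, "forbidden_zones") and in_zone(p, "safe_zones")]
    return {
        # Denied if any path is forbidden OR lies outside every safe zone.
        "allowed": not forbidden and len(safe) == len(paths),
        "safe_zones": safe,
        "forbidden_zones": forbidden,
        "requires_preflight": any(in_zone(p, "preflight_zones") for p in paths),
        "required_checks": list(policy["required_checks"]),
    }
```

Default-deny is the conservative choice for a governance wedge: a path the policy never mentions (say `README.md`) is not allowed until someone adds it to a safe zone.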
| 157 | + |
| 158 | +### 4. `roam intent-check` — diff vs stated task (NEW or EXTEND) |
| 159 | +> Compares the stated task to the diff. "Diff partially matches. |
| 160 | +> Expected: X. Unexpected: Y." Very valuable for vibe-coding. |
| 161 | + |
| 162 | +`roam intent` exists in the curated-32 list; we need to verify what it |
| 163 | +does today, since it may overlap with this. If it doesn't, this is a clear |
| 164 | +gap. The existing `roam critique` does *clone-not-edited* checks but |
| 165 | +no stated-intent-vs-diff comparison. |
| 166 | + |
| 167 | +**Estimated work**: ~2-3 days. Needs a small LLM call (or a |
| 168 | +rule-based heuristic over diff vs task description). Could be a |
| 169 | +mode of `roam critique`. |
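The rule-based option could start as crude as keyword overlap between the task and the changed file paths. This sketch makes that assumption and involves no LLM call. `intent_check`, the stopword list and the verdict strings are hypothetical, loosely mirroring the "Diff partially matches. Expected: X. Unexpected: Y." output quoted above.

```python
import re
from typing import Dict, List, Set

# Words describing the kind of change rather than where it lands (illustrative list).
STOPWORDS = {"a", "an", "the", "to", "for", "and", "of", "add", "fix", "update", "flow"}


def _tokens(text: str) -> Set[str]:
    # Splits on anything non-alphanumeric: `password_reset.py` -> password, reset, py.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def intent_check(task: str, changed_files: List[str]) -> Dict[str, object]:
    task_terms = _tokens(task) - STOPWORDS
    expected = [f for f in changed_files if _tokens(f) & task_terms]
    unexpected = [f for f in changed_files if not _tokens(f) & task_terms]
    if expected and not unexpected:
        verdict = "matches"
    elif expected:
        verdict = "partially matches"
    else:
        verdict = "does not match"
    return {"verdict": verdict, "expected": expected, "unexpected": unexpected}
```

The obvious weakness is false alarms. A legitimate edit to, say, `src/auth/tokens.py` for a password-reset task shares no words with the task. That is the case for either the LLM call or structural-graph signals (e.g. "is this file a caller of an expected file?") on top of this baseline.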
| 170 | + |
| 171 | +--- |
| 172 | + |
| 173 | +## Secondary recommendations from round 2 (15 items) |
| 174 | + |
| 175 | +In rough priority order, with my reads: |
| 176 | + |
| 177 | +| # | Recommendation | Status | Notes | |
| 178 | +|---|---|---|---| |
| 179 | +| 3 | `roam agents-md` (generate AGENTS.md) | NEW | Project-level agent manual. ~1-2 days. Easy + viral. | |
| 180 | +| 4 | Repo-local agent memory (`.roam/memory`) | NEW | Distinct from MCP session memory. Small concept, big lever. | |
| 181 | +| 5 | Permission model | (= #3 above) | Already on Phase 0 | |
| 182 | +| 6 | Agent modes (read_only / safe_edit / migration / autonomous_pr) | NEW | Pairs with `roam permit`. Strong differentiator. | |
| 183 | +| 7 | `--format compact-json` | EXTEND | We have `--json` + budget; add a `compact-json` mode that strips prose | |
| 184 | +| 8 | `roam compress-context "<task>" --budget N` | EXTEND | `roam retrieve` is partially this; add explicit budget framing + compression | |
| 185 | +| 9 | Benchmark / eval harness with agent-vs-no-agent comparisons | EXTEND | `roam eval-retrieve` exists; agent-eval is the marketing-gold version | |
| 186 | +| 10 | `roam agent-score` (`git diff \| roam agent-score`) | NEW | Scorecard for agent runs. Extends `critique`. | |
| 187 | +| 11 | `roam intent-check` | (= #4 above) | | |
| 188 | +| 12 | Per-agent install pages (Claude Code / Cursor / Codex / Gemini / Roo / Continue / Windsurf) | EXTEND | We have integration-tutorials.html as one page; split into 7 landing pages | |
| 189 | +| 13 | Prompt snippets as product surface | NEW | "Before changing code, call roam retrieve. After, pipe diff to critique." Easy + sticky | |
| 190 | +| 14 | Project policy rules (YAML) | EXTEND | `roam rules` exists with rule packs; extend to agent-policy domain | |
| 191 | +| 15 | Roam Cloud = "governance for agent-written code" reframe | POSITIONING | Site copy + product description. No code change. | |
| 192 | +| 16 | Killer demo (agent navigates, doesn't just code) | MARKETING | "The agent did not just code. It navigated." | |
| 193 | +| 17 | Rename playful commands (vibe-check → intent-check, weather → churn, dark-matter → hidden-complexity) | POSITIONING | Keep aliases; serious names in docs | |
| 194 | + |
| 195 | +--- |
| 196 | + |
| 197 | +## Proposed integration into the roadmap |
| 198 | + |
| 199 | +Mapping all of this against `dev/BACKLOG.md` + `dev/ROADMAP.md` + |
| 200 | +the monetisation phasing in `monetization_v2_subscription_pivot.md` / |
| 201 | +`build_priorities.md`: |
| 202 | + |
| 203 | +### Phase 0 (free-OSS wedges) — UNCHANGED, validated |
| 204 | +Already on the roadmap; ChatGPT independently agrees: |
| 205 | +- `roam permit` (= recommendation #5/#15) |
| 206 | +- `roam postmortem` |
| 207 | +- `roam ai-governance-check` (renamed from article-12-check) |
| 208 | + |
| 209 | +Worth adding as a fourth Phase-0 freebie: |
| 210 | + |
| 211 | +- `roam next` (= ChatGPT's #1 recommendation) — agent router. Cheap |
| 212 | + to ship, high signal, extends `roam ask`. Strong "this tool is |
| 213 | + agent-first" positioning hook. |
| 214 | + |
| 215 | +### R13 — Agent-OS metadata pass (new round, parallelisable) |
| 216 | +Mechanical sweep across the @_tool registry to add the metadata |
| 217 | +ChatGPT round 1 called out as the actual fix: |
| 218 | +- `phase` per command (before_edit / during_edit / after_edit / |
| 219 | + before_pr / debug / refactor / etc.) |
| 220 | +- `recommended_next_tools` per command |
| 221 | +- `avoid_when` per command |
| 222 | +- `confidence_fields` flag where applicable |
| 223 | + |
| 224 | +Surface in `roam_catalog`. Update `roam_for_<situation>` compounds |
| 225 | +to consume the new metadata. ~2-4 hours of agent work. **Mostly pure |
| 226 | +substrate: the metadata is additive, so regression risk is confined to the compounds that consume it.** |
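Because the sweep is mechanical, a small lint over the registry could run in CI and catch dangling references as the metadata lands. This is a sketch under two assumptions. First, the metadata stays a dict keyed by tool name, as `_TOOL_METADATA[name]` suggests. Second, the phases listed above are treated as the whole allowed set, even though that list ends in "etc.". `lint_tool_metadata` is hypothetical.

```python
from typing import Dict, List

# Phases named in the R13 list above; the real set may be larger ("etc.").
ALLOWED_PHASES = {"before_edit", "during_edit", "after_edit", "before_pr", "debug", "refactor"}


def lint_tool_metadata(metadata: Dict[str, Dict[str, object]]) -> List[str]:
    """Return human-readable problems; an empty list means the registry is consistent."""
    problems: List[str] = []
    for name, meta in sorted(metadata.items()):
        phase = meta.get("phase")
        if phase not in ALLOWED_PHASES:
            problems.append(f"{name}: unknown or missing phase {phase!r}")
        for nxt in meta.get("recommended_next_tools", []):
            if nxt == name:
                problems.append(f"{name}: recommends itself")
            elif nxt not in metadata:
                problems.append(f"{name}: recommended_next_tools -> unknown tool {nxt!r}")
        if "avoid_when" not in meta:
            problems.append(f"{name}: missing avoid_when")
    return problems
```

Failing CI on a non-empty result keeps `recommended_next_tools` from pointing at renamed or deleted tools. This is the main way the new metadata could silently rot.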
| 227 | + |
| 228 | +### R14 — Hero-copy A/B + capability-coverage reframe (website only) |
| 229 | +- Hero: "Agents should not edit blind. Roam is their map." |
| 230 | +- Compare-table cell: "Yes (145 capabilities across [9 categories])" |
| 231 | + instead of "Yes (145 tools)" |
| 232 | +- Meta description / og:description: "Agents should not edit blind" |
| 233 | + framing |
| 234 | +- Use the serious command names in user-facing docs, keeping the playful names as aliases |
| 235 | + (vibe-check → intent-check, weather → churn, |
| 236 | + dark-matter → hidden-complexity) |
| 237 | + |
| 238 | +### R15 — `roam agents-md` + prompt snippets (free-OSS, viral) |
| 239 | +Ship as Phase 0.5. The AGENTS.md generator is the kind of thing |
| 240 | +people share screenshots of. Prompt snippets land on a /agent-prompts |
| 241 | +page as canonical instructions. |
| 242 | + |
| 243 | +### R16 — Agent modes + `roam intent-check` + `roam agent-score` (mid-tier) |
| 244 | +Pairs with `roam permit` to form the "governance for agent-written |
| 245 | +code" enterprise wedge. This is the layer that lets a buyer say |
| 246 | +"Claude Code may operate in safe_edit mode, not migration mode." |
| 247 | + |
| 248 | +### R17 — Roam Cloud as agent-governance dashboard (commercial pivot) |
| 249 | +Reposition Cloud away from "code health dashboard" toward: |
| 250 | + |
| 251 | +- Which agents changed what |
| 252 | +- Which changes ignored Roam warnings |
| 253 | +- PR blast-radius distribution |
| 254 | +- Where agents are repeatedly creating complexity |
| 255 | +- Risky-files-most-modified ranking |
| 256 | + |
| 257 | +Per-team and per-repo trend charts stay; the *headline framing* |
| 258 | +shifts. This is mostly a copy + dashboard-card refresh, not a |
| 259 | +ground-up rebuild. |
| 260 | + |
| 261 | +--- |
| 262 | + |
| 263 | +## The "killer demo" storyline |
| 264 | + |
| 265 | +Per ChatGPT round 2 #16 — worth scripting and recording: |
| 266 | + |
| 267 | +```text |
| 268 | +1. Agent receives task ("add password reset") |
| 269 | +2. Agent calls roam_for_new_feature |
| 270 | + -> Roam returns: relevant files, complexity report, similar code |
| 271 | +3. Agent calls roam_permit "password reset" |
| 272 | + -> Roam returns: allowed=true, safe_zones=[...], required_checks=[...] |
| 273 | +4. Agent edits only the safe-zone files |
| 274 | +5. Agent pipes diff into roam_critique |
| 275 | + -> Roam catches: missed an affected test, branch-complexity warning |
| 276 | +6. Agent fixes, re-runs critique |
| 277 | +7. Agent generates PR summary via roam_pr_analyze |
| 278 | + |
| 279 | +The agent did not just code. It navigated. |
| 280 | +``` |
| 281 | + |
| 282 | +This is the demo that should land on /demos and be shareable as a |
| 283 | +~60-second video. |
| 284 | + |
| 285 | +--- |
| 286 | + |
| 287 | +## Inventory of what to capture from this round in source-of-truth files |
| 288 | + |
| 289 | +To prevent this strategic input from getting lost when the user |
| 290 | +migrates or the next session starts: |
| 291 | + |
| 292 | +1. **This file** — `dev/agent-os-positioning-2026-05-11.md` (committed) |
| 293 | +2. **Memory pointer** — `~/.claude/projects/.../memory/` entry |
| 294 | + pointing here |
| 295 | +3. **MEMORY.md index** — top-of-file entry under "Read first every session" |
| 296 | +4. **dev/BACKLOG.md** — add R13/R14/R15/R16/R17 as queued rounds |
| 297 | + |
| 298 | +--- |
| 299 | + |
| 300 | +## My honest take |
| 301 | + |
| 302 | +ChatGPT's framing genuinely advances our positioning. The core |
| 303 | +insight that crosses both rounds: |
| 304 | + |
| 305 | +> The 211 commands + 145 MCP tools are not bloat; together they form a wide |
| 306 | +> capability surface. The fix is per-tool metadata + workflow |
| 307 | +> recipes + agent modes, not pruning. |
| 308 | + |
| 309 | +We've already built ~80% of what was recommended. The remaining |
| 310 | +20% is high-leverage: |
| 311 | + |
| 312 | +- **`roam next`** + per-tool `phase`/`avoid_when`/`recommended_next_tools` |
| 313 | + metadata is the single most-impactful gap. It turns the |
| 314 | + "agents-need-to-pick-tools" problem into a one-call lookup. |
| 315 | +- **`roam permit` + agent modes** is the enterprise wedge for |
| 316 | + governance. |
| 317 | +- **"Agents should not edit blind. Roam is their map."** is hero copy |
| 318 | + worth shipping on the next website pass. |
| 319 | + |
| 320 | +The strategic direction is consistent with the existing v2 |
| 321 | +monetisation thesis — these are *tactical refinements* of a |
| 322 | +direction we're already going. None of it requires re-pivoting. |