Commit bc09161

dev: capture agent-OS positioning notes (external strategic input)
Two-round positioning audit from ChatGPT validates the v2 monetisation thesis ("agent OS for codebases") and surfaces 4 high-leverage missing primitives plus a hero-copy candidate.

* dev/agent-os-positioning-2026-05-11.md — full capture (220 lines):
  - Executive thesis from both rounds
  - Inventory of what we already have (~80% of the wishlist)
  - The 4 actual gaps: roam next, per-tool metadata, roam permit (already on roadmap), roam agents-md
  - 14 secondary recommendations mapped against codebase
  - Hero copy candidate: "Agents should not edit blind. Roam is their map."
  - 5 proposed roadmap rounds (R13-R17)
* dev/BACKLOG.md — R13-R17 queued with risk levels.

Companion writes (auto-memory):
* memory/agent_os_positioning_2026_05_11.md — pointer + TL;DR
* MEMORY.md — index updated with new ⭐⭐⭐ entry
1 parent feeafac commit bc09161

2 files changed

Lines changed: 342 additions & 0 deletions

dev/BACKLOG.md

Lines changed: 20 additions & 0 deletions
@@ -44,6 +44,26 @@ parallelisable) per the user's stated direction.

---

## R13–R17 — agent-OS positioning rounds (2026-05-11)

External strategic input from a ChatGPT positioning audit (full
capture: `dev/agent-os-positioning-2026-05-11.md`). Five rounds queued
in dependency order:

| Round | What | Risk |
|---|---|---|
| **R13** | Agent-OS metadata pass — add `phase`, `recommended_next_tools`, `avoid_when`, `confidence_fields` to `_TOOL_METADATA` for every @_tool. Surface in `roam_catalog`. | LOW — pure substrate |
| **R14** | Hero-copy A/B ("Agents should not edit blind. Roam is their map.") + capability-coverage reframe ("145 capabilities across 9 categories") + rename playful commands' aliases (vibe-check → intent-check, weather → churn, dark-matter → hidden-complexity) | LOW — website only |
| **R15** | `roam agents-md` + `roam next` (agent router) + prompt snippets product surface. Free-OSS, viral. | MED — new commands |
| **R16** | Agent modes (read_only/safe_edit/migration/autonomous_pr) + `roam intent-check` + `roam agent-score`. Pairs with `roam permit`. | MED |
| **R17** | Reposition Roam Cloud as "governance for agent-written code" — Cloud dashboard cards for which-agents-changed-what, blast-radius distribution, ignored-warning trail. | MED — copy + dashboard |

R13 is the highest-leverage low-risk item. R14 + R15 are
parallelisable with R13. R16/R17 are mid-term, paired with the
monetisation Phase-0 work.

---

## Next pickup — pick from ROADMAP

When this queue clears (it has), pull from `ROADMAP.md` in this order:
dev/agent-os-positioning-2026-05-11.md

Lines changed: 322 additions & 0 deletions
@@ -0,0 +1,322 @@
# Agent-OS positioning notes — 2026-05-11

External strategic input from ChatGPT, in two rounds. Captured here
because the synthesis is high-signal and crosses several roadmap lanes
at once. Read alongside `internal/strategy/` and the
`monetization_v2_subscription_pivot.md` memory file.

---

## Executive thesis (both rounds combined)

> "Roam should become the thing an agent consults before every meaningful coding decision."

Not "another PR reviewer." Not "AI coding IDE." A new category:

- **Agent-side framing**: a *runtime / context protocol for codebase-aware agents*
- **Buyer-side framing**: *governance for agent-written code*

This is more interesting than competing with CodeRabbit / Greptile / Qodo
on semantic-review territory — those reviewers stay in the
human-readable diff-prose lane. Roam stays in the structural-graph lane
and runs *alongside*, before/during/after, not instead.

The 211 commands + 145 MCP tools surface is **not bloat** if the
capability metadata is good enough that an agent never has to guess
which tool to call. ChatGPT's framing:

> Expose every meaningful capability, but make the capability graph
> legible, typed, composable, and safe.

---

## Hero copy candidates

The standout line from ChatGPT round 1:

> **"Agents should not edit blind. Roam is their map."**

That's sharper than the current home-page hero
("Coding agents can write code. Roam is the structural intelligence
they don't have."). Worth testing as a copy A/B.

Other candidates from round 2:

- *"Roam is the structural context layer for AI coding agents."*
- *"Code agents need more than prompts. They need repo intelligence."*

The framing-softener for the "no human reads code anymore" line, which
is emotionally true for vibe-coding but alienates enterprise buyers:

> *"As agents write more code, humans increasingly review intent and
> outcomes, not every line. Roam gives agents the structural context
> they need to operate safely between those human checkpoints."*

(Already aligned with the existing positioning per
`product_positioning_2026_05_09.md` memory.)

---

## What we already have (the ~80% baseline)

Cross-checked against the codebase as of commit `feeafac`:

### Capability discovery
- `roam surface` — canonical capability registry as JSON or text
- `roam_catalog` MCP tool — agent-readable surface
- `roam ask` — TF-IDF intent dispatcher over a 24-recipe registry
- `roam_complete` — FTS5 prefix completion (cheaper than search)
- `roam --help` Start-here panel: 6 verbs (init / understand / context / preflight / critique / ask)

### Capability metadata (per @_tool)
- `_TOOL_METADATA[name]`: `core`, `read_only`, `destructive`, `version`, `when_to_use`, `examples`
- MCP annotations: `readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`
- `taskSupport` hint (optional / required / forbidden)
- `deferLoading` annotation for context-reduction via Tool Search
- `_DESTRUCTIVE_TOOLS`, `_NON_READ_ONLY_TOOLS`, `_CACHEABLE_COMMANDS` sets

### Workflow recipes (the "canonical loops" ChatGPT recommends)
- `roam_for_new_feature` — understand + search + context + complexity
- `roam_for_bug_fix` — diagnose + tests + diff + context
- `roam_for_refactor` — preflight + impact + complexity + clones
- `roam_for_security_review` — taint + vuln + critique + adversarial
- Plus `roam_explore`, `roam_diagnose_issue` compounds with `_COMPOUND_WORKFLOW_RECIPES` metadata

### Safety contracts
- Soft contract enforcement on `roam_mutate` (R7) — `contract_check`
  checks for prior `roam_simulate` and surfaces `contract_compliance` advice
- Stale-index affordance on every read tool
- Structured errors with `retryable`, `doc_link`, `suggested_action`,
  `severity` (R9 fix kept these in trim mode too)
- Exit codes: `EXIT_USAGE`, `EXIT_INDEX_MISSING`, `EXIT_INDEX_STALE`,
  `EXIT_GATE_FAILURE`, `EXIT_PARTIAL`
- `agent_contract` block on every JSON envelope — derived
  `{facts, risks, next_commands, confidence}` (R7.S14)

### Per-agent install
- `roam mcp-setup --write` for 6 platforms: claude-code, cursor,
  windsurf, vscode (Copilot Agent Mode), gemini-cli, codex-cli
- Integration tutorials at `templates/distribution/landing-page/docs/integration-tutorials.html`

### Output efficiency
- `--json` mode across all commands
- Token-budget cap via `_apply_budget` + `--budget` flag
- Reference-based handles for >50KB envelopes (R8.E8) —
  content-addressed, agent fetches via `roam_fetch_handle`
- `summarize=True` default for compound tools when `ROAM_AI_ENABLED=1`
  (uses MCP sampling to compress to ~1-2KB prose)

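To make the reference-based handle idea concrete, here is a minimal sketch of content addressing: serialize the envelope, and if it exceeds the threshold, store it under its SHA-256 digest and return a small stub instead. Only the 50KB threshold comes from the notes above. `HandleStore`, `maybe_handle`, the stub field names and the in-memory dict are assumptions for illustration, not Roam's actual implementation of `roam_fetch_handle`.

```python
import hashlib
import json

HANDLE_THRESHOLD_BYTES = 50 * 1024  # the ">50KB" cutoff mentioned in the notes


class HandleStore:
    """Hypothetical in-memory store keyed by content hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, payload: bytes) -> str:
        handle = hashlib.sha256(payload).hexdigest()
        self._blobs[handle] = payload  # identical content maps to the same key
        return handle

    def fetch(self, handle: str) -> bytes:
        return self._blobs[handle]


def maybe_handle(envelope: dict, store: HandleStore) -> dict:
    """Return the envelope itself, or a small stub pointing at stored content."""
    payload = json.dumps(envelope, sort_keys=True).encode("utf-8")
    if len(payload) <= HANDLE_THRESHOLD_BYTES:
        return envelope
    # Stub field names ("handle", "size_bytes") are illustrative assumptions.
    return {"handle": store.put(payload), "size_bytes": len(payload)}
```

Because the key is derived from the content, emitting the same large envelope twice yields the same handle, so an agent that already fetched it can skip the second fetch.
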
### Eval substrate
- `roam eval-retrieve` with `--emit-format coderag|beir` JSONL emit
- Bench harness at `eval/harness.py`

---

## The actual gaps (what's missing)

ChatGPT's "build these four next" priority, with my read:

### 1. `roam next` — agent router / front door (NEW)
> Even agents can get tool-choice fatigue. Add a high-level command
> like `roam next "I need to refactor auth middleware"` returning
> `{recommended_tools, reason, avoid, required_order}`.

We have `roam ask` which is TF-IDF intent → recipe dispatch, but it
doesn't expose `avoid` or `required_order` as machine-readable
fields. Closest existing primitive — `roam_for_<situation>` family,
which is recipe-keyed not free-form-task-keyed.

**Estimated work**: extend `cmd_ask` to emit a machine-readable
`recommended_tools` block, OR ship `cmd_next` as a thinner front
door over the same registry. Plus per-tool `phase` / `avoid_when` /
`recommended_next_tools` metadata in `_TOOL_METADATA`. ~1-2 days.

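A rough sketch of what the `{recommended_tools, reason, avoid, required_order}` response could look like. It uses plain keyword overlap as a stand-in for the TF-IDF dispatcher, and the recipe table (keywords, `avoid` entries, ordering) is invented for illustration; only the four response field names and the recipe step names come from these notes. `RECIPES`, `NextAdvice` and `route` are hypothetical names.

```python
from dataclasses import dataclass, asdict

# Illustrative recipe table; keywords and "avoid" entries are assumptions.
RECIPES = {
    "refactor": {
        "keywords": {"refactor", "rename", "extract", "cleanup"},
        "tools": ["preflight", "impact", "complexity", "clones"],
        "avoid": ["roam_mutate"],
    },
    "bug_fix": {
        "keywords": {"bug", "fix", "crash", "regression"},
        "tools": ["diagnose", "tests", "diff", "context"],
        "avoid": [],
    },
}


@dataclass
class NextAdvice:
    recommended_tools: list
    reason: str
    avoid: list
    required_order: list


def route(task: str) -> NextAdvice:
    """Pick the recipe whose keywords overlap most with the task text."""
    words = set(task.lower().split())
    best_name, best_hits = None, set()
    for name, recipe in RECIPES.items():
        hits = words & recipe["keywords"]
        if len(hits) > len(best_hits):
            best_name, best_hits = name, hits
    if best_name is None:
        return NextAdvice([], "no recipe matched", [], [])
    recipe = RECIPES[best_name]
    return NextAdvice(
        recommended_tools=list(recipe["tools"]),
        reason=f"matched recipe '{best_name}' on {sorted(best_hits)}",
        avoid=list(recipe["avoid"]),
        required_order=list(recipe["tools"]),  # run in the listed order
    )


print(asdict(route("I need to refactor auth middleware")))
```

Returning an explicit empty result when nothing matches keeps the caller from acting on a weak guess, which seems in the spirit of the `avoid` field.
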
### 2. `roam plan` — task-to-execution-path generator (EXTEND)
> Output: likely files, risks, suggested sequence (1-8 steps).
> Not "generate code" — give the agent a safe execution path.

`roam plan` exists already (in the curated-32 help-template list)
but is more refactor-flavoured. ChatGPT's `roam plan "Add password
reset flow"` is a *task* planner, not a refactor planner. Adjacent
but distinct.

**Estimated work**: new sub-mode or new command (`roam task-plan`?).
Needs a small LLM-callable scaffolder OR rule-based heuristics over
the structural graph. ~3-5 days.

### 3. `roam permit` — permission model (NEW, on Phase-0 monetisation roadmap)
> `{allowed, safe_zones, forbidden_zones, requires_preflight, required_checks}`

Already queued in `build_priorities.md` as a Phase-0 freebie. ChatGPT's
proposal matches what we'd build. **High priority, high payoff**:
this is the wedge into "governance for agent-written code."

**Estimated work**: 2-3 days for MVP. YAML policy file +
`roam permit "<task>"` CLI + MCP tool. Pairs with the agent-modes
concept (#6 below).

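As a sketch of how the `{allowed, safe_zones, forbidden_zones, requires_preflight, required_checks}` shape could be produced, the snippet below evaluates a list of candidate file paths against glob-based zones. It assumes the task has already been mapped to candidate paths (that step is out of scope here). The `POLICY` dict stands in for a parsed YAML policy file, and its keys, globs and the `permit` function are hypothetical.

```python
from fnmatch import fnmatch

# Stand-in for a parsed YAML policy file; keys and globs are hypothetical.
POLICY = {
    "safe_zones": ["src/auth/*", "tests/*"],
    "forbidden_zones": ["migrations/*", "infra/*"],
    "required_checks": ["preflight", "critique"],
}


def permit(paths, policy=POLICY):
    """Evaluate candidate file paths against the policy; forbidden wins over safe."""
    forbidden = [
        p for p in paths
        if any(fnmatch(p, g) for g in policy["forbidden_zones"])
    ]
    safe = [
        p for p in paths
        if p not in forbidden and any(fnmatch(p, g) for g in policy["safe_zones"])
    ]
    unknown = [p for p in paths if p not in forbidden and p not in safe]
    return {
        "allowed": not forbidden,
        "safe_zones": safe,
        "forbidden_zones": forbidden,
        "requires_preflight": bool(unknown),  # paths outside any declared zone
        "required_checks": list(policy["required_checks"]),
    }
```

Treating paths outside every declared zone as "needs preflight" is one possible default; a stricter policy could deny them outright.
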
### 4. `roam intent-check` — diff vs stated task (NEW or EXTEND)
> Compares the stated task to the diff. "Diff partially matches.
> Expected: X. Unexpected: Y." Very valuable for vibe-coding.

`roam intent` exists in the curated-32 list — need to verify what it
does today. May overlap with this. If it doesn't, this is a clear
gap. The existing `roam critique` does *clone-not-edited* checks but
not stated-intent-vs-diff comparison.

**Estimated work**: ~2-3 days. Needs a small LLM call (or a
rule-based heuristic over diff vs task description). Could be a
mode of `roam critique`.

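The rule-based option could start as simply as the sketch below: pull the changed paths out of a unified diff and flag any whose path shares no keyword with the stated task. This is a deliberately crude heuristic under stated assumptions (path tokens as the only signal, a hand-picked stopword list) and not a description of what `roam intent` or `roam critique` do today. `changed_files` and `intent_check` are hypothetical names.

```python
import re

# Hand-picked stopwords for task descriptions; purely illustrative.
STOPWORDS = {"add", "the", "a", "an", "to", "for", "fix", "flow", "and", "of"}


def changed_files(diff_text: str):
    """Extract post-image paths from unified-diff '+++ b/<path>' headers."""
    return re.findall(r"^\+\+\+ b/(.+)$", diff_text, flags=re.MULTILINE)


def intent_check(task: str, diff_text: str):
    """Flag changed files whose paths share no keyword with the stated task."""
    keywords = {w for w in re.findall(r"[a-z]+", task.lower()) if w not in STOPWORDS}
    expected, unexpected = [], []
    for path in changed_files(diff_text):
        tokens = set(re.findall(r"[a-z]+", path.lower()))
        (expected if keywords & tokens else unexpected).append(path)
    if not expected and not unexpected:
        verdict = "no changes"
    elif not unexpected:
        verdict = "matches"
    elif expected:
        verdict = "partially matches"
    else:
        verdict = "does not match"
    return {"verdict": verdict, "expected": expected, "unexpected": unexpected}
```

A real version would likely weigh symbols touched and graph distance rather than path names alone, but even this shape yields the "Expected: X. Unexpected: Y." output quoted above.
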
---

## Secondary recommendations from round 2 (14 items)

In rough priority order, with my reads:

| # | Recommendation | Status | Notes |
|---|---|---|---|
| 3 | `roam agents-md` (generate AGENTS.md) | NEW | Project-level agent manual. ~1-2 days. Easy + viral. |
| 4 | Repo-local agent memory (`.roam/memory`) | NEW | Distinct from MCP session memory. Small concept, big lever. |
| 5 | Permission model | (= #3 above) | Already on Phase 0 |
| 6 | Agent modes (read_only / safe_edit / migration / autonomous_pr) | NEW | Pairs with `roam permit`. Strong differentiator. |
| 7 | `--format compact-json` | EXTEND | We have `--json` + budget; add a `compact-json` mode that strips prose |
| 8 | `roam compress-context "<task>" --budget N` | EXTEND | `roam retrieve` is partially this; add explicit budget framing + compression |
| 9 | Benchmark / eval harness with agent-vs-no-agent comparisons | EXTEND | `roam eval-retrieve` exists; agent-eval is the marketing-gold version |
| 10 | `roam agent-score` (`git diff \| roam agent-score`) | NEW | Scorecard for agent runs. Extends `critique`. |
| 11 | `roam intent-check` | (= #4 above) | |
| 12 | Per-agent install pages (Claude Code / Cursor / Codex / Gemini / Roo / Continue / Windsurf) | EXTEND | We have integration-tutorials.html as one page; split into 7 landing pages |
| 13 | Prompt snippets as product surface | NEW | "Before changing code, call roam retrieve. After, pipe diff to critique." Easy + sticky |
| 14 | Project policy rules (YAML) | EXTEND | `roam rules` exists with rule packs; extend to agent-policy domain |
| 15 | Roam Cloud = "governance for agent-written code" reframe | POSITIONING | Site copy + product description. No code change. |
| 16 | Killer demo (agent navigates, doesn't just code) | MARKETING | "The agent did not just code. It navigated." |
| 17 | Rename playful commands (vibe-check → intent-check, weather → churn, dark-matter → hidden-complexity) | POSITIONING | Keep aliases; serious names in docs |

---

## Proposed integration into the roadmap

Mapping all of this against `dev/BACKLOG.md` + `dev/ROADMAP.md` +
the monetisation phasing in `monetization_v2_subscription_pivot.md` /
`build_priorities.md`:

### Phase 0 (free-OSS wedges) — UNCHANGED, validated
Already on the roadmap, ChatGPT independently agrees:
- `roam permit` (= recommendation #5/#15)
- `roam postmortem`
- `roam ai-governance-check` (renamed from article-12-check)

Worth adding as a fourth Phase-0 freebie:

- `roam next` (= ChatGPT's #1 recommendation) — agent router. Cheap
  to ship, high signal, extends `roam ask`. Strong "this tool is
  agent-first" positioning hook.

### R13 — Agent-OS metadata pass (new round, parallelisable)
Mechanical sweep across the @_tool registry to add the metadata
ChatGPT round 1 called out as the actual fix:
- `phase` per command (before_edit / during_edit / after_edit /
  before_pr / debug / refactor / etc.)
- `recommended_next_tools` per command
- `avoid_when` per command
- `confidence_fields` flag where applicable

Surface in `roam_catalog`. Update `roam_for_<situation>` compounds
to consume the new metadata. ~2-4 hours of agent work. **Pure
substrate — no risk of regressions, no test crashes.**

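For a sense of what the sweep would add, here is a minimal sketch of two metadata entries carrying the four new fields, plus a validator that could run in CI to catch unknown phases and dangling `recommended_next_tools` references. The entry contents, the `TOOL_METADATA` layout, `ALLOWED_PHASES` as a closed set (the list above ends in "etc.", so the real set is open) and `validate_metadata` are all assumptions for illustration; the real `_TOOL_METADATA` structure may differ.

```python
# Phases named in the notes; the real list is open-ended ("etc.").
ALLOWED_PHASES = {
    "before_edit", "during_edit", "after_edit", "before_pr", "debug", "refactor",
}

# Hypothetical entries; field values are invented for illustration.
TOOL_METADATA = {
    "roam_simulate": {
        "read_only": True,
        "phase": "before_edit",
        "recommended_next_tools": ["roam_mutate"],
        "avoid_when": ["index is stale"],
        "confidence_fields": False,
    },
    "roam_mutate": {
        "read_only": False,
        "phase": "during_edit",
        "recommended_next_tools": [],
        "avoid_when": ["no prior roam_simulate run"],
        "confidence_fields": False,
    },
}


def validate_metadata(registry):
    """Return a list of problems: unknown phases or dangling next-tool references."""
    problems = []
    for name, meta in registry.items():
        if meta.get("phase") not in ALLOWED_PHASES:
            problems.append(f"{name}: unknown phase {meta.get('phase')!r}")
        for nxt in meta.get("recommended_next_tools", []):
            if nxt not in registry:
                problems.append(f"{name}: next tool {nxt!r} is not registered")
    return problems
```

A check like this is what would keep the sweep "pure substrate": the new fields stay additive, and a typo in a tool name shows up as a validation message instead of a broken recommendation at runtime.
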
### R14 — Hero-copy A/B + capability-coverage reframe (website only)
- Hero: "Agents should not edit blind. Roam is their map."
- Compare-table cell: "Yes (145 capabilities across [9 categories])"
  instead of "Yes (145 tools)"
- Meta description / og:description: "Agents should not edit blind"
  framing
- Rename playful commands' aliases in user-facing docs
  (vibe-check → intent-check, weather → churn,
  dark-matter → hidden-complexity)

### R15 — `roam agents-md` + prompt snippets (free-OSS, viral)
Ship as Phase 0.5. The AGENTS.md generator is the kind of thing
people share screenshots of. Prompt snippets land on a /agent-prompts
page as canonical instructions.

### R16 — Agent modes + `roam intent-check` + `roam agent-score` (mid-tier)
Pairs with `roam permit` to form the "governance for agent-written
code" enterprise wedge. This is the layer that lets a buyer say
"Claude Code may operate in safe_edit mode, not migration mode."

### R17 — Roam Cloud as agent-governance dashboard (commercial pivot)
Reposition Cloud away from "code health dashboard" toward:

- Which agents changed what
- Which changes ignored Roam warnings
- PR blast-radius distribution
- Where agents are repeatedly creating complexity
- Risky-files-most-modified ranking

Per-team and per-repo trend charts stay; the *headline framing*
shifts. This is mostly a copy + dashboard-card refresh, not a
ground-up rebuild.

---

## The "killer demo" storyline

Per ChatGPT round 2 #16 — worth scripting and recording:

```text
1. Agent receives task ("add password reset")
2. Agent calls roam_for_new_feature
   -> Roam returns: relevant files, complexity report, similar code
3. Agent calls roam_permit "password reset"
   -> Roam returns: allowed=true, safe_zones=[...], required_checks=[...]
4. Agent edits only the safe-zone files
5. Agent pipes diff into roam_critique
   -> Roam catches: missed an affected test, branch-complexity warning
6. Agent fixes, re-runs critique
7. Agent generates PR summary via roam_pr_analyze

The agent did not just code. It navigated.
```

This is the demo that should land on /demos and be shareable as a
~60-second video.

---

## Inventory of what to capture from this round in source-of-truth

To prevent this strategic input from getting lost when the user
migrates / the next session starts:

1. **This file**: `dev/agent-os-positioning-2026-05-11.md` (committed)
2. **Memory pointer**: `~/.claude/projects/.../memory/` entry
   pointing here
3. **MEMORY.md index** — top-of-file entry under "Read first every session"
4. **dev/BACKLOG.md** — add R13/R14/R15/R16/R17 as queued rounds

---

## My honest take

ChatGPT's framing genuinely advances our positioning. The core
insight that crosses both rounds:

> The 211 commands + 145 MCP tools is not bloat — it's a wide
> capability surface. The fix is per-tool metadata + workflow
> recipes + agent modes, not pruning.

We've already built ~80% of what was recommended. The remaining
20% is high-leverage:

- **`roam next`** + per-tool `phase`/`avoid_when`/`recommended_next_tools`
  metadata is the single most-impactful gap. It turns the
  "agents-need-to-pick-tools" problem into a one-call lookup.
- **`roam permit` + agent modes** is the enterprise wedge for
  governance.
- **"Agents should not edit blind. Roam is their map."** is hero copy
  worth shipping on the next website pass.

The strategic direction is consistent with the existing v2
monetisation thesis — these are *tactical refinements* of a
direction we're already going. None of it requires re-pivoting.
