Skip to content

Commit 48259be

Browse files
committed
rfc: 0001 reframe — honest about asymmetry between designed and noticed concerns
The index now distinguishes "designed" (state coordination, with companion doc) from "noticed, not yet planned" (two concerns surfaced via filtered review of PR siddharthvaddem#350). Audit doc dropped — PR-review observations belong on PR siddharthvaddem#350 itself, not in the RFC.
1 parent c03e360 commit 48259be

2 files changed

Lines changed: 29 additions & 270 deletions

File tree

Lines changed: 29 additions & 121 deletions
Original file line numberDiff line numberDiff line change
@@ -1,155 +1,63 @@
11
---
22
RFC: 0001
3-
Title: Agent architecture v1
3+
Title: Agent architecture: long-term planning
44
Author: enriquefft
55
Status: Draft
66
Created: 2026-04-27
77
Tracking: https://github.com/siddharthvaddem/openscreen/issues/349
88
---
99

10-
# RFC 0001 — Agent architecture v1
10+
# Agent architecture: long-term planning
1111

1212
## Summary
1313

14-
Shift verbs from CLI action handlers (load-mutate-save inline) to pure state transitions with metadata, so any transport — Commander CLI, MCP server, in-process IPC against the running GUI, future SDK — can fan out from one definition. PR #350 establishes the verb vocabulary and the agent-callable thesis; this RFC proposes the shape that lets that vocabulary scale past the eight verbs it ships with.
14+
Long-term architectural concerns exist for the agent surface beyond what PR #350 ships in scope. This RFC proposes a short planning window before the first split-PR lands, to surface and weigh these concerns properly.
1515

16-
## Motivation
16+
## Why now
1717

18-
PR #350 ships eight verbs (`project`, `zoom`, `trim`, `speed`, `annotate`, `shortcuts`, `render`, `gif`) as Commander action handlers that load the project file, mutate it, and save it. The architecture is sound for eight. It will not scale to fifty.
18+
PR #350 establishes the agent-surface foundation. Once verbs start landing on top, retrofit cost grows with verb count. A few days of planning before the first split-PR ships is cheap insurance against compounding decisions.
1919

20-
The roadmap implied by issue #349 and existing GUI features pushes well past eight: keyframe animations, multi-region annotations, audio waveform editing, transitions, captions, layouts. Every one of those is at least one verb category, and most are several. The CLI surface today is already a load-bearing public contract once anyone builds tooling on top of `--json` output.
20+
## Designed
2121

22-
Three forces compound per verb under the current shape:
22+
### State coordination
2323

24-
1. Each new verb gets re-implemented in every transport (CLI today, MCP soon, IPC against the running GUI eventually).
25-
2. Each new verb writes the project file directly, which means each new verb is a new chance to desync against the running GUI (see [state coordination companion](./0001-agent-architecture/state-coordination.md)).
26-
3. Each new verb's argument shape and result envelope are public on first ship, so the ergonomic decisions made today are the ones agents will be coding against tomorrow.
24+
About 12 GUI/CLI desync, race, and attribution scenarios collapse to one decision: state source is currently hardcoded to disk. Editor-as-server + file-as-persistence + operation-log model proposed.
2725

28-
The cost of fixing this *now*, with eight verbs and zero external consumers, is bounded. The cost of fixing it after fifty verbs and a published MCP server is not. This RFC proposes the shape; the [PR #350 audit companion](./0001-agent-architecture/audit-pr350.md) grounds the argument in concrete code.
26+
[Companion doc →](./0001-agent-architecture/state-coordination.md)
2927

30-
## Architectural concerns
28+
## Noticed, not yet planned
3129

32-
Nine concrete shape problems in the current design. Each is a load-bearing decision that will be expensive to revisit later.
30+
These surfaced during a filtered review of PR #350. They are starting points for investigation, not validated designs. Each may become a companion doc, fold into another, or be dropped after deeper review.
3331

34-
### 1. Verbs aren't pure `(state, args) → (state', result)`
32+
### Verbs do load-mutate-save inline; there is no pure transition primitive
3533

36-
`addZoomRegion(path, args)` does load-mutate-save inline (`cli/src/commands/zoom.ts:18-35`). Every verb owns its own filesystem dance. There is no way to:
34+
**Where:** `cli/src/core/region-manager.ts:43-73` (and recurring across the file for six verbs in ~200 lines); `cli/src/core/project-manager.ts:237-263`.
3735

38-
- run a verb in-process against a running GUI's state (no GUI integration without re-implementation),
39-
- replay a verb from a log (no reproducible sessions, no cross-process undo),
40-
- batch verbs atomically (every verb is its own write, see #8),
41-
- test a verb without a real filesystem fixture.
36+
**Why it may compound:** every verb today inlines the same `loadProject` → mutate → `saveProject` triplet. There is no `applyOperation(state, op) → state` primitive verbs compose around. Adding cross-cutting capabilities later — `--dry-run`, batch mode, schema migration on read, transactions across multiple ops, optimistic preview — means editing every verb body. With N verbs that's N sites of retrofit, mechanically.
4237

43-
Pure transitions buy all four for free.
38+
This is distinct from state coordination: even in a single-process world with no GUI running, the verb's internal transition being a side effect rather than a value still compounds with verb count.
4439

45-
### 2. No verb registry
40+
### No unified verb contract across transports
4641

47-
The CLI imports the managers (`project-manager`, `region-manager`, etc.) directly from each command handler. `cli/src/mcp-server.ts` imports neither — its only tool is `openscreen_help`, which serves concatenated skill markdown. The moment MCP needs real action verbs, every verb will be defined twice: once as a Commander handler, once as a tool. Future SDK and IPC are a third and fourth.
42+
**Where:** Tier-1 verbs return objects via `outputSuccess` (`cli/src/commands/zoom.ts:18-35`). Tier-2 verbs spawn Electron and stream NDJSON `__cli` envelopes (`cli/src/core/electron-bridge.ts:111-208`). MCP exposes no verbs today, only a docs lookup (`cli/src/mcp-server.ts:86-102`).
4843

49-
A single `defineVerb({ name, args, returns, mutates, replay, exec })` is the SSOT for fan-out. Transports become thin shells over the registry.
44+
**Why it may compound:** a verb today is whatever its `.action(...)` body happens to do — sync return, async streaming, or subprocess IPC. There is no `VerbDefinition` (input schema, run, result schema, streaming events) that all transports bind to. Per new transport (MCP-as-command-surface, in-process IPC from a running GUI, library API, eventual HTTP), the cost is N verbs × bespoke adapter.
5045

51-
### 3. No transport abstraction
46+
This is distinct from state coordination: even if there were only one writer to one project at a time, the question of whether one invocation of one verb has a portable contract — independent of who invoked it — still compounds with verb count.
5247

53-
`outputJson(data: unknown)` at `cli/src/output.ts:16` plus a module-global `jsonMode` is welded to Commander. The contract between "verb finished" and "user sees a result" is implicit. MCP, SDK, and IPC each have to re-implement parse + dispatch + result shaping. They will diverge.
48+
## Likely missing
5449

55-
### 4. GUI/CLI state coordination is unspecified
50+
Other concerns probably exist that nobody has surfaced yet. The point of the planning window is to find them, not just address what is listed above. Anyone reading PR #350 who notices something architectural that compounds with verb count should raise it here.
5651

57-
There are roughly twelve desync, race, and attribution scenarios that exist or will exist once the CLI ships (see [state coordination companion](./0001-agent-architecture/state-coordination.md) for the full table). They collapse to one decision: the state source is hardcoded to disk. Every verb does `loadProject(path)`, mutates, and writes back. The running GUI is invisible to the CLI and the CLI is invisible to the GUI.
52+
## Proposal
5853

59-
The verb-shape decision and the state-source decision are coupled: `defineVerb` is what makes "state source is pluggable (Disk / RunningEditor / InMemoryTest)" cheap. Without it, every verb hardcodes its source.
54+
A short planning period — a few days — before split-PR-A (the `src/shared/` extraction) lands. Output: expanded companion docs for any noticed items that survive scrutiny, or explicit defer decisions for items that do not warrant a doc yet. The planning is parallel to the split work; nothing about restructuring PR #350 blocks on this.
6055

61-
### 5. `outputJson(data: unknown)` is already the public contract
56+
Architectural decisions identified here can land incrementally — none require the whole edifice in place before they pay rent. State coordination is the most concrete because it already has a companion doc; the others need their own work before any commitment.
6257

63-
The first external script that depends on a current `--json` shape freezes that shape. There is no version tag, no discriminator, no schema. Today's `outputJson(data: unknown)` is tomorrow's "we can't change the result envelope without breaking three integrations."
58+
## References
6459

65-
A `CommandResult<TVerb>` discriminated union (`{ ok: true, verb, data } | { ok: false, verb, error }`), with the `data` shape derived from the verb's `returns` schema, costs almost nothing to introduce *before* there are external consumers. After is a different story.
66-
67-
### 6. No project-file migration runner
68-
69-
`schemaVersion: 1` is declared (`src/shared/project-schema.ts`) without a `migrate(v1 → v2)` framework. When the schema first evolves — adding a field, renaming one, splitting a structure — every existing user file needs to round-trip safely. A migration runner is two functions and a registry; retrofitting it after v2 ships is a coordination problem.
70-
71-
### 7. `useEditorHistory` is React-state-only
72-
73-
The undo/redo stack lives in `src/hooks/useEditorHistory.ts:85` and is consumed only from `VideoEditor.tsx:87`. CLI edits today are completely invisible to it: a CLI edit followed by a GUI undo silently reverts the CLI work, because undo restores the GUI's last in-memory snapshot, which predates the CLI write.
74-
75-
History needs to live at the state layer (where verbs commit), with React as a subscriber. This is a precondition for cross-process undo, which is a precondition for the CLI being safe to use against an open project.
76-
77-
### 8. No batch primitive
78-
79-
Each verb is its own load-mutate-save. A 10-verb keyframe batch is 10 reads, 10 writes, 10 history entries, 10 chances to half-apply on crash. A batch primitive — `executeBatch([verb, verb, verb])` that loads once, mutates n times, writes once, records one history entry — is something verbs *don't get for free* under the current shape.
80-
81-
### 9. No actor in the verb signature
82-
83-
The project file records nothing about who made an edit. Audit, attribution, per-verb permission gates (e.g. "MCP cannot delete projects"), rate limiting — all of these want `actor` as a first-class arg of `executeVerb`, not glued on by transport. Threading it later means touching every handler.
84-
85-
## Detailed design
86-
87-
The shape is a verb registry whose entries are pure transitions plus metadata.
88-
89-
```ts
90-
defineVerb({
91-
name: "zoom.add",
92-
args: ZoomAddArgsSchema, // Zod
93-
returns: ZoomAddResultSchema, // Zod
94-
mutates: true, // false → read-only verb
95-
replay: false, // true → safe to re-execute from log
96-
exec: (state, args, ctx) => {
97-
// pure: returns next state + result, no I/O, no global side effects
98-
return { state: nextState, result };
99-
},
100-
});
101-
```
102-
103-
Four observations collapse out of this shape:
104-
105-
**(a) Verbs are pure transitions.** No filesystem, no IPC, no Electron, no React. Just `(state, args, ctx) → (state', result)`. The same `exec` runs against the file, against the running GUI's in-memory state, against an in-memory test fixture, or against a replayed log entry, with no per-verb conditional code.
106-
107-
**(b) Transports are thin shells.** Commander, MCP, SDK, and IPC each parse their input, look up the verb in the registry, validate args via the verb's Zod schema, call a shared `executeVerb(verb, args, ctx)`, and shape the return as `CommandResult<TVerb>`. None of them re-implement the verb. Adding a transport is a fixed cost, not a per-verb cost.
108-
109-
**(c) State source is pluggable.** `executeVerb` takes a `StateSource` from `ctx`:
110-
111-
- `DiskStateSource` — load file, run exec, save atomically with `editVersion` OCC.
112-
- `RunningEditorStateSource` — connect over Unix socket to the GUI, submit the verb call, get the result back. The GUI's main process runs `exec` against its in-memory state, dispatches the result through `useEditorHistory`, and replies.
113-
- `InMemoryStateSource` — for tests. No filesystem, no IPC.
114-
115-
The choice happens once per process, at startup, by detecting `editor.lock` and matching project paths. Verbs do not see the difference.
116-
117-
**(d) History at the state layer.** The reducer that owns "current state + undo stack" lives one level below React. `useEditorHistory` becomes a subscriber that re-renders on commits. CLI edits, MCP edits, IPC edits all funnel through the same commit path and produce the same history entries. Cross-process undo falls out: when the GUI launches and finds the project's `editVersion` ahead of its last-seen version, it replays log entries since that version into the history stack.
118-
119-
The companion [state coordination doc](./0001-agent-architecture/state-coordination.md) covers the full design for the editor-as-server model, the OCC discipline on the file, and the operation log. The [PR #350 audit](./0001-agent-architecture/audit-pr350.md) grounds the per-verb concerns above in specific code references.
120-
121-
## Drawbacks
122-
123-
**Refactor cost on the eight existing verbs.** Each handler in `cli/src/commands/` has to be split into a pure `exec` and a thin transport shell. Estimated at one to two days for the eight, mostly mechanical.
124-
125-
**A small abstraction layer that agents and CLI both pay for.** `defineVerb` plus `executeVerb` plus `StateSource` plus `CommandResult` is maybe 400 LOC of plumbing. Every verb pays a small indirection cost at call time. For the audience (agents and scripts) the indirection is invisible.
126-
127-
**Possible YAGNI if MCP/SDK ambitions are scaled back.** If the project decides MCP is documentation-only forever, IPC is never needed, and the CLI is the only transport that will ever exist, then `defineVerb` is overkill and direct Commander handlers are fine. Issue #349 reads as the opposite trajectory, which is why this RFC is being written now.
128-
129-
## Alternatives considered
130-
131-
**(a) Keep the current shape and accept duplication across transports.** Cheapest today. Each new verb gets re-implemented in MCP, then again in IPC, then again in SDK. Each duplicate is a chance for behavior drift. Once external scripts depend on `--json` output shapes, those shapes are also frozen across all four transports independently. This is the path the codebase is on by default if no shape decision is made.
132-
133-
**(b) CRDT model (Figma-style).** Server-mediated CRDT means all operations commute and all clients converge. Clean, well-understood. Wildly overkill for a single-user desktop tool where the second writer is an agent the same user invoked. Costs: a server, a CRDT representation of every project node, an ops-not-files persistence model. Buys nothing the proposed model doesn't already buy for our use case. See the prior-art section of the [state coordination doc](./0001-agent-architecture/state-coordination.md).
134-
135-
**(c) Daemon-only model (Emacs-style).** The GUI is always the source of truth; the CLI cannot run standalone, only as a client of a running editor. Conceptually clean — eliminates the dual-source problem entirely. Loses the headless render path and the "edit a file from a Makefile" workflow. The hybrid (editor-as-server when the GUI is running, atomic file writes when it isn't) covers more ground for slightly more complexity. Covered in detail in the companion doc.
136-
137-
## Rollout plan
138-
139-
PR #350 has discussed splitting itself into smaller pieces. The RFC's preferred sequence:
140-
141-
1. **Verb registry foundation.** Add `defineVerb`, `executeVerb`, `StateSource`, `CommandResult`. Migrate two of the eight existing verbs (e.g. `zoom.add`, `zoom.remove`) as proof. No external behavior change.
142-
2. **Migrate the rest of the eight verbs** to the registry. Still no external behavior change.
143-
3. **Wire the MCP server to the registry.** Each verb becomes an MCP tool, namespaced `openscreen.{category}.{verb}`. Long-running verbs (`render`, `gif`) get progress notifications. Skill markdowns move to MCP resources.
144-
4. **Add `editVersion` + atomic write on the GUI side.** Smallest piece of the state coordination work. Closes the two-CLI race and the partial-write surface. No new architecture.
145-
5. **Add the operation log and editor-as-server IPC.** The big unlock — closes the GUI-open-while-CLI-edits surfaces. Builds on (1)–(4).
146-
147-
**Gate before the next verb category lands:** the next major verb category (annotations expansion, keyframes, animations) does not start until at least PRs (1) and (2) are in. The architectural cost of new verbs under the new shape is bounded; under the current shape it compounds.
148-
149-
Each step is independently shippable and the verb count is small enough that the migration is bounded. None of the steps require the whole edifice to be in place before they pay rent.
150-
151-
## Companion docs
152-
153-
- **[State coordination](./0001-agent-architecture/state-coordination.md)** — full design for the twelve GUI/CLI desync surfaces, the editor-as-server model, file-as-persistence with optimistic concurrency, and the operation log. The deep dive behind concern #4.
154-
155-
- **[Audit: PR #350](./0001-agent-architecture/audit-pr350.md)** — current-state findings from the PR review. Concrete code references that motivate this RFC's claims; not a personal review, evidence base.
60+
- [State coordination](./0001-agent-architecture/state-coordination.md) — companion doc.
61+
- PR #350`feat: CLI and Agent Interface for OpenScreen`.
62+
- Issue #349 — umbrella feature request.
63+
- [RFC convention](./README.md).

0 commit comments

Comments
 (0)