Commit 74a8e8b

Harden env storage and add Codex URL resolution
1 parent bfdd5b1 commit 74a8e8b

20 files changed

Lines changed: 6008 additions & 258 deletions
Lines changed: 389 additions & 0 deletions
@@ -0,0 +1,389 @@
# Codex-First Control Plane Roadmap

This document proposes how Flow should evolve from "helpful CLI + local skills"
into a Codex-first control plane where the user stays inside Codex and Flow
handles routing, memory, execution, and learning behind the scenes.

## Goal

Target state:

- the user speaks natural intent in Codex
- Flow resolves references, routes workflows, fetches secure context, and runs
  the right tool/task
- Codex sees only the smallest useful context for the current turn
- repeated phrasing becomes reusable system knowledge without turning every repo
  preamble into a wall of rules

Example desired behavior:

- `document it` resolves to the docs write flow
- a pasted Linear URL is unrolled before planning
- `continue the last deploy investigation` finds the right session/worktree
- the user does not need to remember `forge doc`, `forge linear inspect`, or
  repo-specific wrappers

## Problem

Current Flow has strong building blocks, but they are still separate:

- task skills are generated and reloaded for Codex
- sessions are stored and recoverable
- env storage is becoming secure enough for org use
- router telemetry already exists
- repo-specific systems like Forge can mine aliases and inject lean workflow
  rules

But the user still pays too much cognitive cost:

- wrappers like `L` and repo-specific launchers carry logic outside Flow
- repo preambles grow whenever a new shortcut is taught
- skill learning is mostly manual
- URL/reference unrolling is repo-specific instead of generic
- Codex app-server connections are process-per-query in some paths

The result is "good pieces, weak control plane".

## Design Principles

1. Flow is the control plane; repo tools remain domain executors.
2. Skills stay thin; runtime resolution carries the real behavior.
3. Reference unrolling is deterministic first, model-assisted only if needed.
4. Learning produces suggestions, not prompt bloat.
5. Behavior that is not active should add nothing to the default context.

## Existing Flow Building Blocks

- task-synced Codex skill metadata in [src/skills.rs](/Users/nikitavoloboev/code/flow/src/skills.rs#L378) and [src/skills.rs](/Users/nikitavoloboev/code/flow/src/skills.rs#L443)
- Codex skill cache reload in [src/skills.rs](/Users/nikitavoloboev/code/flow/src/skills.rs#L1224)
- configurable Codex wrapper transport in [src/commit.rs](/Users/nikitavoloboev/code/flow/src/commit.rs#L5414)
- multi-provider session recovery and copy flows in [src/ai.rs](/Users/nikitavoloboev/code/flow/src/ai.rs#L1)
- router telemetry hooks in [src/rl_signals.rs](/Users/nikitavoloboev/code/flow/src/rl_signals.rs#L307)
- current Codex session resolver direction in [codex-openai-session-resolver.md](/Users/nikitavoloboev/code/flow/docs/codex-openai-session-resolver.md#L1)

These are enough to start. The missing work is unification.

## Proposed Architecture

### 1. `codexd`: long-lived Codex control daemon

Add a Flow-managed daemon, either as an extension of `ai-taskd` or as a focused
`codexd`, with one warm `codex app-server` connection per repo.

Responsibilities:

- maintain repo-scoped Codex app-server sessions
- cache recent threads, active skills, and repo metadata
- expose fast local RPC for lookup, runtime-skill injection, and doctor output
- resolve references before they reach Codex as plain text
- own the "what extra context is actually needed for this turn?" decision

This should absorb behavior that currently lives in wrappers like `L`.
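
The "one warm connection per repo" idea can be sketched as a small cache keyed by repo root. This is only an illustration: `WarmPool` and the `connect` callback are hypothetical names, and the sketch does not start a real `codex app-server` process.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, Generic, TypeVar

Conn = TypeVar("Conn")


@dataclass
class WarmPool(Generic[Conn]):
    """Keeps at most one live connection per repo root (hypothetical helper)."""

    connect: Callable[[Path], Conn]  # assumption: opens an app-server connection
    _by_repo: Dict[Path, Conn] = field(default_factory=dict)

    def get(self, repo: Path) -> Conn:
        key = repo.resolve()  # normalize so equivalent paths share one entry
        if key not in self._by_repo:
            self._by_repo[key] = self.connect(key)  # cold start happens once
        return self._by_repo[key]

    def drop(self, repo: Path) -> None:
        """Forget a repo's connection, e.g. after the app-server exits."""
        self._by_repo.pop(repo.resolve(), None)
```

A daemon built this way would pay the connection cost on the first lookup for a repo and reuse the same handle afterwards, which is the overhead the process-per-query paths currently repeat.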

### 2. Intent registry

Promote Forge-style phrase aliasing into Flow as a generic feature.

Each intent has:

- canonical name
- phrase aliases
- optional repo/path scope
- resolver/action target
- confidence policy
- evidence counters for suggested future aliases

Examples:

- `doc-it`
- `linear-reference`
- `session-recover`
- `review-intent-comment`

Intent matching must stay deterministic and cheap.
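
One way to keep matching deterministic and cheap is a normalized exact-phrase lookup. The sketch below is an assumption about how that could look, not Flow's implementation; `Intent`, `build_index`, and `match` are illustrative names, and only three of the listed fields are modeled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Intent:
    name: str                 # canonical name, e.g. "doc-it"
    phrases: Tuple[str, ...]  # phrase aliases
    resolver: str             # resolver/action target


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching is a plain dict lookup."""
    return " ".join(text.lower().split())


def build_index(intents: Iterable[Intent]) -> Dict[str, Intent]:
    index: Dict[str, Intent] = {}
    for intent in intents:
        for phrase in intent.phrases:
            # First registration wins; a real registry would likely flag conflicts.
            index.setdefault(normalize(phrase), intent)
    return index


def match(index: Dict[str, Intent], text: str) -> Optional[Intent]:
    """Exact match on the normalized phrase: one dict lookup, no model call."""
    return index.get(normalize(text))
```

With an index built once at startup, `match(index, "Document it")` returns the `doc-it` intent and anything unrecognized returns `None`, so unmatched input passes through to Codex untouched.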

### 3. Reference resolvers

Flow should ship a generic resolver layer for pasted references:

- Linear issue URLs
- Linear project URLs
- GitHub PR / issue URLs
- repo file paths
- commit SHAs
- saved Flow session names or IDs

Resolvers return structured payloads, not prose. Repo-local executors like
Forge can register resolver commands for domain-specific expansion.
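
To make "structured payloads, not prose" concrete, here is a minimal sketch that classifies a pasted reference with ordered regular expressions. The patterns, the `kind` labels, and the payload keys are assumptions chosen for illustration; only three of the reference types above are covered.

```python
import re
from typing import Dict, Optional

# Patterns are illustrative assumptions, not Flow's actual rules.
_PATTERNS = [
    ("linear_issue", re.compile(
        r"^https://linear\.app/(?P<workspace>[^/]+)/issue/(?P<id>[A-Za-z0-9]+-\d+)")),
    ("github_pr", re.compile(
        r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)/pull/(?P<number>\d+)")),
    ("commit_sha", re.compile(r"^(?P<sha>[0-9a-f]{7,40})$")),
]


def resolve_reference(text: str) -> Optional[Dict[str, str]]:
    """Return a structured payload for the first matching pattern, else None."""
    candidate = text.strip()
    for kind, pattern in _PATTERNS:
        m = pattern.match(candidate)
        if m:
            return {"kind": kind, **m.groupdict()}
    return None
```

Because the patterns are tried in a fixed order and no model is involved, the same input always yields the same payload, which keeps this step deterministic.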
118+
119+
### 4. Runtime skills
120+
121+
Split Codex knowledge into two layers:
122+
123+
- baseline skills: always available, minimal repo guidance
124+
- runtime skills: ephemeral, injected only when a matched intent or resolver
125+
requires them
126+
127+
Examples:
128+
129+
- user says `document it`
130+
- inject tiny docs-routing runtime skill
131+
- user pastes a Linear URL
132+
- inject tiny linear-unrolled runtime context
133+
- user asks to recover recent work
134+
- inject session-recovery runtime context only for that request
135+
136+
Runtime skills should expire automatically and be bounded by a strict budget.
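
Expiry and a strict budget could be combined as in the sketch below. The class names, the time-to-live mechanism, and the choice to pass the clock in explicitly are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RuntimeSkill:
    name: str
    body: str
    expires_at: float  # absolute time in seconds


@dataclass
class RuntimeSkills:
    budget_chars: int
    _active: Dict[str, RuntimeSkill] = field(default_factory=dict)

    def sweep(self, now: float) -> None:
        """Drop every skill whose time-to-live has elapsed."""
        self._active = {n: s for n, s in self._active.items() if s.expires_at > now}

    def inject(self, name: str, body: str, now: float, ttl: float) -> bool:
        """Activate a skill if it fits the remaining budget; refuse otherwise."""
        self.sweep(now)
        existing = self._active.get(name)
        # Re-injecting a skill replaces it, so its old size does not count.
        used = sum(len(s.body) for s in self._active.values())
        used -= len(existing.body) if existing else 0
        if used + len(body) > self.budget_chars:
            return False
        self._active[name] = RuntimeSkill(name, body, now + ttl)
        return True

    def active(self, now: float) -> List[str]:
        self.sweep(now)
        return sorted(self._active)
```

Refusing an injection that would overflow, rather than truncating it, keeps each skill either fully present or absent, which makes the active state easier to explain to the user.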

### 5. Suggestion loop, not self-bloating memory

Use router telemetry plus transcript mining to propose:

- new aliases
- new reference patterns
- candidate runtime skills
- stale skills that should be removed

Important:

- do not auto-install every observed phrase
- require evidence thresholds
- prefer suggested changes that collapse multiple variants into one canonical
  intent
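
An evidence threshold can be as simple as counting how often an unknown phrase ended in the same intent. The following sketch assumes observations arrive as `(phrase, intent)` pairs; the function name and the default threshold of 3 are arbitrary illustrative choices.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def suggest_aliases(
    observations: Iterable[Tuple[str, str]],  # (phrase, intent the user ended up in)
    known: Dict[str, str],                    # phrase -> intent already installed
    min_evidence: int = 3,                    # assumption: threshold value is arbitrary
) -> List[Tuple[str, str, int]]:
    """Propose (phrase, intent, count) only for unseen phrases with enough evidence."""
    counts = Counter(
        (" ".join(phrase.lower().split()), intent) for phrase, intent in observations
    )
    suggestions = [
        (phrase, intent, n)
        for (phrase, intent), n in counts.items()
        if n >= min_evidence and phrase not in known
    ]
    # Strongest evidence first; ties broken alphabetically for stable output.
    return sorted(suggestions, key=lambda s: (-s[2], s[0]))
```

Normalizing phrases before counting means spelling variants that differ only in case or spacing accumulate evidence together instead of producing separate suggestions.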

## Flow Commands

Add a small command family around the new control plane:

```bash
f codex open [query]
f codex resolve "<text-or-url>" [--json]
f codex runtime
f codex runtime show
f codex runtime clear
f codex teach suggest
f codex teach accept <intent-or-suggestion-id>
f codex teach reject <intent-or-suggestion-id>
f codex doctor
f codexd start|stop|status
```

Intended behavior:

- `f codex open` replaces personal wrappers like `L`
- `f codex resolve` shows what Flow would unroll or route before Codex sees it
- `f codex runtime show` explains which runtime skills/context are active
- `f codex teach suggest` presents evidence-backed alias/intent suggestions
- `f codex doctor` exposes repo path, active app-server connection, runtime
  budget, skill count, and recent resolver hits

## Config Shape

Proposed `flow.toml` additions:

```toml
[codex]
control_plane = "daemon"
warm_app_server = true
runtime_skill_budget_chars = 1200
auto_resolve_references = true
auto_learn = "suggest-only"

[codex.session]
open_command = "codex"
prefer_last_active = true
repo_scoped_lookup = true

[[codex.intent]]
name = "doc-it"
phrases = ["doc it", "document it", "write this down", "save this in docs"]
resolver = "docs.route_write"
scope = ["repo", "personal"]

[[codex.intent]]
name = "session-recover"
phrases = ["what was i doing", "recover recent context", "continue the work"]
resolver = "session.recover"

[[codex.reference_resolver]]
name = "linear"
match = ["https://linear.app/*/issue/*", "https://linear.app/*/project/*"]
command = "forge linear inspect {{ref}} --json"
inject_as = "linear"

[[codex.reference_resolver]]
name = "docs"
match = ["doc it", "document it"]
command = "forge doc route --title {{title}} --json"
inject_as = "docs"
```
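
A sketch of how a `[[codex.reference_resolver]]` entry might be applied: pick the first resolver whose `match` glob fits, then expand `{{ref}}` into an argument vector. Treating `match` as shell-style globs and `{{ref}}` as a plain substitution are assumptions about the proposed config, and `ReferenceResolver`, `pick_resolver`, and `render_command` are hypothetical names.

```python
import shlex
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ReferenceResolver:
    name: str
    match: Tuple[str, ...]  # glob patterns from the `match` key
    command: str            # command template containing {{ref}}


def pick_resolver(
    resolvers: Iterable[ReferenceResolver], ref: str
) -> Optional[ReferenceResolver]:
    """Return the first resolver with a matching glob.

    Note: in fnmatch, `*` also matches `/`, so patterns are looser than path globs.
    """
    for resolver in resolvers:
        if any(fnmatchcase(ref, pattern) for pattern in resolver.match):
            return resolver
    return None


def render_command(template: str, ref: str) -> List[str]:
    """Substitute {{ref}} with a shell-quoted value and split into argv."""
    return shlex.split(template.replace("{{ref}}", shlex.quote(ref)))
```

Quoting the reference before splitting means a pasted string containing spaces or shell metacharacters still arrives as a single argument, so the resulting list can be executed without going through a shell.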

Also add a personal/global config file for user-specific phrase preferences:

- `~/.config/flow/codex-intents.toml`

Use this for personal language variants that should not live in repo config.
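
How the personal file layers over repo config is not specified here. One plausible reading, sketched below purely as an assumption, is that personal phrases extend the repo's phrase list for the same intent name and never remove repo phrases; `merge_phrases` is a hypothetical helper operating on already-parsed config.

```python
from typing import Dict, List


def merge_phrases(
    repo: Dict[str, List[str]],      # intent name -> phrases from flow.toml
    personal: Dict[str, List[str]],  # intent name -> phrases from codex-intents.toml
) -> Dict[str, List[str]]:
    """Union phrase lists per intent name, repo phrases first, without duplicates."""
    merged: Dict[str, List[str]] = {name: list(phrases) for name, phrases in repo.items()}
    for name, phrases in personal.items():
        bucket = merged.setdefault(name, [])
        for phrase in phrases:
            if phrase not in bucket:
                bucket.append(phrase)
    return merged
```

The inputs are copied rather than mutated, so the repo config stays unchanged and the merged view can be rebuilt whenever either file is edited.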

## Daemon Responsibilities

`codexd` should own:

- app-server lifecycle
- repo session caches
- runtime skill activation/deactivation
- resolver execution
- secure env lookups for active workflows
- bounded prompt-context assembly
- suggestion generation from telemetry/history
- compatibility with existing `f skills reload` and `f ai codex ...` flows

It should not:

- replace repo-specific executors like Forge
- run opaque model-based routing in the hot path
- inject large transcript summaries into every turn

## Prompt Budget Policy

The runtime layer needs hard limits:

- baseline repo guidance stays small
- runtime additions must fit a bounded char/token budget
- each resolved intent/reference should justify its own inclusion
- unused runtime skills expire quickly

Budget policy should prefer:

1. structured resolver output
2. one tiny runtime skill
3. one short recovery summary
4. nothing else
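
The preference order above can be read as a greedy fill in priority order. The sketch below is one interpretation under stated assumptions: the kind labels are invented, the budget is measured in characters, and "one" runtime skill or recovery summary is enforced literally.

```python
from typing import List, Tuple

# Lower number = higher priority, mirroring the preference order above.
PRIORITY = {"resolver_output": 0, "runtime_skill": 1, "recovery_summary": 2}
# Kinds of which at most one item is kept per turn.
SINGLETON = {"runtime_skill", "recovery_summary"}


def assemble_context(
    candidates: List[Tuple[str, str]],  # (kind, text)
    budget_chars: int,
) -> List[Tuple[str, str]]:
    """Greedily keep candidates in priority order, skipping any that would overflow."""
    kept: List[Tuple[str, str]] = []
    seen = set()
    remaining = budget_chars
    ordered = sorted(
        (c for c in candidates if c[0] in PRIORITY),  # unknown kinds: "nothing else"
        key=lambda c: PRIORITY[c[0]],
    )
    for kind, text in ordered:
        if kind in SINGLETON and kind in seen:
            continue
        if len(text) > remaining:
            continue
        kept.append((kind, text))
        seen.add(kind)
        remaining -= len(text)
    return kept
```

With a 1200-character budget, a 600-character resolver payload and a 300-character runtime skill are kept, while a 500-character recovery summary is dropped because only 300 characters remain.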

## Learning Loop

Inputs:

- router telemetry
- accepted/overridden task choices
- resolver hits
- successful tool invocations
- session transcript mining

Outputs:

- proposed alias additions
- proposed resolver registrations
- dead-skill cleanup suggestions
- better default repo baselines

Approval model:

- repo suggestions require explicit accept
- personal suggestions can default to personal scope
- org/shared suggestions should stay gated

## Relationship To Forge

Forge should remain the Prom executor for Prom-specific workflows.

Flow should absorb the generic pieces Forge proved useful:

- intent aliasing
- reference unrolling
- thin runtime teaching
- lean docs workflow activation

That means:

- Prom keeps `forge linear inspect`, `forge doc`, and similar domain commands
- Flow becomes the generic router that decides when to call them

## Rollout Phases

### Phase 0: unify wrappers

- move `L`-style session open/recover behavior into `f codex open`
- make repo-scoped Codex session resolution first-class
- expose a `doctor` view for current skill/runtime state

### Phase 1: warm daemon

- add `codexd` with persistent app-server connection per repo
- keep recent thread cache and skills cache warm
- remove process-per-query overhead for session lookup/reload paths

### Phase 2: intent registry + resolvers

- add config-backed intent aliases
- add generic reference resolver interface
- ship built-ins for session recovery, docs routing, and Linear URLs

### Phase 3: runtime skills

- inject temporary runtime skills/context instead of growing repo preambles
- enforce runtime budget caps
- surface active runtime state in `f codex runtime show`

### Phase 4: learning loop

- mine telemetry + sessions for candidate aliases and resolver patterns
- generate suggestions only after evidence thresholds
- add accept/reject workflow

### Phase 5: provider expansion

- reuse the same intent/resolver plane for Claude and Cursor transcript-backed
  workflows where useful
- keep Codex as the first-class interactive target

## First Implementation Slice

The highest-value first slice is:

1. `f codex open`
2. `codexd` with warm repo-scoped app-server
3. `f codex resolve`
4. config-backed intents
5. built-in resolvers for:
   - docs intents
   - Linear URLs
   - session recovery prompts
6. `f codex runtime show`

Why this first:

- it removes the most command-memory burden immediately
- it uses Flow’s existing app-server + skills + session foundations
- it keeps the prompt surface thin
- it gives a concrete place to move personal wrapper logic

## Success Metrics

- p50 `f codex open` latency
- number of user prompts that required remembering a repo command
- average runtime-context bytes injected per turn
- resolver hit rate
- accepted suggestion rate
- count of active baseline skills versus runtime skills
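
Three of these metrics reduce to simple aggregates. The sketch below assumes the raw samples are already collected as plain lists and counters; the function name, the field names, and the definition of hit rate as hits divided by attempts are illustrative assumptions.

```python
from statistics import median
from typing import Dict, Sequence


def summarize(
    open_latencies_ms: Sequence[float],    # one sample per `f codex open` call
    injected_bytes_per_turn: Sequence[int],
    resolver_attempts: int,
    resolver_hits: int,
) -> Dict[str, float]:
    """Compute three of the metrics above; zero is reported when there is no data."""
    turns = len(injected_bytes_per_turn)
    return {
        "p50_open_latency_ms": median(open_latencies_ms) if open_latencies_ms else 0.0,
        "avg_injected_bytes_per_turn": sum(injected_bytes_per_turn) / turns if turns else 0.0,
        "resolver_hit_rate": resolver_hits / resolver_attempts if resolver_attempts else 0.0,
    }
```

Using the median for latency keeps a single slow cold start from dominating the number, which matters when most opens are served by a warm daemon.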

## Non-Goals

- full semantic agent routing in the hot path
- unbounded transcript mining into prompt context
- replacing repo executors with Flow clones
- auto-learning every phrase without evidence or approval

## Summary

The target system is not "more AGENTS text" and not "more commands for the
user to remember".

It is:

- thin baseline repo guidance
- a warm Flow Codex control daemon
- deterministic intent/reference resolution
- ephemeral runtime skills
- evidence-backed learning with approval

That is how Flow becomes truly Codex-first while keeping context cost low.
