| 1 | +# Codex-First Control Plane Roadmap |
| 2 | + |
| 3 | +This document proposes how Flow should evolve from "helpful CLI + local skills" |
| 4 | +into a Codex-first control plane where the user stays inside Codex and Flow |
| 5 | +handles routing, memory, execution, and learning behind the scenes. |
| 6 | + |
| 7 | +## Goal |
| 8 | + |
| 9 | +Target state: |
| 10 | + |
| 11 | +- the user states intent in natural language inside Codex
| 12 | +- Flow resolves references, routes workflows, fetches secure context, and runs |
| 13 | + the right tool/task |
| 14 | +- Codex sees only the smallest useful context for the current turn |
| 15 | +- repeated phrasing becomes reusable system knowledge without turning every repo |
| 16 | + preamble into a wall of rules |
| 17 | + |
| 18 | +Example desired behavior: |
| 19 | + |
| 20 | +- `document it` resolves to the docs write flow |
| 21 | +- a pasted Linear URL is unrolled before planning |
| 22 | +- `continue the last deploy investigation` finds the right session/worktree |
| 23 | +- the user does not need to remember `forge doc`, `forge linear inspect`, or |
| 24 | + repo-specific wrappers |
| 25 | + |
| 26 | +## Problem |
| 27 | + |
| 28 | +Flow already has strong building blocks, but they are still separate:
| 29 | + |
| 30 | +- task skills are generated and reloaded for Codex |
| 31 | +- sessions are stored and recoverable |
| 32 | +- env storage is becoming secure enough for org use |
| 33 | +- router telemetry already exists |
| 34 | +- repo-specific systems like Forge can mine aliases and inject lean workflow |
| 35 | + rules |
| 36 | + |
| 37 | +But the user still pays too much cognitive cost: |
| 38 | + |
| 39 | +- wrappers like `L` and repo-specific launchers carry logic outside Flow |
| 40 | +- repo preambles grow whenever a new shortcut is taught |
| 41 | +- skill learning is mostly manual |
| 42 | +- URL/reference unrolling is repo-specific instead of generic |
| 43 | +- some code paths spawn a new Codex app-server process for every query
| 44 | + |
| 45 | +The result is "good pieces, weak control plane". |
| 46 | + |
| 47 | +## Design Principles |
| 48 | + |
| 49 | +1. Flow is the control plane; repo tools remain domain executors. |
| 50 | +2. Skills stay thin; runtime resolution carries the real behavior. |
| 51 | +3. Reference unrolling is deterministic first, model-assisted only if needed. |
| 52 | +4. Learning produces suggestions, not prompt bloat. |
| 53 | +5. Behavior that is not active should add nothing to the default context.
| 54 | + |
| 55 | +## Existing Flow Building Blocks |
| 56 | + |
| 57 | +- task-synced Codex skill metadata in [src/skills.rs](/Users/nikitavoloboev/code/flow/src/skills.rs#L378) and [src/skills.rs](/Users/nikitavoloboev/code/flow/src/skills.rs#L443) |
| 58 | +- Codex skill cache reload in [src/skills.rs](/Users/nikitavoloboev/code/flow/src/skills.rs#L1224) |
| 59 | +- configurable Codex wrapper transport in [src/commit.rs](/Users/nikitavoloboev/code/flow/src/commit.rs#L5414) |
| 60 | +- multi-provider session recovery and copy flows in [src/ai.rs](/Users/nikitavoloboev/code/flow/src/ai.rs#L1) |
| 61 | +- router telemetry hooks in [src/rl_signals.rs](/Users/nikitavoloboev/code/flow/src/rl_signals.rs#L307) |
| 62 | +- current Codex session resolver direction in [codex-openai-session-resolver.md](/Users/nikitavoloboev/code/flow/docs/codex-openai-session-resolver.md#L1) |
| 63 | + |
| 64 | +These are enough to start. The missing work is unification. |
| 65 | + |
| 66 | +## Proposed Architecture |
| 67 | + |
| 68 | +### 1. `codexd`: long-lived Codex control daemon |
| 69 | + |
| 70 | +Add a Flow-managed daemon, either as an extension of `ai-taskd` or as a focused |
| 71 | +`codexd`, with one warm `codex app-server` connection per repo. |
| 72 | + |
| 73 | +Responsibilities: |
| 74 | + |
| 75 | +- maintain repo-scoped Codex app-server sessions |
| 76 | +- cache recent threads, active skills, and repo metadata |
| 77 | +- expose fast local RPC for lookup, runtime-skill injection, and doctor output |
| 78 | +- resolve references before they reach Codex as plain text |
| 79 | +- own the "what extra context is actually needed for this turn?" decision |
| 80 | + |
| 81 | +This should absorb behavior that currently lives in wrappers like `L`. |
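The "one warm connection per repo" rule can be pictured as a cache keyed by repo root. The following is a minimal sketch under stated assumptions: `AppServerConn` and `ConnPool` are hypothetical names, and the real `codex app-server` handshake is replaced by a plain counter so the reuse behavior is visible.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a warm `codex app-server` connection.
#[derive(Debug)]
pub struct AppServerConn {
    pub repo_root: PathBuf,
    pub queries_served: u64,
}

/// Hypothetical pool that keeps at most one connection per repo root.
#[derive(Default)]
pub struct ConnPool {
    conns: HashMap<PathBuf, AppServerConn>,
    /// Number of connections actually created (one per distinct repo).
    pub spawned: u64,
}

impl ConnPool {
    /// Return the warm connection for `repo_root`, creating it on first use.
    pub fn get_or_spawn(&mut self, repo_root: &Path) -> &mut AppServerConn {
        if !self.conns.contains_key(repo_root) {
            self.spawned += 1;
            self.conns.insert(
                repo_root.to_path_buf(),
                AppServerConn {
                    repo_root: repo_root.to_path_buf(),
                    queries_served: 0,
                },
            );
        }
        self.conns.get_mut(repo_root).expect("inserted above")
    }

    /// Route one query through the repo's warm connection and
    /// return how many queries that connection has served so far.
    pub fn query(&mut self, repo_root: &Path) -> u64 {
        let conn = self.get_or_spawn(repo_root);
        conn.queries_served += 1;
        conn.queries_served
    }
}
```

Repeated queries for the same repo reuse one entry, so `spawned` grows with the number of distinct repos rather than the number of queries.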
| 82 | + |
| 83 | +### 2. Intent registry |
| 84 | + |
| 85 | +Promote Forge-style phrase aliasing into Flow as a generic feature. |
| 86 | + |
| 87 | +Each intent has: |
| 88 | + |
| 89 | +- canonical name |
| 90 | +- phrase aliases |
| 91 | +- optional repo/path scope |
| 92 | +- resolver/action target |
| 93 | +- confidence policy |
| 94 | +- evidence counters for suggested future aliases |
| 95 | + |
| 96 | +Examples: |
| 97 | + |
| 98 | +- `doc-it` |
| 99 | +- `linear-reference` |
| 100 | +- `session-recover` |
| 101 | +- `review-intent-comment` |
| 102 | + |
| 103 | +Intent matching must stay deterministic and cheap. |
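To make "deterministic and cheap" concrete, here is a minimal sketch of phrase lookup as a single hash-map access after normalization. `Intent`, `IntentRegistry`, and `normalize` are hypothetical names; scope, confidence policy, and evidence counters are left out to keep the example short.

```rust
use std::collections::HashMap;

/// Hypothetical shape of one registered intent (subset of the fields above).
#[derive(Debug, Clone, PartialEq)]
pub struct Intent {
    pub name: String,     // canonical name, e.g. "doc-it"
    pub resolver: String, // resolver/action target
    pub phrases: Vec<String>,
}

/// Lowercase and collapse whitespace so matching is stable.
pub fn normalize(text: &str) -> String {
    text.split_whitespace()
        .map(|w| w.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}

pub struct IntentRegistry {
    by_phrase: HashMap<String, usize>,
    intents: Vec<Intent>,
}

impl IntentRegistry {
    pub fn new(intents: Vec<Intent>) -> Self {
        let mut by_phrase = HashMap::new();
        for (idx, intent) in intents.iter().enumerate() {
            for phrase in &intent.phrases {
                // First registration wins, so the result depends only on
                // registration order, never on hash iteration order.
                by_phrase.entry(normalize(phrase)).or_insert(idx);
            }
        }
        Self { by_phrase, intents }
    }

    /// Exact match on the normalized phrase: one hash lookup, no model call.
    pub fn resolve(&self, user_text: &str) -> Option<&Intent> {
        self.by_phrase
            .get(&normalize(user_text))
            .map(|&i| &self.intents[i])
    }
}
```

Exact matching is the simplest policy that satisfies the constraint. Fuzzier matching could be layered on top, but it would need its own confidence policy.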
| 104 | + |
| 105 | +### 3. Reference resolvers |
| 106 | + |
| 107 | +Flow should ship a generic resolver layer for pasted references: |
| 108 | + |
| 109 | +- Linear issue URLs |
| 110 | +- Linear project URLs |
| 111 | +- GitHub PR / issue URLs |
| 112 | +- repo file paths |
| 113 | +- commit SHAs |
| 114 | +- saved Flow session names or IDs |
| 115 | + |
| 116 | +Resolvers return structured payloads, not prose. Repo-local executors like |
| 117 | +Forge can register resolver commands for domain-specific expansion. |
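As an illustration of "structured payloads, not prose", the sketch below classifies a pasted token into a typed value. The `Reference` enum and `classify` function are hypothetical, and only three of the listed reference kinds are covered. The SHA check is a heuristic that also accepts ordinary words made of hex letters.

```rust
/// Hypothetical structured payload for a recognized reference.
#[derive(Debug, PartialEq)]
pub enum Reference {
    LinearIssue { workspace: String, issue_id: String },
    GitHubPull { owner: String, repo: String, number: u64 },
    CommitSha(String),
}

/// Classify one whitespace-free token; returns None when nothing matches.
pub fn classify(token: &str) -> Option<Reference> {
    if let Some(rest) = token.strip_prefix("https://linear.app/") {
        let parts: Vec<&str> = rest.split('/').collect();
        // Expected shape: <workspace>/issue/<ID>[/<slug>]
        if parts.len() >= 3
            && parts[1] == "issue"
            && !parts[0].is_empty()
            && !parts[2].is_empty()
        {
            return Some(Reference::LinearIssue {
                workspace: parts[0].to_string(),
                issue_id: parts[2].to_string(),
            });
        }
        return None;
    }
    if let Some(rest) = token.strip_prefix("https://github.com/") {
        let parts: Vec<&str> = rest.split('/').collect();
        // Expected shape: <owner>/<repo>/pull/<number>[/...]
        if parts.len() >= 4 && parts[2] == "pull" {
            let number: u64 = parts[3].parse().ok()?;
            return Some(Reference::GitHubPull {
                owner: parts[0].to_string(),
                repo: parts[1].to_string(),
                number,
            });
        }
        return None;
    }
    // 7 to 40 hex digits looks like an abbreviated or full SHA-1.
    if (7..=40).contains(&token.len()) && token.chars().all(|c| c.is_ascii_hexdigit()) {
        return Some(Reference::CommitSha(token.to_lowercase()));
    }
    None
}
```

A typed result like this lets the control plane decide what to fetch or inject without re-parsing free text.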
| 118 | + |
| 119 | +### 4. Runtime skills |
| 120 | + |
| 121 | +Split Codex knowledge into two layers: |
| 122 | + |
| 123 | +- baseline skills: always available, minimal repo guidance |
| 124 | +- runtime skills: ephemeral, injected only when a matched intent or resolver |
| 125 | + requires them |
| 126 | + |
| 127 | +Examples: |
| 128 | + |
| 129 | +- user says `document it` |
| 130 | + - inject tiny docs-routing runtime skill |
| 131 | +- user pastes a Linear URL |
| 132 | + - inject tiny linear-unrolled runtime context |
| 133 | +- user asks to recover recent work |
| 134 | + - inject session-recovery runtime context only for that request |
| 135 | + |
| 136 | +Runtime skills should expire automatically and be bounded by a strict budget. |
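Automatic expiry and a strict budget could be modeled roughly as follows. This is a sketch, not a proposed implementation: `RuntimeLayer` and `RuntimeSkill` are hypothetical, lifetime is counted in turns instead of wall-clock time, and the budget is measured in characters to mirror `runtime_skill_budget_chars`.

```rust
/// Hypothetical runtime skill: a small text snippet with a turn-based lifetime.
#[derive(Debug, Clone)]
pub struct RuntimeSkill {
    pub name: String,
    pub body: String,
    /// First turn on which the skill is no longer active.
    pub expires_at_turn: u64,
}

pub struct RuntimeLayer {
    pub budget_chars: usize,
    pub turn: u64,
    active: Vec<RuntimeSkill>,
}

impl RuntimeLayer {
    pub fn new(budget_chars: usize) -> Self {
        Self { budget_chars, turn: 0, active: Vec::new() }
    }

    fn used_chars(&self) -> usize {
        self.active.iter().map(|s| s.body.chars().count()).sum()
    }

    /// Activate a skill for `ttl_turns` turns, counting the current one.
    /// Returns false and injects nothing if the budget would be exceeded.
    pub fn inject(&mut self, name: &str, body: &str, ttl_turns: u64) -> bool {
        if self.used_chars() + body.chars().count() > self.budget_chars {
            return false;
        }
        self.active.push(RuntimeSkill {
            name: name.to_string(),
            body: body.to_string(),
            expires_at_turn: self.turn + ttl_turns,
        });
        true
    }

    /// Advance one turn and drop every skill whose lifetime has ended.
    pub fn end_turn(&mut self) {
        self.turn += 1;
        let now = self.turn;
        self.active.retain(|s| s.expires_at_turn > now);
    }

    pub fn active_names(&self) -> Vec<&str> {
        self.active.iter().map(|s| s.name.as_str()).collect()
    }
}
```

Refusing an over-budget skill outright, instead of truncating it, keeps the injected context predictable.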
| 137 | + |
| 138 | +### 5. Suggestion loop, not self-bloating memory |
| 139 | + |
| 140 | +Use router telemetry plus transcript mining to propose: |
| 141 | + |
| 142 | +- new aliases |
| 143 | +- new reference patterns |
| 144 | +- candidate runtime skills |
| 145 | +- stale skills that should be removed |
| 146 | + |
| 147 | +Important: |
| 148 | + |
| 149 | +- do not auto-install every observed phrase |
| 150 | +- require evidence thresholds |
| 151 | +- prefer suggested changes that collapse multiple variants into one canonical |
| 152 | + intent |
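The evidence-threshold rule can be sketched as a counter keyed by intent and phrase. `Evidence` and its methods are hypothetical names, and the threshold value is left to the caller. Keying by canonical intent first means phrase variants are grouped under the intent they would collapse into.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical evidence store: how often a phrase led to a given intent.
#[derive(Default)]
pub struct Evidence {
    /// (intent, normalized phrase) -> number of observations.
    counts: BTreeMap<(String, String), u32>,
}

impl Evidence {
    /// Record that `phrase` was observed resolving to `intent`.
    pub fn observe(&mut self, intent: &str, phrase: &str) {
        *self
            .counts
            .entry((intent.to_string(), phrase.to_lowercase()))
            .or_insert(0) += 1;
    }

    /// Propose (intent, phrase) pairs seen at least `min_count` times
    /// whose phrase is not already an installed alias.
    /// BTreeMap iteration keeps the output order deterministic.
    pub fn suggestions(
        &self,
        installed: &BTreeSet<String>,
        min_count: u32,
    ) -> Vec<(String, String)> {
        self.counts
            .iter()
            .filter(|(key, n)| **n >= min_count && !installed.contains(&key.1))
            .map(|(key, _)| key.clone())
            .collect()
    }
}
```

Nothing here installs an alias. The output is only a candidate list for a later accept or reject step.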
| 153 | + |
| 154 | +## Flow Commands |
| 155 | + |
| 156 | +Add a small command family around the new control plane: |
| 157 | + |
| 158 | +```bash |
| 159 | +f codex open [query] |
| 160 | +f codex resolve "<text-or-url>" [--json] |
| 161 | +f codex runtime |
| 162 | +f codex runtime show |
| 163 | +f codex runtime clear |
| 164 | +f codex teach suggest |
| 165 | +f codex teach accept <intent-or-suggestion-id> |
| 166 | +f codex teach reject <intent-or-suggestion-id> |
| 167 | +f codex doctor |
| 168 | +f codexd start|stop|status |
| 169 | +``` |
| 170 | + |
| 171 | +Intended behavior: |
| 172 | + |
| 173 | +- `f codex open` replaces personal wrappers like `L` |
| 174 | +- `f codex resolve` shows what Flow would unroll or route before Codex sees it |
| 175 | +- `f codex runtime show` explains which runtime skills/context are active |
| 176 | +- `f codex teach suggest` presents evidence-backed alias/intent suggestions |
| 177 | +- `f codex doctor` exposes repo path, active app-server connection, runtime |
| 178 | + budget, skill count, and recent resolver hits |
| 179 | + |
| 180 | +## Config Shape |
| 181 | + |
| 182 | +Proposed `flow.toml` additions: |
| 183 | + |
| 184 | +```toml |
| 185 | +[codex] |
| 186 | +control_plane = "daemon" |
| 187 | +warm_app_server = true |
| 188 | +runtime_skill_budget_chars = 1200 |
| 189 | +auto_resolve_references = true |
| 190 | +auto_learn = "suggest-only" |
| 191 | + |
| 192 | +[codex.session] |
| 193 | +open_command = "codex" |
| 194 | +prefer_last_active = true |
| 195 | +repo_scoped_lookup = true |
| 196 | + |
| 197 | +[[codex.intent]] |
| 198 | +name = "doc-it" |
| 199 | +phrases = ["doc it", "document it", "write this down", "save this in docs"] |
| 200 | +resolver = "docs.route_write" |
| 201 | +scope = ["repo", "personal"] |
| 202 | + |
| 203 | +[[codex.intent]] |
| 204 | +name = "session-recover" |
| 205 | +phrases = ["what was i doing", "recover recent context", "continue the work"] |
| 206 | +resolver = "session.recover" |
| 207 | + |
| 208 | +[[codex.reference_resolver]] |
| 209 | +name = "linear" |
| 210 | +match = ["https://linear.app/*/issue/*", "https://linear.app/*/project/*"] |
| 211 | +command = "forge linear inspect {{ref}} --json" |
| 212 | +inject_as = "linear" |
| 213 | + |
| 214 | +[[codex.reference_resolver]] |
| 215 | +name = "docs" |
| 216 | +match = ["doc it", "document it"] |
| 217 | +command = "forge doc route --title {{title}} --json" |
| 218 | +inject_as = "docs" |
| 219 | +``` |
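The config above does not define how `match` patterns or `{{ref}}` placeholders are interpreted. One possible reading, sketched here as an assumption, treats `*` as "any run of characters" (including `/`) and substitutes placeholders per argument. `glob_match` and `render_argv` are hypothetical helpers.

```rust
/// Match `text` against a pattern where `*` stands for any run of characters.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == text; // no wildcard: exact match
    }
    let (first, last) = (parts[0], parts[parts.len() - 1]);
    if text.len() < first.len() + last.len()
        || !text.starts_with(first)
        || !text.ends_with(last)
    {
        return false;
    }
    // Middle literals must appear in order between the prefix and the suffix.
    let mut rest = &text[first.len()..text.len() - last.len()];
    for part in &parts[1..parts.len() - 1] {
        match rest.find(*part) {
            Some(pos) => rest = &rest[pos + part.len()..],
            None => return false,
        }
    }
    true
}

/// Split the command template on whitespace first, then substitute
/// `{{name}}` inside each argument. Unknown placeholders stay as-is.
pub fn render_argv(template: &str, vars: &[(&str, &str)]) -> Vec<String> {
    template
        .split_whitespace()
        .map(|token| {
            let mut arg = token.to_string();
            for (name, value) in vars {
                arg = arg.replace(&format!("{{{{{}}}}}", name), value);
            }
            arg
        })
        .collect()
}
```

Substituting after splitting means a pasted value containing spaces or shell metacharacters stays a single argument, provided the resulting vector is executed directly and not passed through a shell.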
| 220 | + |
| 221 | +Also add a personal/global config file for user-specific phrase preferences: |
| 222 | + |
| 223 | +- `~/.config/flow/codex-intents.toml` |
| 224 | + |
| 225 | +Use this for personal language variants that should not live in repo config. |
| 226 | + |
| 227 | +## Daemon Responsibilities |
| 228 | + |
| 229 | +`codexd` should own: |
| 230 | + |
| 231 | +- app-server lifecycle |
| 232 | +- repo session caches |
| 233 | +- runtime skill activation/deactivation |
| 234 | +- resolver execution |
| 235 | +- secure env lookups for active workflows |
| 236 | +- bounded prompt-context assembly |
| 237 | +- suggestion generation from telemetry/history |
| 238 | +- compatibility with existing `f skills reload` and `f ai codex ...` flows |
| 239 | + |
| 240 | +It should not: |
| 241 | + |
| 242 | +- replace repo-specific executors like Forge |
| 243 | +- run opaque model-based routing in the hot path |
| 244 | +- inject large transcript summaries into every turn |
| 245 | + |
| 246 | +## Prompt Budget Policy |
| 247 | + |
| 248 | +The runtime layer needs hard limits: |
| 249 | + |
| 250 | +- baseline repo guidance stays small |
| 251 | +- runtime additions must fit a bounded char/token budget |
| 252 | +- each resolved intent/reference should justify its own inclusion |
| 253 | +- unused runtime skills expire quickly |
| 254 | + |
| 255 | +Budget policy should prefer: |
| 256 | + |
| 257 | +1. structured resolver output |
| 258 | +2. one tiny runtime skill |
| 259 | +3. one short recovery summary |
| 260 | +4. nothing else |
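The preference order above can be read as a small packing routine: sort candidates by kind, cap runtime skills and recovery summaries at one each, and skip anything that would exceed the budget. The sketch below assumes hypothetical `Kind`, `Candidate`, and `assemble` names and a character-based budget.

```rust
/// Candidate kinds in priority order (derived `Ord` follows declaration order).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Kind {
    ResolverOutput,  // highest priority
    RuntimeSkill,    // at most one
    RecoverySummary, // at most one, lowest priority
}

#[derive(Debug, Clone)]
pub struct Candidate {
    pub kind: Kind,
    pub text: String,
}

/// Choose which candidates to inject for this turn.
pub fn assemble(mut candidates: Vec<Candidate>, budget_chars: usize) -> Vec<Candidate> {
    // Stable sort keeps the caller's order within each kind.
    candidates.sort_by_key(|c| c.kind);
    let mut used = 0usize;
    let (mut skills, mut summaries) = (0u32, 0u32);
    let mut chosen = Vec::new();
    for c in candidates {
        let len = c.text.chars().count();
        let over_cap = match c.kind {
            Kind::ResolverOutput => false,
            Kind::RuntimeSkill => skills >= 1,
            Kind::RecoverySummary => summaries >= 1,
        };
        if over_cap || used + len > budget_chars {
            continue; // "nothing else": drop whatever does not fit
        }
        match c.kind {
            Kind::RuntimeSkill => skills += 1,
            Kind::RecoverySummary => summaries += 1,
            Kind::ResolverOutput => {}
        }
        used += len;
        chosen.push(c);
    }
    chosen
}
```

Whether an over-budget candidate should be skipped, as here, or should stop assembly entirely is a policy choice the roadmap leaves open.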
| 261 | + |
| 262 | +## Learning Loop |
| 263 | + |
| 264 | +Inputs: |
| 265 | + |
| 266 | +- router telemetry |
| 267 | +- accepted/overridden task choices |
| 268 | +- resolver hits |
| 269 | +- successful tool invocations |
| 270 | +- session transcript mining |
| 271 | + |
| 272 | +Outputs: |
| 273 | + |
| 274 | +- proposed alias additions |
| 275 | +- proposed resolver registrations |
| 276 | +- dead-skill cleanup suggestions |
| 277 | +- better default repo baselines |
| 278 | + |
| 279 | +Approval model: |
| 280 | + |
| 281 | +- repo suggestions require explicit accept |
| 282 | +- personal suggestions can default to personal scope |
| 283 | +- org/shared suggestions should stay gated |
| 284 | + |
| 285 | +## Relationship To Forge |
| 286 | + |
| 287 | +Forge should remain the Prom executor for Prom-specific workflows. |
| 288 | + |
| 289 | +Flow should absorb the generic pieces Forge proved useful: |
| 290 | + |
| 291 | +- intent aliasing |
| 292 | +- reference unrolling |
| 293 | +- thin runtime teaching |
| 294 | +- lean docs workflow activation |
| 295 | + |
| 296 | +That means: |
| 297 | + |
| 298 | +- Prom keeps `forge linear inspect`, `forge doc`, and similar domain commands |
| 299 | +- Flow becomes the generic router that decides when to call them |
| 300 | + |
| 301 | +## Rollout Phases |
| 302 | + |
| 303 | +### Phase 0: unify wrappers |
| 304 | + |
| 305 | +- move `L`-style session open/recover behavior into `f codex open` |
| 306 | +- make repo-scoped Codex session resolution first-class |
| 307 | +- expose a `doctor` view for current skill/runtime state |
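Repo-scoped session resolution might look like the sketch below, which combines the `repo_scoped_lookup` and `prefer_last_active` ideas from the proposed config. `SessionRecord` and `resolve_session` are hypothetical, and real session metadata likely carries more fields.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stored session metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct SessionRecord {
    pub id: String,
    pub repo_root: PathBuf,
    pub last_active_unix: u64,
}

/// Pick the most recently active session whose repo root contains `cwd`.
/// `Path::starts_with` compares whole path components, so `/code/flow2`
/// does not count as being inside `/code/flow`.
pub fn resolve_session<'a>(
    sessions: &'a [SessionRecord],
    cwd: &Path,
) -> Option<&'a SessionRecord> {
    sessions
        .iter()
        .filter(|s| cwd.starts_with(&s.repo_root))
        .max_by_key(|s| s.last_active_unix)
}
```

Returning `None` when no session belongs to the current repo gives `f codex open` a clear signal to start a fresh session.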
| 308 | + |
| 309 | +### Phase 1: warm daemon |
| 310 | + |
| 311 | +- add `codexd` with persistent app-server connection per repo |
| 312 | +- keep recent thread cache and skills cache warm |
| 313 | +- remove process-per-query overhead for session lookup/reload paths |
| 314 | + |
| 315 | +### Phase 2: intent registry + resolvers |
| 316 | + |
| 317 | +- add config-backed intent aliases |
| 318 | +- add generic reference resolver interface |
| 319 | +- ship built-ins for session recovery, docs routing, and Linear URLs |
| 320 | + |
| 321 | +### Phase 3: runtime skills |
| 322 | + |
| 323 | +- inject temporary runtime skills/context instead of growing repo preambles |
| 324 | +- enforce runtime budget caps |
| 325 | +- surface active runtime state in `f codex runtime show` |
| 326 | + |
| 327 | +### Phase 4: learning loop |
| 328 | + |
| 329 | +- mine telemetry + sessions for candidate aliases and resolver patterns |
| 330 | +- generate suggestions only after evidence thresholds |
| 331 | +- add accept/reject workflow |
| 332 | + |
| 333 | +### Phase 5: provider expansion |
| 334 | + |
| 335 | +- reuse the same intent/resolver plane for Claude and Cursor transcript-backed |
| 336 | + workflows where useful |
| 337 | +- keep Codex as the first-class interactive target |
| 338 | + |
| 339 | +## First Implementation Slice |
| 340 | + |
| 341 | +The highest-value first slice is: |
| 342 | + |
| 343 | +1. `f codex open` |
| 344 | +2. `codexd` with warm repo-scoped app-server |
| 345 | +3. `f codex resolve` |
| 346 | +4. config-backed intents |
| 347 | +5. built-in resolvers for: |
| 348 | + - docs intents |
| 349 | + - Linear URLs |
| 350 | + - session recovery prompts |
| 351 | +6. `f codex runtime show` |
| 352 | + |
| 353 | +Why this first: |
| 354 | + |
| 355 | +- it removes the most command-memory burden immediately |
| 356 | +- it uses Flow’s existing app-server + skills + session foundations |
| 357 | +- it keeps the prompt surface thin |
| 358 | +- it gives a concrete place to move personal wrapper logic |
| 359 | + |
| 360 | +## Success Metrics |
| 361 | + |
| 362 | +- p50 `f codex open` latency |
| 363 | +- number of user prompts that required remembering a repo command |
| 364 | +- average runtime-context bytes injected per turn |
| 365 | +- resolver hit rate |
| 366 | +- accepted suggestion rate |
| 367 | +- count of active baseline skills versus runtime skills |
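For the latency metric, p50 can be computed from raw samples with the nearest-rank method. This is one common definition among several; the `percentile` helper and the millisecond unit are assumptions for illustration.

```rust
/// Nearest-rank percentile over latency samples (for example, milliseconds).
/// Returns None for an empty sample set or a percentile outside 0..=100.
pub fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank: the smallest value with at least p% of samples at or below it.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

With an even number of samples, nearest-rank returns the lower of the two middle values and does not interpolate.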
| 368 | + |
| 369 | +## Non-Goals |
| 370 | + |
| 371 | +- full semantic agent routing in the hot path |
| 372 | +- unbounded transcript mining into prompt context |
| 373 | +- replacing repo executors with Flow clones |
| 374 | +- auto-learning every phrase without evidence or approval |
| 375 | + |
| 376 | +## Summary |
| 377 | + |
| 378 | +The target system is not "more AGENTS text" and not "more commands for the |
| 379 | +user to remember". |
| 380 | + |
| 381 | +It is: |
| 382 | + |
| 383 | +- thin baseline repo guidance |
| 384 | +- a warm Flow Codex control daemon |
| 385 | +- deterministic intent/reference resolution |
| 386 | +- ephemeral runtime skills |
| 387 | +- evidence-backed learning with approval |
| 388 | + |
| 389 | +That is how Flow becomes truly Codex-first while keeping context cost low. |