# Pre-Sandbox Phase Plan

## Goal

Add a configurable pre-sandbox phase to `ai-workflow` that runs before any Vercel Sandbox is provisioned. The phase lets the service execute pluggable server-side steps, such as an AI SDK ticket check, and use their output to either enrich downstream agent prompts or halt before sandbox creation.

## Decisions

- Config lives in the `ai-workflow` service repo.
- The config file is required at the repo root as `pre-sandbox.yaml`.
- Minimal valid config:

```yaml
preSandbox:
  runOn:
    newTicket: true
    existingPr: true
    mergeConflict: true
  steps: []
```

- Missing or invalid config fails the build.
- Step implementations live in `src/pre-sandbox/steps/*.ts`.
- Config references registered step ids. Adding a new step requires a code change and a redeploy.
- Steps run sequentially.
- Steps run server-side in Workflow step functions, not inside the Vercel Sandbox.
- Steps may use AI SDK and tools.
- Steps may halt sandbox creation.
- Workflow remains responsible for standard Jira and Slack communication.
- Step output can be injected into research, implementation, and review prompts.
- No retries in the first version. Support timeout and failure behavior only.

## Success Criteria

- `pnpm build` fails when `pre-sandbox.yaml` is missing or invalid.
- Config cannot reference an unknown pre-sandbox step.
- A configured pre-sandbox step runs before `provisionSandbox(...)`.
- A halting step prevents sandbox provisioning.
- A halting step can trigger the existing clarification or failure notification path.
- Prompt additions from a step appear only in the selected downstream prompts.
- The showcase AI SDK step can evaluate ticket complexity without repo knowledge.

## Config Shape

Initial config example:

```yaml
preSandbox:
  runOn:
    newTicket: true
    existingPr: true
    mergeConflict: true

  steps:
    - uses: ticket-complexity-check
      name: Ticket Complexity Check
      timeoutMs: 120000
      onFailure: fail
      with:
        input:
          ticket:
            - identifier
            - title
            - description
            - acceptanceCriteria
            - comments
```

### Fields

- `preSandbox.runOn.newTicket`: run when no PR exists yet for the ticket branch.
- `preSandbox.runOn.existingPr`: run when a PR already exists for the ticket branch.
- `preSandbox.runOn.mergeConflict`: run when an existing PR has conflicts.
- `steps[].uses`: registered step id from `src/pre-sandbox/steps/index.ts`.
- `steps[].name`: optional display name used in logs and prompt sections.
- `steps[].timeoutMs`: optional maximum duration for the step, in milliseconds.
- `steps[].onFailure`: one of `continue`, `fail`, or `move_to_backlog`.
- `steps[].with`: step-specific config passed to the step implementation.
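
The three `runOn` flags line up with the `run` flags the runner passes to steps (`isNewTicket`, `hasExistingPr`, `hasMergeConflict`). Below is a minimal sketch of the gate. It assumes the phase runs when any enabled flag matches the current run. The plan does not yet say how overlapping cases combine (a conflicted PR is also an existing PR), and `shouldRunPreSandbox` is a hypothetical name.

```ts
// Shape of `preSandbox.runOn` in pre-sandbox.yaml.
interface RunOnConfig {
  newTicket: boolean;
  existingPr: boolean;
  mergeConflict: boolean;
}

// Mirrors the `run` block of PreSandboxStepContext.
interface RunFlags {
  isNewTicket: boolean;
  hasExistingPr: boolean;
  hasMergeConflict: boolean;
}

// Hypothetical gate. Assumption: triggers are OR-ed, so the phase runs if any
// enabled flag matches the current run.
function shouldRunPreSandbox(runOn: RunOnConfig, run: RunFlags): boolean {
  return (
    (runOn.newTicket && run.isNewTicket) ||
    (runOn.existingPr && run.hasExistingPr) ||
    (runOn.mergeConflict && run.hasMergeConflict)
  );
}
```

Under this assumption, `existingPr: false` with `mergeConflict: true` skips clean PRs but still runs on conflicted ones.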

## Runtime Contract

Create shared types in `src/pre-sandbox/types.ts`.

```ts
export type PreSandboxPromptTarget = "research" | "implementation" | "review";

export interface PreSandboxPromptAddition {
  target: PreSandboxPromptTarget[];
  title: string;
  content: string;
}

export type PreSandboxStepResult =
  | {
      status: "continue";
      promptAdditions?: PreSandboxPromptAddition[];
    }
  | {
      status: "halt";
      outcome: "needs_clarification" | "failed";
      message: string;
      questions?: string[];
      promptAdditions?: PreSandboxPromptAddition[];
    };
```

`message` is the human-readable reason used for logs and workflow notifications. It is not a separate control path.

## Step Input Contract

The runner builds a controlled input object and passes only the fields selected by config.

```ts
export interface PreSandboxStepContext {
  ticket: {
    identifier?: string;
    title?: string;
    description?: string;
    acceptanceCriteria?: string;
    comments?: Array<{ author: string; body: string; createdAt?: string }>;
    labels?: string[];
  };
  run: {
    branchName: string;
    isNewTicket: boolean;
    hasExistingPr: boolean;
    hasMergeConflict: boolean;
  };
}
```

For the first version, input field selection only needs to support ticket fields. Additional fields can be added later without changing the step result contract.
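
Field selection is a pick over the full ticket. Below is a minimal sketch. It assumes `with.input.ticket` holds a list of field names and that fields absent from the ticket are left out rather than set to `undefined`. `selectTicketFields` is a hypothetical helper.

```ts
interface TicketInput {
  identifier?: string;
  title?: string;
  description?: string;
  acceptanceCriteria?: string;
  comments?: Array<{ author: string; body: string; createdAt?: string }>;
  labels?: string[];
}

type TicketField = keyof TicketInput;

// Hypothetical helper: copy only the configured fields, so a step never
// receives ticket data its config did not ask for.
function selectTicketFields(ticket: TicketInput, fields: readonly TicketField[]): TicketInput {
  const selected: Partial<Record<TicketField, unknown>> = {};
  for (const field of fields) {
    // Fields missing from the ticket are left out instead of set to undefined.
    if (ticket[field] !== undefined) selected[field] = ticket[field];
  }
  return selected as TicketInput;
}
```

The runner would apply this to the ticket before building `PreSandboxStepContext`. With the showcase config above, `labels` would therefore never reach the complexity check.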

## Build-Time Validation

Add:

- `src/pre-sandbox/config.ts`
- `src/pre-sandbox/steps/index.ts`
- `scripts/validate-pre-sandbox-config.ts`

Validation rules:

- `pre-sandbox.yaml` must exist.
- Root key must be `preSandbox`.
- `runOn` booleans must be present.
- `steps` must be an array.
- Each `steps[].uses` must exist in the step registry.
- `timeoutMs`, when present, must be a positive integer.
- `onFailure` must be `continue`, `fail`, or `move_to_backlog`.
- `name`, when present, must be non-empty.
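
The rules above map onto a single pass over the parsed YAML. The plan calls for Zod. To keep the example dependency-free and runnable as-is, the sketch below hand-codes the same checks over an already-parsed object and collects every violation instead of stopping at the first. `validatePreSandboxConfig` and its error messages are hypothetical.

```ts
const ON_FAILURE_VALUES = ["continue", "fail", "move_to_backlog"];
const RUN_ON_KEYS = ["newTicket", "existingPr", "mergeConflict"];

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical check over an already-parsed pre-sandbox.yaml document.
// Returns every rule violation; an empty array means the config is valid.
function validatePreSandboxConfig(doc: unknown, registeredSteps: ReadonlySet<string>): string[] {
  if (!isObject(doc)) return ["config must be a YAML mapping"];
  const root = doc.preSandbox;
  if (!isObject(root)) return ["root key `preSandbox` is required"];

  const errors: string[] = [];
  const runOn = root.runOn;
  if (!isObject(runOn)) {
    errors.push("`runOn` must be a mapping");
  } else {
    for (const key of RUN_ON_KEYS) {
      if (typeof runOn[key] !== "boolean") errors.push(`\`runOn.${key}\` must be a boolean`);
    }
  }

  const steps = root.steps;
  if (!Array.isArray(steps)) {
    errors.push("`steps` must be an array");
    return errors;
  }

  steps.forEach((step: unknown, index: number) => {
    const at = `steps[${index}]`;
    if (!isObject(step)) {
      errors.push(`${at} must be a mapping`);
      return;
    }
    const { uses, timeoutMs, onFailure, name } = step;
    if (typeof uses !== "string" || !registeredSteps.has(uses)) {
      errors.push(`${at}.uses is not a registered step: ${JSON.stringify(uses)}`);
    }
    if (timeoutMs !== undefined && !(typeof timeoutMs === "number" && Number.isInteger(timeoutMs) && timeoutMs > 0)) {
      errors.push(`${at}.timeoutMs must be a positive integer`);
    }
    // Assumption: onFailure is required, since the rules give no default.
    if (typeof onFailure !== "string" || ON_FAILURE_VALUES.indexOf(onFailure) === -1) {
      errors.push(`${at}.onFailure must be one of ${ON_FAILURE_VALUES.join(", ")}`);
    }
    if (name !== undefined && (typeof name !== "string" || name.trim() === "")) {
      errors.push(`${at}.name must be a non-empty string`);
    }
  });
  return errors;
}
```

`scripts/validate-pre-sandbox-config.ts` could parse `pre-sandbox.yaml`, pass the result and the registry's ids to a check like this, and print any errors. It would then exit non-zero so `pnpm build` stops before `nitro build`.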

Update `package.json`:

```json
{
  "scripts": {
    "validate:pre-sandbox": "tsx scripts/validate-pre-sandbox-config.ts",
    "build": "pnpm validate:pre-sandbox && rm -rf .nitro/workflow && NODE_OPTIONS=--max-old-space-size=8192 nitro build"
  }
}
```

The repo does not currently include a YAML parser dependency. Add a focused YAML parser dependency, then validate the parsed object with Zod.

## Workflow Integration

Current flow in `src/workflows/agent.ts`:

1. Fetch and validate ticket.
2. Load prompts.
3. Notify started.
4. Resolve branch and PR context.
5. Create branch if needed.
6. Fetch attachments.
7. Ensure Arthur task.
8. Resolve agent kind.
9. Provision sandbox.

New flow:

1. Fetch and validate ticket.
2. Load prompts.
3. Notify started.
4. Resolve branch and PR context.
5. Create branch if needed.
6. Fetch attachments.
7. Run pre-sandbox phase.
8. If halted, use existing workflow communication and terminal handling.
9. Ensure Arthur task.
10. Resolve agent kind.
11. Provision sandbox.

The pre-sandbox phase should run after PR context is known, because `runOn` depends on whether the branch already has a PR and whether it has conflicts. It should run before Arthur task creation and before sandbox provisioning.
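
The reordered tail of the flow can be sketched with the workflow's dependencies passed in as plain async functions. That makes the ordering and the early return on halt visible without Workflow, Jira, or Slack. `WorkflowDeps`, `agentWorkflowTail`, and the dependency names are hypothetical stand-ins for the real step functions in `src/workflows/agent.ts`. Branch creation and agent-kind resolution are omitted.

```ts
interface PrContext {
  hasExistingPr: boolean;
  hasMergeConflict: boolean;
}

type PhaseResult =
  | { status: "continue" }
  | { status: "halt"; outcome: "needs_clarification" | "failed"; message: string };

type HaltResult = Extract<PhaseResult, { status: "halt" }>;

// Hypothetical dependency bag standing in for the real Workflow step functions.
interface WorkflowDeps {
  resolvePrContext: () => Promise<PrContext>;
  fetchAttachments: () => Promise<void>;
  runPreSandboxPhase: (pr: PrContext) => Promise<PhaseResult>;
  handleHalt: (result: HaltResult) => Promise<void>; // existing Jira/Slack + terminal handling
  ensureArthurTask: () => Promise<void>;
  provisionSandbox: () => Promise<void>;
}

// The pre-sandbox phase sits after PR context and attachments and before
// Arthur task creation; a halt returns before any sandbox is provisioned.
async function agentWorkflowTail(deps: WorkflowDeps): Promise<"halted" | "provisioned"> {
  const pr = await deps.resolvePrContext();
  await deps.fetchAttachments();

  const result = await deps.runPreSandboxPhase(pr);
  if (result.status === "halt") {
    await deps.handleHalt(result);
    return "halted";
  }

  await deps.ensureArthurTask();
  await deps.provisionSandbox();
  return "provisioned";
}
```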

## Prompt Injection

Extend context assembly functions in `src/sandbox/context.ts` to accept pre-sandbox prompt additions.

Research prompt section format:

```md
## Pre-Sandbox: Ticket Complexity Check

This information was produced before sandbox creation.

<step output>
```

Apply the same section format to implementation and review prompts when an addition's `target` includes them.
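
Each context assembly function can render its own sections from the same list of additions. Below is a minimal sketch. It assumes the section heading uses the addition's `title` and that sections are separated by a blank line. `renderPreSandboxSections` is a hypothetical helper.

```ts
type PreSandboxPromptTarget = "research" | "implementation" | "review";

interface PreSandboxPromptAddition {
  target: PreSandboxPromptTarget[];
  title: string;
  content: string;
}

// Render the pre-sandbox sections for one phase; additions that do not
// target `phase` are skipped, so they never leak into other prompts.
function renderPreSandboxSections(
  additions: readonly PreSandboxPromptAddition[],
  phase: PreSandboxPromptTarget,
): string {
  return additions
    .filter((addition) => addition.target.indexOf(phase) !== -1)
    .map((addition) =>
      [
        `## Pre-Sandbox: ${addition.title}`,
        "This information was produced before sandbox creation.",
        addition.content.trim(),
      ].join("\n\n"),
    )
    .join("\n\n");
}
```

When nothing targets a phase, the helper returns an empty string, so each prompt template can include the result unconditionally.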

Suggested API changes:

```ts
interface ResearchPlanContextInput {
  // existing fields
  preSandboxAdditions?: PreSandboxPromptAddition[];
}

interface ImplementationContextInput {
  // existing fields
  preSandboxAdditions?: PreSandboxPromptAddition[];
}

interface ReviewContextInput {
  // existing fields
  preSandboxAdditions?: PreSandboxPromptAddition[];
}
```

The runner groups additions by target:

```ts
{
  research: [...],
  implementation: [...],
  review: [...]
}
```
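
One way to build that grouped object is to fan each addition out to every target it names while keeping step order. Below is a minimal sketch. `groupPromptAdditions` is a hypothetical name, and de-duplicating a target repeated within one addition is an added assumption.

```ts
type PreSandboxPromptTarget = "research" | "implementation" | "review";

interface PreSandboxPromptAddition {
  target: PreSandboxPromptTarget[];
  title: string;
  content: string;
}

type GroupedPromptAdditions = Record<PreSandboxPromptTarget, PreSandboxPromptAddition[]>;

// Fan each addition out to the targets it names, preserving step order.
function groupPromptAdditions(additions: readonly PreSandboxPromptAddition[]): GroupedPromptAdditions {
  const grouped: GroupedPromptAdditions = { research: [], implementation: [], review: [] };
  for (const addition of additions) {
    // Drop repeated targets so one addition appears at most once per prompt.
    const targets = addition.target.filter((target, index, all) => all.indexOf(target) === index);
    for (const target of targets) grouped[target].push(addition);
  }
  return grouped;
}
```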

## Failure And Halt Behavior

Step execution failure:

- `onFailure: continue`: log the failure, continue to the next step, and do not inject the step's output.
- `onFailure: fail`: halt the workflow as failed, unregister the run, move the ticket to Backlog, and notify through the existing `failed` event.
- `onFailure: move_to_backlog`: apply the same terminal ticket movement as `fail`, but word the message as a pre-sandbox rejection rather than an execution error.

Step returns `halt`:

- `outcome: needs_clarification`: unregister the run, post the clarification questions, move the ticket to Backlog, and notify through the existing `needs_clarification` event.
- `outcome: failed`: unregister the run, move the ticket to Backlog, and notify through the existing `failed` event.

The workflow, not the step, owns Jira and Slack communication so behavior stays consistent with the research, implementation, and review phases.
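
The rules above reduce to a small decision table that the workflow can act on, which keeps `message` out of control flow. Below is a minimal sketch. `TerminalPlan`, `planForStepFailure`, and `planForHalt` are hypothetical. Two details are assumptions the plan does not state: `move_to_backlog` also unregisters the run, and it notifies through the `failed` event.

```ts
type OnFailure = "continue" | "fail" | "move_to_backlog";
type HaltOutcome = "needs_clarification" | "failed";

// What the workflow should do after a pre-sandbox step fails or halts.
interface TerminalPlan {
  proceed: boolean; // keep going toward sandbox provisioning
  unregisterRun: boolean;
  moveToBacklog: boolean;
  postQuestions: boolean; // post clarification questions on the ticket
  notifyEvent?: "failed" | "needs_clarification";
}

function planForStepFailure(onFailure: OnFailure): TerminalPlan {
  switch (onFailure) {
    case "continue":
      return { proceed: true, unregisterRun: false, moveToBacklog: false, postQuestions: false };
    case "fail":
    case "move_to_backlog":
      // Assumption: both share terminal handling; only the message wording differs.
      return { proceed: false, unregisterRun: true, moveToBacklog: true, postQuestions: false, notifyEvent: "failed" };
  }
}

function planForHalt(outcome: HaltOutcome): TerminalPlan {
  // The halt `message` is passed through for wording only; it never changes the plan.
  return {
    proceed: false,
    unregisterRun: true,
    moveToBacklog: true,
    postQuestions: outcome === "needs_clarification",
    notifyEvent: outcome,
  };
}
```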

## Showcase Step

Add `src/pre-sandbox/steps/ticket-complexity-check.ts`.

Purpose:

- Use AI SDK to review only the ticket text.
- Decide whether the ticket is small enough and clear enough to send into sandbox execution.
- No repo access.
- No internal docs access.

Expected behavior:

- Continue when the ticket is clear enough.
- Halt with `needs_clarification` when the ticket is too broad, too vague, or missing essential acceptance criteria.
- Return prompt additions for `research` and `implementation` when continuing.

Example output when continuing:

```ts
{
  status: "continue",
  promptAdditions: [
    {
      target: ["research", "implementation"],
      title: "Ticket Complexity Check",
      content: "The ticket looks implementable without additional clarification. Main risk: acceptance criteria do not mention empty states."
    }
  ]
}
```

Example output when halting:

```ts
{
  status: "halt",
  outcome: "needs_clarification",
  message: "Ticket is too broad to implement safely without repo knowledge.",
  questions: [
    "Which user journey is in scope for the first implementation?",
    "What acceptance criteria define completion?"
  ]
}
```
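
With structured output, most of the step is a pure mapping from the model's assessment to a step result. That mapping is also the part worth unit testing while the AI SDK call is mocked. Below is a minimal sketch of the mapping. The `ComplexityAssessment` shape is a hypothetical schema, and `toStepResult` is a hypothetical name. The model call itself is left out; one option is AI SDK's `generateObject` with a Zod schema.

```ts
type PreSandboxPromptTarget = "research" | "implementation" | "review";

interface PreSandboxPromptAddition {
  target: PreSandboxPromptTarget[];
  title: string;
  content: string;
}

type PreSandboxStepResult =
  | { status: "continue"; promptAdditions?: PreSandboxPromptAddition[] }
  | {
      status: "halt";
      outcome: "needs_clarification" | "failed";
      message: string;
      questions?: string[];
      promptAdditions?: PreSandboxPromptAddition[];
    };

// Hypothetical structured output requested from the model.
interface ComplexityAssessment {
  verdict: "clear" | "needs_clarification";
  summary: string; // one-sentence rationale
  risks: string[]; // gaps the agent should watch for
  questions: string[]; // only used when the verdict is needs_clarification
}

function toStepResult(assessment: ComplexityAssessment): PreSandboxStepResult {
  if (assessment.verdict === "needs_clarification") {
    return {
      status: "halt",
      outcome: "needs_clarification",
      message: assessment.summary,
      questions: assessment.questions,
    };
  }
  const risks = assessment.risks.length > 0 ? ` Main risk: ${assessment.risks.join("; ")}.` : "";
  return {
    status: "continue",
    promptAdditions: [
      {
        target: ["research", "implementation"],
        title: "Ticket Complexity Check",
        content: `${assessment.summary}${risks}`,
      },
    ],
  };
}
```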

## Implementation Steps

1. Add `pre-sandbox.yaml`
   - Create the required root config file.
   - Start with an empty `steps` array, and add the showcase `ticket-complexity-check` step only once its environment requirements are settled.
   - Verify with config parser tests.

2. Add config schema and loader
   - Parse YAML.
   - Validate with Zod.
   - Validate step ids against the registry.
   - Verify invalid config cases in unit tests.

3. Add build validation script
   - Add `scripts/validate-pre-sandbox-config.ts`.
   - Add the `validate:pre-sandbox` script.
   - Run it before `nitro build`.
   - Verify that a missing file and an unknown step both fail.

4. Add step registry
   - Add `src/pre-sandbox/steps/index.ts`.
   - Export a typed registry keyed by `uses`.
   - Verify registry ids match config validation.

5. Add runner
   - Add `src/pre-sandbox/runner.ts`.
   - Apply `runOn` conditions.
   - Execute steps sequentially.
   - Enforce timeout.
   - Normalize prompt additions by target.
   - Verify continue, halt, timeout, and failure behavior.

6. Add prompt injection
   - Update `src/sandbox/context.ts`.
   - Add tests in `src/sandbox/context.test.ts`.
   - Verify additions appear in selected prompts only.

7. Add showcase AI SDK step
   - Add `ticket-complexity-check`.
   - Use structured AI output.
   - Keep tools limited to ticket communication decisions for the first version.
   - Mock AI SDK in tests.

8. Wire into `agentWorkflow`
   - Run after PR context and attachments are available.
   - Halt before Arthur task creation and sandbox provisioning.
   - Pass grouped prompt additions into research, implementation, and review context assembly.
   - Verify the halted pre-sandbox path never calls `provisionSandbox`.
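
Steps 4 and 5 can be sketched together. The sketch has a registry keyed by `uses`, and a runner that:

- executes configured steps in order,
- applies `timeoutMs`,
- stops at the first halt or terminal failure,
- collects prompt additions.

It rests on stated assumptions and is not a settled design. The step signature is hypothetical, as are the `StepRegistry`, `withTimeout`, and `runPreSandboxSteps` names. Treating a timeout as an ordinary step failure is also an assumption. `runOn` gating and grouping by target are left out to keep it short.

```ts
type PreSandboxPromptTarget = "research" | "implementation" | "review";
interface PreSandboxPromptAddition { target: PreSandboxPromptTarget[]; title: string; content: string }
type HaltResult = {
  status: "halt";
  outcome: "needs_clarification" | "failed";
  message: string;
  questions?: string[];
  promptAdditions?: PreSandboxPromptAddition[];
};
type PreSandboxStepResult = { status: "continue"; promptAdditions?: PreSandboxPromptAddition[] } | HaltResult;

// Hypothetical step signature: step context in, `with` config as options.
type PreSandboxStep = (context: unknown, options: unknown) => Promise<PreSandboxStepResult>;
type StepRegistry = Record<string, PreSandboxStep>; // keyed by `uses`

interface ConfiguredStep {
  uses: string;
  name?: string;
  timeoutMs?: number;
  onFailure: "continue" | "fail" | "move_to_backlog";
  with?: unknown;
}

type RunnerOutcome =
  | { status: "continue"; additions: PreSandboxPromptAddition[] }
  | { status: "halt"; stepName: string; result: HaltResult }
  | { status: "failed"; stepName: string; onFailure: "fail" | "move_to_backlog"; error: string };

// Reject if `work` has not settled after `ms`. This stops waiting but does not
// cancel the underlying work; a real step would also need an abort signal.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).then(
    (value) => { clearTimeout(timer); return value; },
    (error) => { clearTimeout(timer); throw error; },
  );
}

async function runPreSandboxSteps(
  steps: readonly ConfiguredStep[],
  registry: StepRegistry,
  context: unknown,
): Promise<RunnerOutcome> {
  const additions: PreSandboxPromptAddition[] = [];
  for (const step of steps) {
    const stepName = step.name ?? step.uses;
    try {
      const impl = registry[step.uses];
      // Build-time validation should make this unreachable.
      if (!impl) throw new Error(`unknown pre-sandbox step "${step.uses}"`);
      const pending = impl(context, step.with);
      const result = step.timeoutMs ? await withTimeout(pending, step.timeoutMs, stepName) : await pending;
      if (result.status === "halt") return { status: "halt", stepName, result };
      additions.push(...(result.promptAdditions ?? []));
    } catch (error) {
      // Thrown errors and timeouts are both treated as step failures.
      const message = error instanceof Error ? error.message : String(error);
      if (step.onFailure === "continue") {
        console.warn(`pre-sandbox step "${stepName}" failed, continuing: ${message}`);
        continue; // a failed step contributes no prompt additions
      }
      return { status: "failed", stepName, onFailure: step.onFailure, error: message };
    }
  }
  return { status: "continue", additions };
}
```

On a `halt` or `failed` outcome, the workflow would perform the Jira and Slack actions described under Failure And Halt Behavior. On `continue`, it would pass `additions` on to grouping and prompt assembly.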

## Test Plan

Unit tests:

- Config loader accepts the minimal file.
- Config loader rejects missing `preSandbox`.
- Config loader rejects unknown `uses`.
- Config loader rejects invalid `onFailure`.
- Runner skips based on `runOn`.
- Runner executes steps sequentially.
- Runner groups prompt additions by target.
- Runner halts on `needs_clarification`.
- Runner handles `onFailure: continue`.
- Runner handles `onFailure: fail`.
- Prompt assembly includes pre-sandbox blocks in selected phases only.

Workflow-level tests:

- Continuing pre-sandbox run reaches sandbox provisioning.
- Halting pre-sandbox run unregisters the run, moves the ticket to Backlog, and sends the standard notification.
- Halting pre-sandbox run does not provision a sandbox.

Build validation:

- `pnpm validate:pre-sandbox` passes with valid config.
- `pnpm validate:pre-sandbox` fails with missing file.
- `pnpm validate:pre-sandbox` fails with unknown step id.

## Deferred

- Parallel step groups.
- Retries.
- HTTP/plugin step loading.
- Target repo supplied config.
- Rich input selection beyond ticket fields.
- Persisting pre-sandbox artifacts outside workflow state.
- Internal docs/resource fetching steps.