Commit 93ea9c6

committed
chore(docs): update roadmap and design
1 parent e133db0 commit 93ea9c6

13 files changed

Lines changed: 150 additions & 42 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -56,11 +56,11 @@ ABCA is under active development. The platform ships iteratively — each iterat
5656
|---|---|---|
5757
| **1** | Done | Agent runs on AWS, CLI submit, branch + PR |
5858
| **2** | Done | Production orchestrator, API contract, task management, observability, security, webhooks |
59-
| **3a** | Done | Repo onboarding, per-repo GitHub App credentials, turn caps, prompt guide |
59+
| **3a** | Done | Repo onboarding, per-repo credentials, turn caps, prompt guide |
6060
| **3b** | Done | Memory Tier 1, insights, agent self-feedback, prompt versioning, commit attribution |
6161
| **3bis** | Done | Hardening — reconciler error tracking, error serialization, test coverage gaps |
6262
| **3c** | WIP | Pre-flight checks, persistent session storage, deterministic validation, PR review task type, multi-modal input, input guardrail screening |
63-
| **3d** | Planned | Review feedback loop, PR outcome tracking, evaluation pipeline |
63+
| **3d** | Planned | Review feedback loop, PR outcome tracking, evaluation pipeline, memory input hardening |
6464
| **4** | Planned | GitLab, visual proof, Slack, control panel, WebSocket streaming |
6565
| **5** | Planned | Pre-warming, multi-user/team, cost management, output guardrails, alternate runtime |
6666
| **6** | Planned | Skills learning, multi-repo, iterative feedback, multiplayer, CDK constructs |

docs/design/API_CONTRACT.md

Lines changed: 2 additions & 0 deletions
@@ -617,6 +617,8 @@ Rate limit status is communicated via response headers (see Standard response he
617617
| `WEBHOOK_NOT_FOUND` | 404 | Webhook does not exist or belongs to a different user. |
618618
| `WEBHOOK_ALREADY_REVOKED` | 409 | Webhook is already revoked. |
619619
| `REPO_NOT_ONBOARDED` | 422 | Repository is not registered with the platform. Repos are onboarded via CDK deployment, not via a runtime API. There are no `/v1/repos` endpoints. |
620+
| `GITHUB_UNREACHABLE` | 502 | The GitHub API was unreachable during the orchestrator's pre-flight check. The task fails fast without consuming compute. Transient — retry with backoff. |
621+
| `REPO_NOT_FOUND_OR_NO_ACCESS` | 422 | The target repository does not exist or the configured credentials lack access. Checked during the orchestrator's pre-flight step (`GET /repos/{owner}/{repo}`). Distinct from `REPO_NOT_ONBOARDED` — the repo is onboarded but the credential cannot reach it. |
620622
| `PR_NOT_FOUND_OR_CLOSED` | 422 | For `pr_iteration` and `pr_review` tasks: the specified PR does not exist, is not open, or is not accessible with the configured GitHub token. Checked during the orchestrator's pre-flight step. |
621623
| `INVALID_STEP_SEQUENCE` | 500 | The blueprint's step sequence is invalid (missing required steps or incorrect ordering). This indicates a CDK configuration error that slipped past synth-time validation. Visible via `GET /v1/tasks/{id}` as `error_code`. See [REPO_ONBOARDING.md](./REPO_ONBOARDING.md#step-sequence-validation). |
622624
| `GUARDRAIL_BLOCKED` | 400 | Task description was blocked by Bedrock Guardrail content screening (prompt injection detected). Revise the task description and retry. |
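
The table marks `GITHUB_UNREACHABLE` as transient ("retry with backoff"), while the 422 codes describe conditions a retry will not fix. A minimal client-side sketch of that distinction follows. The attempt limit, base delay, cap, and the helper names `should_retry` and `backoff_delay_s` are assumptions for illustration, not part of the API contract.

```python
import random

# Error code taken from the table above; treating only this one as transient
# is an assumption of this sketch.
TRANSIENT_ERROR_CODES = {"GITHUB_UNREACHABLE"}


def should_retry(error_code: str, attempt: int, max_attempts: int = 5) -> bool:
    """Retry only transient errors, and only while attempts remain."""
    return error_code in TRANSIENT_ERROR_CODES and attempt < max_attempts


def backoff_delay_s(attempt: int, base_s: float = 1.0, cap_s: float = 30.0,
                    jitter=random.random) -> float:
    """Exponential backoff with full jitter.

    The delay is drawn from [0, min(cap_s, base_s * 2**attempt)). The jitter
    source is injectable so the function can be tested deterministically.
    """
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return ceiling * jitter()
```

A caller would sleep for `backoff_delay_s(attempt)` before resubmitting the task, and stop as soon as `should_retry` returns `False`.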

docs/design/ARCHITECTURE.md

Lines changed: 5 additions & 0 deletions
@@ -207,7 +207,12 @@ Each concept has a **source-of-truth document** and one or more documents that r
207207
| Live session replay | ROADMAP.md (Iter 4) | API_CONTRACT.md |
208208
| PR iteration task type | API_CONTRACT.md, ORCHESTRATOR.md | USER_GUIDE.md, PROMPT_GUIDE.md, SECURITY.md, AGENT_HARNESS.md |
209209
| PR review task type | API_CONTRACT.md, ORCHESTRATOR.md | USER_GUIDE.md, PROMPT_GUIDE.md, SECURITY.md, AGENT_HARNESS.md |
210+
| Orchestrator pre-flight checks | ORCHESTRATOR.md (Context hydration, pre-flight sub-step) | API_CONTRACT.md (Error codes: GITHUB_UNREACHABLE, REPO_NOT_FOUND_OR_NO_ACCESS), ROADMAP.md (3c), SECURITY.md |
210211
| Bedrock Guardrail input screening | SECURITY.md (Input validation and guardrails) | ORCHESTRATOR.md (Context hydration), API_CONTRACT.md (Error codes), OBSERVABILITY.md (Alarms), ROADMAP.md (3c) |
212+
| Memory input hardening (3e Phase 1) | ROADMAP.md (Iter 3e Phase 1, co-ships with 3d) | MEMORY.md, SECURITY.md (Memory-specific threats) |
213+
| Per-tool-call structured telemetry | ROADMAP.md (Iter 3d) | SECURITY.md (Mid-execution enforcement), EVALUATION.md, OBSERVABILITY.md |
214+
| Mid-execution behavioral monitoring | ROADMAP.md (Iter 5), SECURITY.md (Mid-execution enforcement) | OBSERVABILITY.md |
215+
| Tool-call interceptor (Guardian pattern) | SECURITY.md (Mid-execution enforcement), ROADMAP.md (Iter 5) | REPO_ONBOARDING.md (Blueprint security props) |
211216

212217
### Per-repo model selection
213218

docs/design/ORCHESTRATOR.md

Lines changed: 10 additions & 9 deletions
@@ -29,11 +29,11 @@ The orchestrator document describes **behavior** (state machine, admission, canc
2929

3030
**Relationship to blueprints.** The orchestrator is a **framework** that enforces platform invariants — the task state machine, event emission, concurrency management, and cancellation handling — and delegates variable work to **blueprint-defined step implementations**. A blueprint defines which steps run, in what order, and how each step is implemented (built-in strategy, Lambda-backed custom step, or custom sequence). The default blueprint is defined in this document (Section 4). Per-repo customization (see [REPO_ONBOARDING.md](./REPO_ONBOARDING.md)) changes the steps the orchestrator executes, not the framework guarantees it enforces. The orchestrator wraps every step with state transitions, event emission, and cancellation checks — regardless of whether the step is a built-in or a custom Lambda.
3131

32-
### Iteration 1 vs. target state
32+
### Iteration 1 vs. current state
3333

34-
In **Iteration 1** (current), the orchestrator does not exist as a distinct component. The client calls `invoke_agent_runtime` synchronously, the agent runs to completion inside the AgentCore Runtime MicroVM, and the caller infers the result from the response. There is no durable state, no task management, no concurrency control, and no recovery. If the caller disconnects, the session is orphaned.
34+
In **Iteration 1**, the orchestrator did not exist as a distinct component. The client called `invoke_agent_runtime` synchronously, the agent ran to completion inside the AgentCore Runtime MicroVM, and the caller inferred the result from the response. There was no durable state, no task management, no concurrency control, and no recovery.
3535

36-
The **target state** (Iteration 2 and beyond) introduces a durable orchestrator that manages the full task lifecycle. This document designs for the target state; where Iteration 1 constraints apply, they are called out explicitly.
36+
**Current state (Iteration 3+):** The durable orchestrator manages the full task lifecycle with checkpoint/resume (Lambda Durable Functions), the full state machine (8 states), concurrency control, cancellation, context hydration, memory integration, pre-flight checks, and multi-task-type support. This document describes the current architecture; where historical Iteration 1 constraints are referenced (e.g. synchronous invocation model), they are called out explicitly.
3737

3838
---
3939

@@ -224,7 +224,7 @@ When the orchestrator loads a task's `blueprint_config`, it resolves the step pi
224224

225225
1. **Load `RepoConfig`** from the `RepoTable` by `repo` (PK). Merge with platform defaults (see [REPO_ONBOARDING.md](./REPO_ONBOARDING.md#platform-defaults) for default values and override precedence).
226226
2. **Resolve compute strategy** from `compute_type` (default: `agentcore`). The strategy implements the `ComputeStrategy` interface (see [REPO_ONBOARDING.md](./REPO_ONBOARDING.md#compute-strategy-interface)).
227-
3. **Build step list.** If `step_sequence` is provided, use it; otherwise use the default sequence (`admission-control` → `hydrate-context` → `start-session` → `await-agent-completion` → `finalize`). For each entry, resolve to a built-in step function or a Lambda invocation wrapper.
227+
3. **Build step list.** If `step_sequence` is provided, use it; otherwise use the default sequence (`admission-control` → `hydrate-context` → `pre-flight` → `start-session` → `await-agent-completion` → `finalize`). The `pre-flight` step runs fail-closed readiness checks (GitHub API reachability, repo access, PR accessibility for PR tasks) before consuming compute; see [ROADMAP.md Iteration 3c](../guides/ROADMAP.md). For each entry, resolve to a built-in step function or a Lambda invocation wrapper.
228228
4. **Inject custom steps.** If `custom_steps` are defined and no explicit `step_sequence` is provided, insert them at their declared `phase` position (pre-agent steps before `start-session`, post-agent steps after `await-agent-completion`).
229229
5. **Validate.** Check that required steps are present and correctly ordered (see [step sequence validation](./REPO_ONBOARDING.md#step-sequence-validation)). If invalid, fail the task with `INVALID_STEP_SEQUENCE`.
230230
6. **Execute.** Iterate the resolved list. For each step: check cancellation, filter `blueprintConfig` to only the fields that step needs (stripping credential ARNs for custom Lambda steps), execute with retry policy, enforce `StepOutput.metadata` size budget (10KB), prune `previousStepResults` to last 5 steps, emit events. Built-in steps that need durable waits (e.g. `await-agent-completion`) receive the `DurableContext` and `ComputeStrategy` so they can call `waitForCondition` and `computeStrategy.pollSession()` internally — no name-based special-casing in the framework loop.
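
Steps 3 to 5 above can be sketched as a small resolution function. This is an illustration under stated assumptions, not the orchestrator's implementation: the `phase` values `"pre-agent"` and `"post-agent"`, the dict shape of a custom step, and the choice to treat every default step as required are all hypothetical.

```python
# Default sequence as listed in step 3 (including the new pre-flight step).
DEFAULT_SEQUENCE = [
    "admission-control",
    "hydrate-context",
    "pre-flight",
    "start-session",
    "await-agent-completion",
    "finalize",
]


def build_step_list(step_sequence=None, custom_steps=None):
    """Resolve the ordered step names for one task."""
    if step_sequence:
        # An explicit sequence wins; custom steps are not injected.
        return list(step_sequence)

    steps = list(DEFAULT_SEQUENCE)
    custom_steps = custom_steps or []
    # Hypothetical shape: {"name": str, "phase": "pre-agent" | "post-agent"}.
    pre = [c["name"] for c in custom_steps if c["phase"] == "pre-agent"]
    post = [c["name"] for c in custom_steps if c["phase"] == "post-agent"]

    # Pre-agent steps go immediately before start-session, in declared order.
    start = steps.index("start-session")
    steps[start:start] = pre
    # Post-agent steps go immediately after await-agent-completion.
    after_agent = steps.index("await-agent-completion") + 1
    steps[after_agent:after_agent] = post
    return steps


def is_valid_sequence(steps, required=DEFAULT_SEQUENCE):
    """True if every required step is present and in the required relative order."""
    if any(name not in steps for name in required):
        return False
    positions = [steps.index(name) for name in required]
    return positions == sorted(positions)
```

A sequence that fails `is_valid_sequence` corresponds to the `INVALID_STEP_SEQUENCE` failure described in step 5.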
@@ -304,7 +304,7 @@ We evaluated routing GitHub API calls through AgentCore Gateway (with the GitHub
304304

305305
4. **User message.** The free-text task description provided by the user (via CLI `--task` flag or equivalent). May supplement or replace the issue context.
306306

307-
5. **Memory context (Iteration 3+).** Query long-term memory (e.g. AgentCore Memory) for relevant past context: insights from previous tasks on this repo, failure summaries, learned patterns. See [MEMORY.md](./MEMORY.md) for how insights and code attribution feed into hydration. Not yet implemented.
307+
5. **Memory context (Iteration 3b+).** Query long-term memory (AgentCore Memory) for relevant past context: repository knowledge (semantic search) and past task episodes (episodic search). Memory is loaded during context hydration via two parallel `RetrieveMemoryRecordsCommand` calls with a 5-second timeout and a 2,000-token budget. See [MEMORY.md](./MEMORY.md) for how insights and code attribution feed into hydration. Tier 1 (repo knowledge + task episodes) has been operational since Iteration 3b. Tier 2 (review feedback rules) is planned for Iteration 3d.
308308

309309
6. **Attachments.** Images or files provided by the user (multi-modal input). Passed through to the agent prompt as base64 or URLs.
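
The memory loading described in item 5 (two parallel retrievals, a 5-second timeout, a 2,000-token budget) can be sketched with `asyncio`. The fetcher callables stand in for the two retrieval calls and are hypothetical. Applying the timeout per call, returning an empty list on failure, and estimating tokens as one per four characters are assumptions of this sketch, not documented behavior.

```python
import asyncio


def trim_to_budget(records, token_budget,
                   estimate=lambda text: max(1, len(text) // 4)):
    """Keep records in order until the estimated token budget would be exceeded."""
    kept, used = [], 0
    for text in records:
        cost = estimate(text)
        if used + cost > token_budget:
            break
        kept.append(text)
        used += cost
    return kept


async def load_memory_context(fetch_semantic, fetch_episodic,
                              timeout_s=5.0, token_budget=2000):
    """Run both retrievals concurrently; a slow or failing one yields no records."""

    async def guarded(fetch):
        try:
            return await asyncio.wait_for(fetch(), timeout=timeout_s)
        except Exception:
            # Assumption: memory is best-effort, so hydration continues without it.
            return []

    semantic, episodic = await asyncio.gather(
        guarded(fetch_semantic), guarded(fetch_episodic)
    )
    return trim_to_budget(semantic + episodic, token_budget)
```

Because both calls run under `asyncio.gather`, the worst-case added latency is roughly one timeout, not two.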
310310

@@ -395,7 +395,7 @@ The orchestrator records the `(task_id, session_id)` mapping in the task record
395395

396396
### Invocation model: synchronous vs. asynchronous
397397

398-
**Iteration 1 (current).** `invoke_agent_runtime` is called synchronously with a long read timeout. The call blocks until the agent finishes. This is simple but limits concurrency: one orchestrator process per task.
398+
**Iteration 1 (historical).** `invoke_agent_runtime` was called synchronously with a long read timeout. The call blocked until the agent finished. This was simple but limited concurrency: one orchestrator process per task.
399399

400400
**Target state.** The orchestrator uses AgentCore's **asynchronous processing model** ([Runtime async docs](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-long-run.html)). The key capabilities:
401401

@@ -420,7 +420,7 @@ The orchestrator needs to know whether the session is still running. Two complem
420420

421421
2. **Re-invocation on the same session (target state).** The orchestrator calls `invoke_agent_runtime` with the same `runtimeSessionId`. Sticky routing ensures the request reaches the same instance. The agent's entrypoint can detect this is a poll (e.g., via a `poll: true` field in the payload or by tracking the initial task) and return the current status without starting a new task. This is a fast, lightweight call that returns immediately.
422422

423-
**Iteration 1.** The `invoke_agent_runtime` call blocks; when it returns, the session is over. No explicit liveness check needed.
423+
**Iteration 1 (historical).** The `invoke_agent_runtime` call blocked; when it returned, the session was over. No explicit liveness check was needed.
424424

425425
**Fallback: DynamoDB heartbeat (optional enhancement).** As defense in depth, the agent can write a heartbeat timestamp to DynamoDB every N minutes. The orchestrator reads it during its poll cycle. A missing heartbeat (e.g. none in the last 10 minutes while `/ping` reports `HealthyBusy`) could indicate the agent is stuck but not idle — triggering investigation or forced termination.
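
The heartbeat fallback above reduces to a staleness check that the orchestrator could run in each poll cycle. The sketch below assumes the heartbeat has already been read from DynamoDB as a timezone-aware `datetime`; the function name and the 10-minute default are illustrative.

```python
from datetime import datetime, timedelta, timezone


def is_agent_stuck(ping_status: str, last_heartbeat: datetime, now: datetime,
                   max_silence: timedelta = timedelta(minutes=10)) -> bool:
    """Flag a session that reports busy but has stopped writing heartbeats.

    Only the combination "HealthyBusy and no recent heartbeat" is suspicious.
    An idle or finished session is handled by the normal completion path.
    """
    if ping_status != "HealthyBusy":
        return False
    return now - last_heartbeat > max_silence
```

A `True` result would trigger investigation or forced termination, as the paragraph above describes.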
426426

@@ -430,15 +430,15 @@ AgentCore Runtime terminates sessions after 15 minutes of inactivity (no `/ping`
430430

431431
**Mitigation (async model).** In the target state, the agent uses the AgentCore SDK's async task management: `add_async_task` registers a background task, and the SDK automatically reports `HealthyBusy` via `/ping` while any async task is active. AgentCore polls `/ping` and sees the agent is busy, preventing idle termination. When the agent calls `complete_async_task`, the status reverts to `Healthy`. The `/ping` endpoint runs on the main thread (or async event loop) while the coding task runs in a separate thread, so `/ping` remains responsive.
432432

433-
**Mitigation (Iteration 1 / current).** The agent container's FastAPI server defines `/ping` as a separate async endpoint. Because the agent task runs in a threadpool worker (not in the asyncio event loop), the `/ping` endpoint remains responsive while the agent works. AgentCore calls `/ping` periodically and the server responds, preventing idle timeout.
433+
**Mitigation (current).** The agent container's FastAPI server defines `/ping` as a separate async endpoint. Because the agent task runs in a threadpool worker (not in the asyncio event loop), the `/ping` endpoint remains responsive while the agent works. AgentCore calls `/ping` periodically and the server responds, preventing idle timeout.
434434

435435
**Risk.** If the agent's computation blocks the entire process (not just a thread) — e.g. due to a subprocess that consumes all resources, or the server becomes unresponsive — the `/ping` response may be delayed, triggering idle termination. This risk applies to both models. The defense is to ensure the coding task runs in a separate thread or process and does not starve the main thread.
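
The threading argument above can be demonstrated with the standard library alone. In this sketch, `blocking_agent_task` stands in for the coding task and the loop counter stands in for answered `/ping` requests; neither is the real agent code, and the durations are arbitrary.

```python
import asyncio
import time


def blocking_agent_task(duration_s: float) -> str:
    """Stand-in for the long-running coding task; blocks its own thread only."""
    time.sleep(duration_s)
    return "done"


async def run_with_ping(duration_s: float = 0.2, ping_interval_s: float = 0.01):
    """Run the blocking task in a worker thread while the event loop keeps serving."""
    pings = 0
    worker = asyncio.create_task(asyncio.to_thread(blocking_agent_task, duration_s))
    while not worker.done():
        pings += 1  # Stand-in for answering one /ping request.
        await asyncio.sleep(ping_interval_s)
    return await worker, pings
```

If `blocking_agent_task` were called directly inside the coroutine instead of through `asyncio.to_thread`, the loop body would not run until the task finished, which is the starvation scenario the risk paragraph describes.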
436436

437437
### Session completion detection
438438

439439
When the session ends (agent finishes, crashes, or is terminated), the orchestrator detects this:
440440

441-
- **Iteration 1:** The `invoke_agent_runtime` call returns (it blocks). The response body contains the agent's output (status, PR URL, cost, etc.).
441+
- **Iteration 1 (historical):** The `invoke_agent_runtime` call returned (it blocked). The response body contained the agent's output (status, PR URL, cost, etc.).
442442
- **Target state:** The orchestrator polls the agent via re-invocation on the same session (see Invocation model above). Completion is detected when: (a) the agent responds with a "completed" or "failed" status in the poll response, or (b) the re-invocation fails because the session was terminated (idle timeout, crash, or 8-hour limit reached). In the durable orchestrator, a `waitForCondition` evaluates the poll result at each interval and resumes the pipeline when the condition is met. See the session monitoring pattern in the Implementation options section.
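
The target-state detection rule, conditions (a) and (b) above, can be sketched as a polling loop. The poll response shape (`{"status": ...}`), the `SessionGoneError` exception, and the function names are hypothetical. In the durable orchestrator the loop would be a `waitForCondition` instead of an in-process `sleep`.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


class SessionGoneError(Exception):
    """Hypothetical error raised when re-invocation fails because the session ended."""


def classify_poll(result):
    """Map one poll result to an outcome, or None if the agent is still running."""
    if result is None:
        return "session_terminated"  # condition (b): re-invocation failed
    if result.get("status") in TERMINAL_STATUSES:
        return result["status"]      # condition (a): agent reported a terminal status
    return None


def wait_for_completion(poll_session, interval_s, max_polls, sleep=time.sleep):
    """Poll until a terminal outcome is seen or the poll limit is reached."""
    for _ in range(max_polls):
        try:
            result = poll_session()
        except SessionGoneError:
            result = None
        outcome = classify_poll(result)
        if outcome is not None:
            return outcome
        sleep(interval_s)
    return "timed_out"
```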
443443

444444
### External termination (cancellation)
@@ -871,6 +871,7 @@ The primary table for task state. DynamoDB.
871871
| `cost_usd` | Number (optional) | Agent cost from the SDK result. |
872872
| `duration_s` | Number (optional) | Total task duration in seconds. |
873873
| `build_passed` | Boolean (optional) | Post-agent build verification result. |
874+
| `lint_passed` | Boolean (optional) | Post-agent lint verification result. Recorded alongside `build_passed` during finalization; surfaced as a span attribute (`lint.passed`) and included in the PR body's verification section. |
874875
| `max_turns` | Number (optional) | Maximum agent turns for this task. Set during task creation — either the user-specified value (1–500) or the platform default (100). Included in the orchestrator payload and consumed by the agent SDK's `ClaudeAgentOptions(max_turns=...)`. |
875876
| `max_budget_usd` | Number (optional) | Maximum cost budget in USD for this task. Set during task creation — either the user-specified value ($0.01–$100) or the per-repo Blueprint default. When reached, the agent stops regardless of remaining turns. If neither the task nor the Blueprint specifies a value, no budget limit is applied (turn limit and session timeout still apply). Included in the orchestrator payload and consumed by the agent SDK's `ClaudeAgentOptions(max_budget_usd=...)`. |
876877
| `blueprint_config` | Map (optional) | Snapshot of the `RepoConfig` record at task creation time (or a reference to it). This ensures tasks are not affected by mid-flight config changes. The schema follows the `RepoConfig` interface defined in [REPO_ONBOARDING.md](./REPO_ONBOARDING.md#repoconfig-schema). Includes `compute_type`, `runtime_arn`, `model_id`, `max_turns`, `system_prompt_overrides`, `github_token_secret_arn`, `poll_interval_ms`, `custom_steps`, `step_sequence`, and `egress_allowlist`. The `max_turns` value from `blueprint_config` serves as the per-repo default; per-task `max_turns` (from the API request) takes higher priority. `max_budget_usd` follows the same 2-tier override pattern: per-task value takes priority over `blueprint_config.max_budget_usd`; if neither is specified, no budget limit is applied. |
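
The override precedence described for `max_turns` and `max_budget_usd` can be sketched as follows. The dict-based inputs and the function names are assumptions. For brevity the range checks are applied to the resolved value, whereas the table states the ranges for user-specified values.

```python
PLATFORM_DEFAULT_MAX_TURNS = 100  # platform default stated in the table above


def first_set(*values):
    """Return the first value that is not None, or None if all are unset."""
    return next((v for v in values if v is not None), None)


def resolve_limits(task_request: dict, blueprint_config: dict) -> dict:
    """Apply per-task > per-repo > platform precedence to the task limits."""
    max_turns = first_set(
        task_request.get("max_turns"),
        blueprint_config.get("max_turns"),
        PLATFORM_DEFAULT_MAX_TURNS,
    )
    # The budget has no platform default: unset at both tiers means no limit.
    max_budget_usd = first_set(
        task_request.get("max_budget_usd"),
        blueprint_config.get("max_budget_usd"),
    )
    if not 1 <= max_turns <= 500:
        raise ValueError("max_turns must be between 1 and 500")
    if max_budget_usd is not None and not 0.01 <= max_budget_usd <= 100:
        raise ValueError("max_budget_usd must be between 0.01 and 100")
    return {"max_turns": max_turns, "max_budget_usd": max_budget_usd}
```

For example, a task that sets only `max_turns` inherits the repo's budget, and a task on a repo with no configured budget runs with turn and session limits only.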
