docs/design/API_CONTRACT.md (2 additions, 0 deletions)
@@ -617,6 +617,8 @@ Rate limit status is communicated via response headers (see Standard response he
617
617
|`WEBHOOK_NOT_FOUND`| 404 | Webhook does not exist or belongs to a different user. |
618
618
|`WEBHOOK_ALREADY_REVOKED`| 409 | Webhook is already revoked. |
619
619
|`REPO_NOT_ONBOARDED`| 422 | Repository is not registered with the platform. Repos are onboarded via CDK deployment, not via a runtime API. There are no `/v1/repos` endpoints. |
620
+
|`GITHUB_UNREACHABLE`| 502 | The GitHub API was unreachable during the orchestrator's pre-flight check. The task fails fast without consuming compute. Transient — retry with backoff. |
621
+
|`REPO_NOT_FOUND_OR_NO_ACCESS`| 422 | The target repository does not exist or the configured credentials lack access. Checked during the orchestrator's pre-flight step (`GET /repos/{owner}/{repo}`). Distinct from `REPO_NOT_ONBOARDED` — the repo is onboarded but the credential cannot reach it. |
620
622
|`PR_NOT_FOUND_OR_CLOSED`| 422 | For `pr_iteration` and `pr_review` tasks: the specified PR does not exist, is not open, or is not accessible with the configured GitHub token. Checked during the orchestrator's pre-flight step. |
621
623
|`INVALID_STEP_SEQUENCE`| 500 | The blueprint's step sequence is invalid (missing required steps or incorrect ordering). This indicates a CDK configuration error that slipped past synth-time validation. Visible via `GET /v1/tasks/{id}` as `error_code`. See [REPO_ONBOARDING.md](./REPO_ONBOARDING.md#step-sequence-validation). |
622
624
|`GUARDRAIL_BLOCKED`| 400 | Task description was blocked by Bedrock Guardrail content screening (prompt injection detected). Revise the task description and retry. |
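A minimal client-side sketch of how the codes in this table could drive retry decisions. The codes and HTTP statuses are copied from the rows above; the `isRetryable` and `backoffDelayMs` helpers, and the backoff parameters, are hypothetical and not part of the documented API.

```typescript
// Error codes and HTTP statuses as listed in the table above.
type ErrorCode =
  | "WEBHOOK_NOT_FOUND"
  | "WEBHOOK_ALREADY_REVOKED"
  | "REPO_NOT_ONBOARDED"
  | "GITHUB_UNREACHABLE"
  | "REPO_NOT_FOUND_OR_NO_ACCESS"
  | "PR_NOT_FOUND_OR_CLOSED"
  | "INVALID_STEP_SEQUENCE"
  | "GUARDRAIL_BLOCKED";

const HTTP_STATUS: Record<ErrorCode, number> = {
  WEBHOOK_NOT_FOUND: 404,
  WEBHOOK_ALREADY_REVOKED: 409,
  REPO_NOT_ONBOARDED: 422,
  GITHUB_UNREACHABLE: 502,
  REPO_NOT_FOUND_OR_NO_ACCESS: 422,
  PR_NOT_FOUND_OR_CLOSED: 422,
  INVALID_STEP_SEQUENCE: 500,
  GUARDRAIL_BLOCKED: 400,
};

// Assumption: GITHUB_UNREACHABLE is the only code the table marks as transient,
// so it is the only one this sketch retries without changing the request.
function isRetryable(code: ErrorCode): boolean {
  return code === "GITHUB_UNREACHABLE";
}

// Hypothetical exponential backoff: 1 s, 2 s, 4 s, ... capped at 30 s.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  capMs: number = 30000,
): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```

`GUARDRAIL_BLOCKED` is deliberately not treated as retryable here: the table asks the user to revise the task description first, so resubmitting the same request would be blocked again.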
**Relationship to blueprints.** The orchestrator is a **framework** that enforces platform invariants — the task state machine, event emission, concurrency management, and cancellation handling — and delegates variable work to **blueprint-defined step implementations**. A blueprint defines which steps run, in what order, and how each step is implemented (built-in strategy, Lambda-backed custom step, or custom sequence). The default blueprint is defined in this document (Section 4). Per-repo customization (see [REPO_ONBOARDING.md](./REPO_ONBOARDING.md)) changes the steps the orchestrator executes, not the framework guarantees it enforces. The orchestrator wraps every step with state transitions, event emission, and cancellation checks — regardless of whether the step is a built-in or a custom Lambda.
31
31
32
-
### Iteration 1 vs. target state
32
+
### Iteration 1 vs. current state
33
33
34
-
In **Iteration 1** (current), the orchestrator does not exist as a distinct component. The client calls `invoke_agent_runtime` synchronously, the agent runs to completion inside the AgentCore Runtime MicroVM, and the caller infers the result from the response. There is no durable state, no task management, no concurrency control, and no recovery. If the caller disconnects, the session is orphaned.
34
+
In **Iteration 1**, the orchestrator did not exist as a distinct component. The client called `invoke_agent_runtime` synchronously, the agent ran to completion inside the AgentCore Runtime MicroVM, and the caller inferred the result from the response. There was no durable state, no task management, no concurrency control, and no recovery.
35
35
36
-
The **target state** (Iteration 2 and beyond) introduces a durable orchestrator that manages the full task lifecycle. This document designs for the target state; where Iteration 1 constraints apply, they are called out explicitly.
36
+
**Current state (Iteration 3+):** The durable orchestrator manages the full task lifecycle with checkpoint/resume (Lambda Durable Functions), the full state machine (8 states), concurrency control, cancellation, context hydration, memory integration, pre-flight checks, and multi-task-type support. This document describes the current architecture; where historical Iteration 1 constraints are referenced (e.g. synchronous invocation model), they are called out explicitly.
37
37
38
38
---
39
39
@@ -224,7 +224,7 @@ When the orchestrator loads a task's `blueprint_config`, it resolves the step pi
224
224
225
225
1. **Load `RepoConfig`** from the `RepoTable` by `repo` (PK). Merge with platform defaults (see [REPO_ONBOARDING.md](./REPO_ONBOARDING.md#platform-defaults) for default values and override precedence).
226
226
2. **Resolve compute strategy** from `compute_type` (default: `agentcore`). The strategy implements the `ComputeStrategy` interface (see [REPO_ONBOARDING.md](./REPO_ONBOARDING.md#compute-strategy-interface)).
227
-
3. **Build step list.** If `step_sequence` is provided, use it; otherwise use the default sequence (`admission-control` → `hydrate-context` → `start-session` → `await-agent-completion` → `finalize`). For each entry, resolve to a built-in step function or a Lambda invocation wrapper.
227
+
3. **Build step list.** If `step_sequence` is provided, use it; otherwise use the default sequence (`admission-control` → `hydrate-context` → `pre-flight` → `start-session` → `await-agent-completion` → `finalize`). The `pre-flight` step runs fail-closed readiness checks (GitHub API reachability, repo access, PR accessibility for PR tasks) before consuming compute; see [ROADMAP.md Iteration 3c](../guides/ROADMAP.md). For each entry, resolve to a built-in step function or a Lambda invocation wrapper.
228
228
4. **Inject custom steps.** If `custom_steps` are defined and no explicit `step_sequence` is provided, insert them at their declared `phase` position (pre-agent steps before `start-session`, post-agent steps after `await-agent-completion`).
229
229
5. **Validate.** Check that required steps are present and correctly ordered (see [step sequence validation](./REPO_ONBOARDING.md#step-sequence-validation)). If invalid, fail the task with `INVALID_STEP_SEQUENCE`.
230
230
6. **Execute.** Iterate the resolved list. For each step: check cancellation, filter `blueprintConfig` to only the fields that step needs (stripping credential ARNs for custom Lambda steps), execute with retry policy, enforce the `StepOutput.metadata` size budget (10 KB), prune `previousStepResults` to the last 5 steps, and emit events. Built-in steps that need durable waits (e.g. `await-agent-completion`) receive the `DurableContext` and `ComputeStrategy` so they can call `waitForCondition` and `computeStrategy.pollSession()` internally, with no name-based special-casing in the framework loop.
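The list-building and injection steps above can be sketched as follows. This is an illustration only, assuming the default sequence that includes `pre-flight`; the `CustomStep` shape, the `"pre-agent"`/`"post-agent"` phase values, and the function names are hypothetical, and the real `RepoConfig` schema lives in REPO_ONBOARDING.md.

```typescript
// Default sequence as described in step 3 (including `pre-flight`).
const DEFAULT_SEQUENCE = [
  "admission-control",
  "hydrate-context",
  "pre-flight",
  "start-session",
  "await-agent-completion",
  "finalize",
] as const;

// Hypothetical shape for a custom step; field names are assumptions.
interface CustomStep {
  name: string;
  phase: "pre-agent" | "post-agent";
}

function buildStepList(
  stepSequence?: string[],
  customSteps: CustomStep[] = [],
): string[] {
  // An explicit step_sequence wins; custom steps are not injected in that case.
  if (stepSequence && stepSequence.length > 0) {
    return [...stepSequence];
  }
  const steps: string[] = [...DEFAULT_SEQUENCE];
  const pre = customSteps.filter((s) => s.phase === "pre-agent").map((s) => s.name);
  const post = customSteps.filter((s) => s.phase === "post-agent").map((s) => s.name);
  // Pre-agent steps go immediately before `start-session`.
  steps.splice(steps.indexOf("start-session"), 0, ...pre);
  // Post-agent steps go immediately after `await-agent-completion`.
  steps.splice(steps.indexOf("await-agent-completion") + 1, 0, ...post);
  return steps;
}

// Step 6 prunes `previousStepResults` to the last 5 entries.
function pruneStepResults<T>(results: T[], keep: number = 5): T[] {
  return results.slice(-keep);
}
```

Returning a fresh array in both branches keeps the caller from mutating the stored `step_sequence` or the shared default.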
@@ -304,7 +304,7 @@ We evaluated routing GitHub API calls through AgentCore Gateway (with the GitHub
304
304
305
305
4. **User message.** The free-text task description provided by the user (via CLI `--task` flag or equivalent). May supplement or replace the issue context.
306
306
307
-
5. **Memory context (Iteration 3+).** Query long-term memory (e.g. AgentCore Memory) for relevant past context: insights from previous tasks on this repo, failure summaries, learned patterns. See [MEMORY.md](./MEMORY.md) for how insights and code attribution feed into hydration. Not yet implemented.
307
+
5. **Memory context (Iteration 3b+).** Query long-term memory (AgentCore Memory) for relevant past context: repository knowledge (semantic search) and past task episodes (episodic search). Memory is loaded during context hydration via two parallel `RetrieveMemoryRecordsCommand` calls with a 5-second timeout and a 2,000-token budget. See [MEMORY.md](./MEMORY.md) for how insights and code attribution feed into hydration. Tier 1 (repo knowledge + task episodes) has been operational since Iteration 3b. Tier 2 (review feedback rules) is planned for Iteration 3d.
308
308
309
309
6. **Attachments.** Images or files provided by the user (multi-modal input). Passed through to the agent prompt as base64 or URLs.
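For the memory context item above, the "two parallel lookups, 5-second timeout, 2,000-token budget" pattern could look roughly like this. It is a sketch under stated assumptions: the two fetch functions are hypothetical stand-ins for the `RetrieveMemoryRecordsCommand` calls, falling back to an empty result on timeout or error is an assumption about the failure mode, and the 4-characters-per-token estimate is a rough heuristic rather than the platform's tokenizer.

```typescript
// Resolve with `fallback` if `work` does not settle within `ms`, or if it rejects.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        clearTimeout(timer);
        resolve(fallback);
      },
    );
  });
}

// Keep records in order until the estimated token budget is exhausted.
// Assumption: ~4 characters per token.
function applyTokenBudget(records: string[], maxTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const record of records) {
    const cost = Math.ceil(record.length / 4);
    if (used + cost > maxTokens) break;
    kept.push(record);
    used += cost;
  }
  return kept;
}

// `fetchSemantic` and `fetchEpisodic` are hypothetical wrappers around the
// repository-knowledge and task-episode queries.
async function loadMemoryContext(
  fetchSemantic: () => Promise<string[]>,
  fetchEpisodic: () => Promise<string[]>,
): Promise<string[]> {
  const empty: string[] = [];
  const [semantic, episodic] = await Promise.all([
    withTimeout(fetchSemantic(), 5000, empty),
    withTimeout(fetchEpisodic(), 5000, empty),
  ]);
  return applyTokenBudget([...semantic, ...episodic], 2000);
}
```

Running both lookups under `Promise.all` bounds the hydration delay at roughly one timeout window instead of two.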
310
310
@@ -395,7 +395,7 @@ The orchestrator records the `(task_id, session_id)` mapping in the task record
395
395
396
396
### Invocation model: synchronous vs. asynchronous
397
397
398
-
**Iteration 1 (current).** `invoke_agent_runtime` is called synchronously with a long read timeout. The call blocks until the agent finishes. This is simple but limits concurrency: one orchestrator process per task.
398
+
**Iteration 1 (historical).** `invoke_agent_runtime` was called synchronously with a long read timeout. The call blocked until the agent finished. This was simple but limited concurrency: one orchestrator process per task.
399
399
400
400
**Target state.** The orchestrator uses AgentCore's **asynchronous processing model** ([Runtime async docs](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-long-run.html)). The key capabilities:
401
401
@@ -420,7 +420,7 @@ The orchestrator needs to know whether the session is still running. Two complem
420
420
421
421
2. **Re-invocation on the same session (target state).** The orchestrator calls `invoke_agent_runtime` with the same `runtimeSessionId`. Sticky routing ensures the request reaches the same instance. The agent's entrypoint can detect this is a poll (e.g., via a `poll: true` field in the payload or by tracking the initial task) and return the current status without starting a new task. This is a fast, lightweight call that returns immediately.
422
422
423
-
**Iteration 1.** The `invoke_agent_runtime` call blocks; when it returns, the session is over. No explicit liveness check needed.
423
+
**Iteration 1 (historical).** The `invoke_agent_runtime` call blocked; when it returned, the session was over. No explicit liveness check was needed.
424
424
425
425
**Fallback: DynamoDB heartbeat (optional enhancement).** As defense in depth, the agent can write a heartbeat timestamp to DynamoDB every N minutes. The orchestrator reads it during its poll cycle. A missing heartbeat (e.g. none in the last 10 minutes while `/ping` reports `HealthyBusy`) could indicate the agent is stuck but not idle — triggering investigation or forced termination.
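The heartbeat check described above reduces to a small predicate the orchestrator could evaluate during its poll cycle. A sketch, assuming the heartbeat is stored as an epoch-millisecond timestamp; the function name and parameter names are hypothetical, and the 10-minute threshold is the example value from the paragraph above.

```typescript
// Status values reported by `/ping`, as named in this document.
type PingStatus = "Healthy" | "HealthyBusy";

// Returns true when the agent claims to be busy but has not written a
// heartbeat within `maxAgeMs` (default: 10 minutes).
function isAgentPossiblyStuck(
  ping: PingStatus,
  lastHeartbeatMs: number | undefined,
  nowMs: number,
  maxAgeMs: number = 10 * 60 * 1000,
): boolean {
  // Only a busy agent is expected to be writing heartbeats.
  if (ping !== "HealthyBusy") return false;
  // No heartbeat recorded at all counts as missing.
  if (lastHeartbeatMs === undefined) return true;
  return nowMs - lastHeartbeatMs > maxAgeMs;
}
```

A `true` result is a signal to investigate or force termination, not proof of failure, since a delayed DynamoDB write would look the same.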
426
426
@@ -430,15 +430,15 @@ AgentCore Runtime terminates sessions after 15 minutes of inactivity (no `/ping`
430
430
431
431
**Mitigation (async model).** In the target state, the agent uses the AgentCore SDK's async task management: `add_async_task` registers a background task, and the SDK automatically reports `HealthyBusy` via `/ping` while any async task is active. AgentCore polls `/ping` and sees the agent is busy, preventing idle termination. When the agent calls `complete_async_task`, the status reverts to `Healthy`. The `/ping` endpoint runs on the main thread (or async event loop) while the coding task runs in a separate thread, so `/ping` remains responsive.
432
432
433
-
**Mitigation (Iteration 1 / current).** The agent container's FastAPI server defines `/ping` as a separate async endpoint. Because the agent task runs in a threadpool worker (not in the asyncio event loop), the `/ping` endpoint remains responsive while the agent works. AgentCore calls `/ping` periodically and the server responds, preventing idle timeout.
433
+
**Mitigation (current).** The agent container's FastAPI server defines `/ping` as a separate async endpoint. Because the agent task runs in a threadpool worker (not in the asyncio event loop), the `/ping` endpoint remains responsive while the agent works. AgentCore calls `/ping` periodically and the server responds, preventing idle timeout.
434
434
435
435
**Risk.** If the agent's computation blocks the entire process (not just a thread) — e.g. due to a subprocess that consumes all resources, or the server becomes unresponsive — the `/ping` response may be delayed, triggering idle termination. This risk applies to both models. The defense is to ensure the coding task runs in a separate thread or process and does not starve the main thread.
436
436
437
437
### Session completion detection
438
438
439
439
When the session ends (agent finishes, crashes, or is terminated), the orchestrator detects this:
440
440
441
-
- **Iteration 1:** The `invoke_agent_runtime` call returns (it blocks). The response body contains the agent's output (status, PR URL, cost, etc.).
441
+
- **Iteration 1 (historical):** The `invoke_agent_runtime` call returned (it blocked). The response body contained the agent's output (status, PR URL, cost, etc.).
442
442
- **Target state:** The orchestrator polls the agent via re-invocation on the same session (see Invocation model above). Completion is detected when: (a) the agent responds with a "completed" or "failed" status in the poll response, or (b) the re-invocation fails because the session was terminated (idle timeout, crash, or 8-hour limit reached). In the durable orchestrator, a `waitForCondition` evaluates the poll result at each interval and resumes the pipeline when the condition is met. See the session monitoring pattern in the Implementation options section.
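The target-state completion rule can be expressed as a predicate over the poll result, which is the kind of condition a `waitForCondition` call would evaluate at each interval. This is a sketch: the `PollOutcome` type and the `"running"` status value are hypothetical, and only the "completed" and "failed" statuses come from the text above.

```typescript
// Hypothetical representation of one poll attempt.
type PollOutcome =
  | { kind: "status"; status: "running" | "completed" | "failed" }
  | { kind: "invoke_failed"; reason: string };

// True when the orchestrator should stop waiting and resume the pipeline.
function isSessionDone(outcome: PollOutcome): boolean {
  if (outcome.kind === "invoke_failed") {
    // Case (b): the session was terminated (idle timeout, crash, or time limit).
    return true;
  }
  // Case (a): the agent reported a terminal status.
  return outcome.status === "completed" || outcome.status === "failed";
}
```

Treating a failed re-invocation as terminal assumes the failure means the session is gone; a production version would likely distinguish transient network errors from session termination before giving up.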
443
443
444
444
### External termination (cancellation)
@@ -871,6 +871,7 @@ The primary table for task state. DynamoDB.
871
871
|`cost_usd`| Number (optional) | Agent cost from the SDK result. |
872
872
|`duration_s`| Number (optional) | Total task duration in seconds. |
|`lint_passed`| Boolean (optional) | Post-agent lint verification result. Recorded alongside `build_passed` during finalization; surfaced as a span attribute (`lint.passed`) and included in the PR body's verification section. |
874
875
|`max_turns`| Number (optional) | Maximum agent turns for this task. Set during task creation — either the user-specified value (1–500) or the platform default (100). Included in the orchestrator payload and consumed by the agent SDK's `ClaudeAgentOptions(max_turns=...)`. |
875
876
|`max_budget_usd`| Number (optional) | Maximum cost budget in USD for this task. Set during task creation — either the user-specified value ($0.01–$100) or the per-repo Blueprint default. When reached, the agent stops regardless of remaining turns. If neither the task nor the Blueprint specifies a value, no budget limit is applied (turn limit and session timeout still apply). Included in the orchestrator payload and consumed by the agent SDK's `ClaudeAgentOptions(max_budget_usd=...)`. |
876
877
|`blueprint_config`| Map (optional) | Snapshot of the `RepoConfig` record at task creation time (or a reference to it). This ensures tasks are not affected by mid-flight config changes. The schema follows the `RepoConfig` interface defined in [REPO_ONBOARDING.md](./REPO_ONBOARDING.md#repoconfig-schema). Includes `compute_type`, `runtime_arn`, `model_id`, `max_turns`, `system_prompt_overrides`, `github_token_secret_arn`, `poll_interval_ms`, `custom_steps`, `step_sequence`, and `egress_allowlist`. The `max_turns` value from `blueprint_config` serves as the per-repo default; per-task `max_turns` (from the API request) takes higher priority. `max_budget_usd` follows the same 2-tier override pattern: per-task value takes priority over `blueprint_config.max_budget_usd`; if neither is specified, no budget limit is applied. |
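The override precedence described for `max_turns` and `max_budget_usd` can be summarized in two small functions. A sketch, assuming the values have already been validated against their documented ranges; the function names are hypothetical.

```typescript
// Per-task value wins, then the per-repo blueprint value, then the platform default (100).
function resolveMaxTurns(
  taskValue?: number,
  blueprintValue?: number,
  platformDefault: number = 100,
): number {
  return taskValue ?? blueprintValue ?? platformDefault;
}

// Per-task value wins, then the blueprint value; `undefined` means no budget limit.
function resolveMaxBudgetUsd(
  taskValue?: number,
  blueprintValue?: number,
): number | undefined {
  return taskValue ?? blueprintValue;
}
```

Using `??` rather than `||` matters here: it only falls through on `undefined` or `null`, so an explicitly supplied value is never silently replaced by a default.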