Commit 7d10254
* Initial design docs
* docs(workflows): address principal-review findings on ADR-014
- Validator parity: declare JSON Schema as the single canonical shape
contract (consumed, not re-implemented), keep exactly one cross-field
validator impl in Phases 1-3, and add a contracts/workflow-validation/
golden corpus to lock any Phase-4 second impl to parity (mirrors the
cedar-parity mechanism).
- Cedar principal migration: split into its own isolated PR (Phase 2a)
reviewed alone with regenerated parity fixtures, landing ahead of the
pr_iteration/pr_review workflow migrations (Phase 2b) that depend on it.
- Repo-optionality: reframe web_research as a not-yet-runnable schema
fixture (not an acceptance proof) and require the two blocking open
questions (memory actorId, artifact delivery) to be resolved as a
recorded ADR addendum before the Phase-0 schema is frozen.
Regenerated Starlight mirrors.
* feat(agent): workflow models + loader (#248 Phase 1)
Add the agent/src/workflow/ package: Pydantic models mirroring
workflow.schema.json and a loader that resolves a first-party workflow
id, parses YAML, and shape-validates against the canonical JSON Schema.
Additive only — does not yet touch task_type (the breaking cutover lands
later in Phase 1).
- models.py: Workflow + sub-models; resolved_requires_repo applies the
domain-derived default (coding=true, knowledge/hybrid=false).
- loader.py: JSON-Schema shape validation (one canonical contract,
consumed not re-implemented), YAML load, id/path agreement, and
path-traversal defense. Cross-field rules deliberately live in the
separate validator (Task #2), so there is one cross-field impl.
- Promote pyyaml + jsonschema to direct deps (were only transitive); the
loader must not depend on another package's transitive pin.
- Fix a latent schema bug found by the new tests: the requires_repo/
read_only allOf conditionals fired vacuously when the property was
absent, wrongly applying repo-less constraints to a coding workflow
relying on the domain default. Guard each `if` with `required`.
17 new tests; agent ruff + ty + full suite (836) green.
* feat(agent): cross-field workflow validator + golden corpus (#248 Phase 1)
The single CI-time implementation of the WORKFLOWS.md cross-field rules
the JSON Schema cannot express (rules 1-9, 11, 12, 14). The runtime loader
stays shape-only and trusts this verdict, so there is exactly one cross-field
implementation in Phases 1-3 (avoids the cedar-style two-language drift).
contracts/workflow-validation/ ships the golden parity corpus from Phase 1 so
the expected-verdict contract is fixed before #246 adds a second validator,
mirroring contracts/cedar-parity/. test_workflow_tree_valid.py gates every
first-party workflow file through the validator at CI time.
* feat(agent): workflow step runner + handler registry (#248 Phase 1)
The agent-side step runner that interprets workflow.steps inside the container
(ADR-014): a StepKind→handler registry, in-order execution honouring each step's
on_failure policy (fail / continue / skip_remaining), per-step
step:<name>:start/<status> progress milestones, and resume-aware checkpointing
(deterministic side-effect-free steps recorded to workflow_state.json on the
persistent mount are skipped on resume; agentic/side-effecting steps re-run
idempotently per WORKFLOWS.md §Step execution semantics).
Handlers are thin wrappers over the existing helpers (setup_repo, run_agent,
verify_build/lint, ensure_pr) so this is maximal structural change, minimal
logic change. post_review/deliver_artifact are registered but raise
NotImplementedError (Phase 2b / Phase 3) — fail-loud, not silent no-op, keeping
validator rule-8 handler parity honest. Orchestration core is unit-tested with
fakes; wiring into pipeline.run_task is task 5.
* feat(agent): first-party workflow files — coding/new-task-v1 + default/agent-v1 (#248 Phase 1)
Migrates the new_task task type to coding/new-task-v1.yaml (the heavyweight
clone→build→open-PR coding path) and ships the platform default workflow
default/agent-v1.yaml — the conservative last rung of the resolution ladder
used when no workflow_ref and no Blueprint default apply (requires_repo:false,
read-leaning tool set, deliver-as-comment). pr_iteration/pr_review migrations
are Phase 2b (depend on the Cedar principal migration).
Both pass the cross-field validator and the test_workflow_tree_valid CI gate
(file path agrees with declared id; zero violations).
* fix(agent): address code-review findings in workflow validator + runner (#248 Phase 1)
Fixes the group-A bugs surfaced by the high-recall code review of the Phase 1
commits:
runner.py
- Resume-skip product loss: clone_repo/hydrate_context populate in-memory
products (ctx.setup, ctx.user_prompt) that can't be rebuilt from the JSON
checkpoint, so skipping them on resume left ctx.setup None (breaking ensure_pr)
and ctx.user_prompt empty (unguided agent). Narrow _RESUMABLE_SKIP_KINDS to
verify_build/verify_lint — steps whose ENTIRE product is the checkpointed
boolean re-applied via ctx.artifacts. clone_repo/hydrate_context now re-run on
resume (handler-level idempotency), matching WORKFLOWS.md's intent.
- Skipped steps now emit step:<name>:start / :skipped milestones so a watcher
sees them accounted for instead of a gap.
validator.py
- _HANDLER_KINDS (rule 8) is now derived from runner.STEP_HANDLERS — the single
source of truth — instead of a hand-copied list that could silently drift.
- rule-6 tier ceiling now applies the elevated-only-fields check to read-only
(the strictest tier) as well as standard; previously read-only could declare
mcp_servers/plugins/skills unchecked.
- rule-11 now verifies a deliver_artifact step's target actually produces the
declared comment/artifact outcome, not merely that the step kind is present.
Tests + corpus updated; rule3/rule7 fixtures switched to s3_and_comment target
to stay isolated to their intended rule, plus a new rule11 target-mismatch
fixture. Full agent suite green (909 passed), ruff + ty clean.
* feat(agent): wire workflow runner into pipeline behind a gate (#248 Phase 1, task 5)
Builds the runner→pipeline seam and gates it off (default production behavior
unchanged). pipeline.run_task's agent invocation now goes through
_execute_agent_step: when WORKFLOW_RUNNER_ENABLED is set AND task_type maps to a
shipped workflow (only new_task→coding/new-task-v1 in Phase 1), the single
run_agent step is dispatched through workflow.run_workflow — exercising the real
handler registry, step milestones, and result threading — while clone, context
assembly, and post-hooks stay on the proven inline path. Otherwise it is exactly
the legacy asyncio.run(run_agent(...)) call. pr_iteration/pr_review have no
workflow file until Phase 2b, so they fall back to inline.
run_workflow gains an only_kinds filter so the seam drives just the run_agent
step (no double clone / double PR against the inline post-hooks). The flag is
flipped to default-on in the tasks 6-8 cutover.
Folds in the group-B code-review fixes (effective when the seam is enabled):
- system prompt: hydrate_context builds ctx.system_prompt via the existing
build_system_prompt (was empty), reusing pipeline's logic rather than
reimplementing it.
- verify gate: new shared _gate_status gives verify_build/verify_lint matching
semantics — informational/read_only never gate, regression_only consults
build_before/lint_before (mirrors pipeline's build_ok = passed or not
build_before), strict always gates. Fixes the regression_only build bug and
the verify_lint read_only asymmetry.
- clone_repo is idempotent (reuses a pre-populated ctx.setup).
- StepContext threads the --trace trajectory into run_agent (was dropped).
ensure_pr strategy reconciliation stays deferred to task 8 (it requires removing
the task_type branch inside ensure_pr); recorded in local-docs. Full agent suite
green (930 passed), ruff + ty clean.
* fix(agent): address second-round review findings on the workflow seam (#248 Phase 1)
- Error-handling contract (highest severity): when the run_agent step's handler
raises, run_workflow captures it into a failed StepOutcome rather than
propagating. _execute_agent_step now RE-RAISES in that case so run_task's
except block restores full fidelity — the log_error_cw APPLICATION_LOGS mirror
(read by TaskDashboard / bgagent status), the span error, and the real
exception text — instead of silently downgrading to a generic AgentResult.
This also makes the previously-dead agent_result-is-None branch live.
- _gate_status: an unset gate now defaults to regression_only semantics (gate
only a regression; a pre-existing failure does not gate), matching pipeline.py
which is unconditionally 'build_ok = passed or not build_before'. Previously an
unset gate fell through to strict, contradicting the docstring's 'mirrors
pipeline.py' claim and would wrongly fail a broken-before repo.
- run_agent handler fails loud on an empty system prompt rather than running an
unguided agent loop (repo-less prompt assembly is Phase 3; this turns the gap
into an attributable failed step instead of a silent context-free run).
- _workflow_id_for_task_type: documented as a transitional bridge tied to the
canonical task_type→workflow table in WORKFLOWS.md / API_CONTRACT.md, to be
kept in sync until tasks 6-8 delete it.
Tests added for each behavior change. Full agent suite green (934 passed),
ruff + ty clean.
* feat(api,cli): replace task_type with workflow_ref/resolved_workflow (#248 tasks 6-7)
Breaking API change (no alias): task_type is removed end-to-end; workflow_ref
selects a workflow and resolves to a pinned {id, version} at the create-task
boundary.
CDK/CLI types (task 6, in lockstep — check:types-sync):
- Remove TaskType + isPrTaskType; add ResolvedWorkflow.
- CreateTaskRequest.workflow_ref?; TaskRecord/TaskDetail/TaskSummary
resolved_workflow (+ mappers).
- CLI: --pr/--review-pr now set workflow_ref (coding/pr-iteration-v1,
coding/pr-review-v1); new --workflow flag; format reads resolved_workflow.id.
CDK resolution boundary (task 7):
- New workflows.ts: CDK-side descriptor mirror of agent/workflows/** +
resolveWorkflowRef (ladder: explicit ref → default/agent-v1), isValidWorkflowRef,
requires_repo/read_only/uses_pr accessors. Drift-guard test keeps the table in
sync with the shipped YAML.
- validation.ts: drop VALID_TASK_TYPES/isValidTaskType; hasTaskSpec now takes the
resolved workflow's required_inputs contract.
- create-task-core.ts: resolve workflow, validate inputs against the contract,
persist workflow_ref + resolved_workflow. Repo stays required for ALL workflows
in Phase 1 (repo-less admission is Phase 3).
- preflight.ts: taskType param → readOnly + requiresRepo; repo-less short-circuits
to passed (seam for Phase 3).
- orchestrate-task/context-hydration/orchestrator/ecs-strategy: derive
read_only/requires_repo/PR-ness from resolved_workflow; payload swaps task_type
→ resolved_workflow. ecs bootCommand passes resolved_workflow (lockstep with
agent run_task in task 8).
CLI 294 tests + CDK suite green; check:types-sync OK. Agent cutover (task 8) lands
next to consume resolved_workflow.
* feat(agent): complete task_type→workflow cutover in the agent (#248 task 8)
Removes the Python TaskType enum / PR_TASK_TYPES / _PROMPTS-by-task_type and the
gated seam; the workflow runner is now the sole agentic path, driven by the
resolved_workflow {id, version} threaded from the create-task boundary.
- models.py: delete TaskType enum; TaskConfig gains resolved_workflow,
policy_principal, is_pr_workflow (drops task_type).
- config.py: build_config takes resolved_workflow (not task_type); derives
policy_principal via workflow.policy_principal_for and is_pr_workflow; PR_TASK_TYPES
→ PR_WORKFLOW_IDS.
- Cedar UNCHANGED (Phase 2a owns the principal migration): the
Agent::TaskAgent::"<id>" scheme and contracts/cedar-parity/ are untouched.
policy_principal_for maps read_only⇒pr_review, else id→legacy {new_task,
pr_iteration, pr_review}. Only policy.py edit: drop the WARN-only TaskType()
validation. runner.py passes config.policy_principal as the principal. All
test_policy* pass unchanged.
- prompts rekeyed by workflow id (get_system_prompt falls back to the default
coding prompt for an unknown id rather than raising).
- pipeline.py: _execute_agent_step loads from resolved_workflow, always-on;
the 3 task_type=='pr_review' branches → workflow.read_only; ensure_pr takes an
explicit strategy (create/push_resolve/resolve) from the workflow step,
resolving the deferred code-review item.
- post_hooks.ensure_pr: strategy param replaces task_type self-inspection.
repo.py: PR-branch resume keys off config.is_pr_workflow.
- server.py/entrypoint.py: thread resolved_workflow; validation keyed on PR
workflow ids.
- New faithful Phase-1 workflow files coding/pr-iteration-v1 (push_resolve) +
coding/pr-review-v1 (read_only, resolve, no Write/Edit per rule 4). All 4
first-party files validate; CDK descriptor drift-guard green.
- docs: API_CONTRACT.md migration table + resolved_workflow responses; Starlight
mirror synced.
Agent suite 922 passed, ruff + ty clean. With tasks 6-7 (e829c4f) this completes
the breaking cutover across API + CLI + agent.
* fix(agent): use ty ignore syntax + annotate workflow test handlers (#248)
The full build was never completed on this branch; ty type-checking and ruff
format gates were red. No src logic changes — type/format hygiene only.
- Tests used mypy-style `# type: ignore[arg-type]`, but the project type
checker is ty (`# ty: ignore[rule]`), so the suppression silently no-op'd.
- test_workflow_runner.py: annotate inline step handlers as
StepHandler-compatible `(step: Step, ctx: StepContext) -> StepOutcome`;
type the `handlers` dicts as `dict[str, StepHandler]` (ty treats the alias
params as positional-only → dict-value invariance rejects named-param
funcs); convert the stale ignore to `# ty: ignore[invalid-argument-type]`.
- test_entrypoint.py: assert `resolved_workflow is not None` before
subscripting (TaskConfig.resolved_workflow is `dict | None`).
- Apply ruff format reflows (runner.py, validator.py,
test_workflow_tree_valid.py) the branch was committed without.
* feat(agent): Cedar read-only enforcement keys off context.read_only (#248 Phase 2a)
Migrate read-only enforcement from the literal `Agent::TaskAgent::"pr_review"`
principal match to the `context.read_only` attribute, so the Write/Edit
hard-deny attaches to the *property* and fires for every read-only workflow
uniformly — not just coding/pr-review. Removes the Phase-2b principal bridge.
This is the security-load-bearing step: an error here *silently weakens*
enforcement (the rule stops matching) rather than failing loudly. Guarded by
new contracts/cedar-parity/ golden fixtures verified on BOTH the cedarpy
(Python) and cedar-wasm (TypeScript) engines.
Policy (both bindings, kept byte-for-byte identical — drift guard enforces):
- agent/policies/hard_deny.cedar + cdk/.../builtin-policies.ts:
pr_review_forbid_write/edit (principal == "pr_review") →
read_only_forbid_write/edit (when context.read_only == true, any principal).
Wiring:
- policy.py: PolicyEngine gains read_only kwarg; threads read_only into every
Cedar request context (probe + base_context).
- models.py: TaskConfig.read_only; config.py derives it from the resolved
workflow; runner.py passes config.read_only to PolicyEngine.
- workflow/loader.py: remove the read_only ⇒ "pr_review" bridge in
policy_principal_for — the principal is now an identity/audit tag only;
pr-review keeps its own id-derived principal.
Parity fixtures (new): read-only-forbid-write, read-only-forbid-edit
(read_only=true ⇒ deny), read-only-false-permits-write (read_only=false ⇒
base_permit — proves the deny is gated on the property, not always-on).
Tests: TestPrReviewPermissions → TestReadOnlyPermissions, now asserting the
deny fires for any read-only principal AND that read_only=false permits Write;
test_hooks + test_config updated. agent 927 / cdk 1822 green; ty + ruff clean.
Docs: ADR-014 addendum (2026-06-08) records dropping the isolated-PR/ordering
requirement (2b shipped first behind the bridge, so read-only was never
unprotected); WORKFLOWS.md §"Replacing the Cedar principal" + phasing table
updated to past tense.
* docs(workflows): resolve repo-optional open questions + freeze schema (#248)
ADR-014 addendum (2026-06-08) settles the two open questions that gated the
Phase-0 schema freeze (WORKFLOWS.md open questions #1, #2). With the one implied
schema reshape applied, the schema is frozen — later phases add handlers and
plumbing, not schema fields.
Decision 1 — Memory actorId for repo-less tasks: per-user user:{cognito_sub}.
Caller-scoped (no cross-tenant bleed; mirrors the per-user trace prefix); cross-
workflow pooling explicitly not adopted. NO schema field — a fixed platform
fallback, applied in memory.py in Phase 3 when repo is absent.
Decision 2 — Artifact delivery: named Python deliverers (open target string),
shared S3 plumbing pinned now. deliver_artifact.target resolves to a registered
deliverer (workflow/deliverers.py → DELIVERERS), same pattern as STEP_HANDLERS —
a new delivery method is a registered deliverer, not a schema change. Plumbing
frozen: task-scoped key artifacts/{task_id}/, prefix-scoped SessionRole grant,
size limit, TaskDetail URL surfacing, workflow:{id} repo-less tenant tag.
Schema reshape applied:
- workflow.schema.json: steps[].target drops its enum (stays type:string,
minLength:1); the closed set moves into the deliverer registry.
- models.py: DeliverTarget Literal → str.
- New workflow/deliverers.py: Deliverer dataclass + DELIVERERS registry +
produced_outcomes(); single source of truth for "what each target produces."
- validator.py rule 11: consults the registry (_deliverer_produces) instead of
the hardcoded _DELIVER_TARGET_OUTCOMES map. The three first-party names keep
their exact produced-outcome sets, so NO existing workflow / fixture / golden
vector changes verdict; an unknown deliverer name now produces nothing (new
guard, tested).
Tests: test_workflow_deliverers.py (registry contract) + a rule-11
unknown-deliverer case. agent 932 green; ty + ruff clean. deliver_artifact
remains a NotImplementedError stub until Phase 3 — only the contract is frozen.
* feat(api,agent): repo-optional admission for repo-less workflows (#248 Phase 3)
First slice of Phase 3 (repo-optional tasks): the admission path now accepts a
submission with no repo when the resolved workflow does not require one, so a
repo-less knowledge workflow (e.g. default/agent-v1) can be admitted, hydrated,
and run from task_description alone. The deliverer/S3 + agent-runtime clone-skip
slices follow separately.
Semantics: requires_repo decides whether a repo is MANDATORY; requires_repo:false
means repo-OPTIONAL (a repo-less workflow may still target a repo). The repo-less
behavior keys off repo *absence*, not the workflow flag — a repo-optional workflow
given a repo still takes the repo-bound path.
Types (CDK + CLI mirror, kept in sync — check:types-sync green):
- CreateTaskRequest.repo, TaskRecord.repo → optional; TaskDetail.repo,
TaskSummary.repo → string | null. Mappers null-coalesce.
Admission (create-task-core.ts): resolve the workflow FIRST, then gate repo on
workflow.requiresRepo (missing repo on a repo-bound workflow → 400). Onboarding +
blueprint-cap lookup runs only when a repo is present; repo-less takes the platform
default cap. Record omits repo when absent; replay-required-fields drops 'repo';
event metadata null-coalesces.
Hydration (context-hydration.ts): new repo-less branch (keyed on !task.repo) that
assembles from task_description + per-user memory, returning before the repo-bound
fetch path; assembleUserPrompt drops the Repository: line when repo is absent.
Memory (memory.ts + orchestrator.ts): loadMemoryContext/writeMinimalEpisode take
an actorNamespace (repo for coding, user:{user_id} for repo-less — ADR-014 addendum).
Preflight already honored requiresRepo (Phase-1 seam) — widened its repo param to
string|undefined. fanout-task-events skips the GitHub channel for a repo-less task.
loadBlueprintConfig skips the RepoTable lookup when repo is absent.
Agent (config.py): build_config loads the workflow up-front and requires
repo_url/github_token only when requires_repo; repo-less runs from
task_description. CLI shows '—' for a repo-less task.
Tests: repo-less acceptance (create-task-core, create-task, context-hydration,
CLI format) + repo-bound-missing-repo rejection; memory test log key repo →
actor_namespace. agent 932 / cdk 1825 / cli 296 green; ty + ruff + eslint clean.
deliver_artifact is still a NotImplementedError stub — a repo-less task can be
admitted and hydrated but not yet deliver an artifact (next slice).
* feat(agent): repo-less pipeline path via the workflow runner (#248 Phase 3)
Second Phase-3 slice: a repo-less workflow now runs the agent loop in-container
with no clone / build / PR. When config.requires_repo is false, run_task delegates
to _run_repoless_task, which drives the workflow's steps (hydrate_context →
run_agent) through the workflow step runner — coding vs knowledge now differ by
the workflow's steps list, per ADR-014's end-state — and assembles a terminal
TaskResult with pr_url=None.
Scope boundary (deliberate): only hydrate_context + run_agent are driven here.
deliver_artifact is the workflow's terminal step but its handler is a stub until
the artifacts-bucket S3/IAM contract ships (next slice), so it is excluded via
only_kinds; delivery is deferred, the agent run is not. The coding path is left
byte-identical — its hard-won inline cancel-safety (skip ensure_pr on a cancelled
task) is NOT yet runner-driven, so routing the full coding step list through the
runner is a separate cancel-aware change, intentionally not done here.
- models.py: TaskConfig.requires_repo (defaults True; coding unaffected).
config.py build_config sets it from the resolved workflow.
- pipeline.py: requires_repo branch before the repo-coupled segment;
_run_repoless_task runs the runner + assembles/persists the terminal result.
run_task repo_url now defaults to "" (repo-less callers omit it).
- prompts/default_agent.py: repo-less system prompt (no git/branch/PR
placeholders); registered for default/agent-v1.
- prompt_builder.build_repoless_system_prompt + shared _render_memory_context.
Tests: repo-less pipeline run (no setup_repo, no PR, success, repo-less prompt);
repo-less prompt has no repo placeholders. agent 934 green; ty + ruff clean.
* fix(agent): address PR-review findings on Phase 2a/3 (#248)
Findings from the plugin review (code-reviewer, silent-failure-hunter,
pr-test-analyzer, type-design-analyzer) on 4c34c8b..HEAD.
HIGH — repo-less task reported COMPLETED while delivering nothing
(silent-failure-hunter). _run_repoless_task skips deliver_artifact (stub until
the artifacts contract ships), but the workflow's terminal outcome (e.g.
`comment`) was never produced, yet status was success. Now: if the primary
terminal outcome is delivery-backed and delivery was skipped, the task is a loud
FAILED with an explicit error + delivery_deferred milestone naming the gap.
MEDIUM — config.py load-failure fallback bug + fail-open (code-reviewer,
silent-failure-hunter). The WorkflowValidationError branch compared workflow_id
against DEFAULT_WORKFLOW_ID (the *coding* default) for requires_repo, contradicting
its comment, and defaulted read_only=False (fail-open) for unknown ids. Now:
log the failure (was silent); fail CLOSED — read_only=True for any id not in
_KNOWN_WRITEABLE_WORKFLOW_IDS; requires_repo keyed off the real repo-less default
(REPO_LESS_DEFAULT_WORKFLOW_ID = default/agent-v1).
MEDIUM — TaskConfig missing cross-field invariant (type-design-analyzer). Added
_validate_requires_repo_has_repo (mirrors _validate_trace_requires_user_id):
requires_repo=True ⇒ non-empty repo_url, enforced at construction. repo_url and
github_token now default to "" so a repo-less TaskConfig is constructible; the
validator preserves the repo-bound invariant.
LOW/secondary — rule-8 now also checks deliver_artifact.target resolves in
DELIVERERS (type-design-analyzer), so an unknown target is caught universally,
not only when it collides with the primary outcome.
Test gaps closed (pr-test-analyzer): repo-OPTIONAL workflow given a valid repo
(admission + hydration); repo-less memory keyed user:{user_id} (asserted);
malformed-repo-on-repo-bound rejection; preflight repo-less short-circuit +
repo-bound-no-repo invariant; repo-less agent_result=None → FAILED; config
load-failure fail-closed; TaskConfig validator; rule-8 deliver target.
agent 941 / cdk 1831 / cli 296 green; ty + ruff + type-sync clean.
* feat(agent,api): deliver_artifact + repo-less memory — close criterion #4 (#248 Phase 3)
Final Phase-3 slice: a repo-less knowledge task now runs end-to-end — admitted,
hydrated, agent loop, AND delivers its output. Closes acceptance criterion #4
(a non-coding task runs without a repo) for the s3/comment deliverers.
Infra (IAM + env):
- agent-session-role.ts: s3:PutObject grant scoped to
artifacts/${aws:PrincipalTag/task_id}/* on the trace-artifacts bucket (mirrors
the traces/ pattern; task_id-scoped per ADR-014 addendum). cdk-nag reason updated.
- agent.ts: ARTIFACTS_BUCKET_NAME runtime env (same bucket as traces).
Deliverers (agent/src/workflow/deliverers.py):
- DELIVERERS gains a deliver(target, ctx) dispatcher + DeliveryResult. The s3
deliverer uploads the agent's result text to artifacts/{task_id}/result.md via
the tenant-scoped S3 client (5 MiB cap); the comment deliverer records a
delivered_comment milestone for the channel fanout; s3_and_comment does both.
- AgentResult.result_text captures ResultMessage.result (the knowledge-task
deliverable); runner.py populates it on success.
- runner._handle_deliver_artifact: now implemented (was NotImplementedError),
routes through deliver(); a delivery failure is a failed step (terminal FAILED),
never a silent skip.
Pipeline (pipeline.py):
- _run_repoless_task drives the FULL step list (deliver_artifact included);
replaces the delivery-deferred FAILED gate with the real runner verdict (agent
succeeded but workflow failed ⇒ attributed FAILED). Sets TaskResult.artifact_uri.
- Repo-less episodic memory write keyed user:{user_id} (fail-open).
Memory (memory.py): write_task_episode repo→actor param; _validate_actor accepts
owner/repo OR user:{id} so the same write path serves coding + knowledge tasks.
Types: TaskResult/TaskRecord/TaskDetail + CLI mirror gain artifact_uri (mirrors
trace_s3_uri); CLI detail view shows an Artifact: line. check:types-sync green.
Tests: deliverer dispatch (s3 upload + key/body, comment milestone, s3_and_comment,
missing bucket, empty text, oversize); repo-less pipeline now succeeds + sets
artifact_uri=null for comment-only; session-role artifacts grant; deliver_artifact
dropped from the fail-loud stub list. agent 947 / cdk 1832 / cli 296 green; synth +
cdk-nag clean; ty + ruff + type-sync clean.
* feat(agent): make repo-less delivery retrievable + ship a knowledge workflow (#248 Phase 3)
Three should-fix items closing the gap where "criterion #4" was met by machinery
but the default repo-less path delivered to a channel that isn't wired.
1. Stale docstring: _run_repoless_task said deliver_artifact was a deferred stub
— it ships now. Updated to describe the real full-step-list run.
2. Default workflow delivers retrievably. The channel fan-out only knows
slack/email/github; an api-origin repo-less task (the common default case) has
no channel, so a `comment`-only deliverer surfaced nothing the user could get.
default/agent-v1 now uses `target: s3_and_comment` (primary outcome `artifact`):
the S3 upload to artifacts/{task_id}/ is always retrievable, the comment
milestone is still rendered by the fan-out when a channel exists.
3. Ship a runnable knowledge workflow: agent/workflows/knowledge/web-research-v1.yaml
— the concrete repo-less demonstration (research → S3 artifact, no repo).
Built-in capabilities only (registry MCP/skills are Phase 4); CDK descriptor
mirror + drift-guard + resolver tests updated.
Also fixes a latent bug the knowledge workflow surfaced: a repo-less id with no
registered prompt fell back to the CODING prompt (leaking unsubstitutable
{repo_url}/{branch_name}). get_system_prompt gains repo_less= so repo-less
workflows fall back to the repo-less default-agent prompt instead.
Tests: repo-less pipeline now asserts artifact_uri set (s3_and_comment);
repo-less prompt fallback; knowledge workflow validates/resolves. agent 949 /
cdk 1833 green; ty + ruff + type-sync clean; docs synced.
* fix(agent): address full-branch review — 2 repo-less blockers + cleanups (#248)
Findings from the four-reviewer full-branch pass. Two were blockers that would
have shipped the repo-less feature broken despite green unit tests (the tests
didn't cross the server/task_state boundaries).
BLOCKER 1 — repo-less task rejected on the AgentCore backend. server.py
_validate_required_params required repo_url unconditionally, so /invocations
returned 400 for every repo-less task before the pipeline ran (ECS backend
dodged it by calling run_task directly). Now gates repo_url on the workflow's
requires_repo (resolved via load_workflow; fails SAFE — assume required — on a
load error). Test: repo-less payload accepted, repo-bound still requires repo.
BLOCKER 2 — artifact_uri computed but never persisted. task_state.write_terminal's
field allowlist had no artifact_uri branch, so the URI was dropped before
DynamoDB and TaskDetail.artifact_uri was always null (delivery not retrievable
despite the S3 object existing). Added the persist branch + tests.
Secondary:
- comment deliverer no longer overstates: _post_comment docstring + WORKFLOWS.md
state plainly that delivered_comment is recorded for the event stream (visible
in `bgagent watch`) but is NOT yet routed to an external channel (not in the
fan-out ROUTABLE_MILESTONES). comment_posted=True only when actually recorded.
- test pins the config→engine read_only seam (runner.py PolicyEngine(read_only=)):
dropping it now fails a test instead of silently disabling the Write/Edit deny.
- stale phase-boundary comments corrected (post_review docstring, runner module
docstring + only_kinds + run_agent guard, pr-review/pr-iteration YAML headers)
— they described Phase 2a/2b/3 as pending when those shipped on this branch.
- ADR-014: replaced the dangling pre-rebase commit hash 838c72e (unreachable
after rebase) with a descriptive reference.
agent 954 / cdk 1833 green; ty + ruff + type-sync clean; docs synced.
* test(cli): cover artifact_uri render lines (#248 Phase 3)
The Artifact: display lines added to formatTaskDetail / formatStatusSnapshot
introduced uncovered truthy branches that dropped CLI branch coverage below the
84% gate (full `mise run build` caught it; the per-package run had passed on the
prior content). Adds non-null + null cases for both renderers, mirroring the
existing trace_s3_uri tests. cli 300 passed; full build green.
* fix(agent): address PR review findings + green CI (#248)
CI: ruff-format the 4 files that tripped the build self-mutation
(config.py, deliverers.py, test_pipeline.py, test_task_state.py) so
"Fail build on mutation" passes.
Review findings:
- prompts/__init__.py: get_system_prompt now logs WARN when an id has no
registered prompt and falls back, restoring the signal the old
build_system_prompt emitted (silent-failure-hunter).
- pipeline._run_repoless_task: chain wf_result.failed_step.error into the
synthesized AgentResult so a run_agent failure on the repo-less path is
diagnosable instead of a generic message (code-reviewer).
- server._validate_required_params: narrow the broad `except Exception`
around load_workflow to WorkflowValidationError/ImportError so a genuine
programming error surfaces loudly instead of being masked as "could not
resolve requires_repo" (still fails safe — repo required).
- workflows.ts workflowIsReadOnly: document why the unknown-id default is
`false` (conservative admission: needsWrite=!readOnly) and must NOT be
"aligned" to the agent-side fail-closed read-only default.
- runner.py / pr-review-v1.yaml: comment fixes — artifact_key→artifact_uri,
drop never-written review_posted, rule 4→6, post_review handler is
registered-but-unimplemented not "no shipped handler" (comment-analyzer).
Tests (pr-test-analyzer gaps):
- test_memory.py: cover _validate_actor (user:{id} namespace + rejects) and
a write_task_episode(actor="user:...") reaching create_event.
- test_pipeline.py: cover the repo-less delivery-gate-failure branch (agent
succeeds but deliver_artifact fails → terminal FAILED naming the step).
Docs: ROADMAP.md "Task types" now reflects workflow-driven tasks + repo-less
knowledge workflows; Starlight mirror regenerated.
agent 962 passed, ruff+ty clean, cdk compile+workflows tests green.
* fix(agent): ship first-party workflow files in the image + sync user docs (#248)
The Dockerfile copied agent/src, agent/policies, and contracts but never
agent/workflows, so the runtime loader (which resolves /app/workflows/<domain>/
<name>-vN.yaml) failed every workflow load with WorkflowValidationError. Add
COPY agent/workflows/ /app/workflows/ so the five shipped workflows + the JSON
schema land in the image. Verified in a local build: load_workflow(
'coding/new-task-v1') resolves to 1.0.0.
Also bring the user-facing guides in line with the workflow-driven model:
USER_GUIDE/QUICK_START now document workflow_ref, the --workflow flag, and
resolved_workflow responses (the retired task_type is shown only as a migration
mapping). Rename the "Task types" section to "Workflows", update the Starlight
sync map + sidebar slug, and regenerate the mirrors.
* fix(agent,api,cli): address PR-review findings on the workflow seam (#248)
Resolve five verified findings from the #248 review:
HIGH — thread agent_config.allowed_tools to the SDK. runner.py hardcoded the
full tool surface, so a workflow's declared allowed_tools never reached
ClaudeAgentOptions and the read_only:false read-leaning lanes (default/agent-v1,
web-research-v1) could still get Write/Bash. Add TaskConfig.allowed_tools
(populated from the resolved workflow), resolve it in run_agent via
_resolve_allowed_tools (drops Write/Edit when read_only), and add SDK-layer
tests.
Cedar missing-read_only fail-open — add the ADR-014-promised parity vector.
With context omitting read_only, both engines fail OPEN (Allow + base_permit);
safety rests on policy.py::_eval_tier re-raising on diagnostics.errors. Add
contracts/cedar-parity/read-only-missing-attribute.json (verified on cedarpy
AND cedar-wasm) plus test_policy.py tests pinning the re-raise + always-inject
discipline.
MEDIUM — repo-less artifact success gate. pipeline.py now fails the task when
the primary outcome is `artifact` but no artifact_uri was produced (matches the
WORKFLOWS.md "agent-success AND S3 key present" contract), not just when the
deliver step raised.
MEDIUM — rule 13 model allow-list admission. Add WORKFLOW_MODEL_ALLOWLIST +
disallowedWorkflowModel() and enforce it in create-task-core (unpermitted model
=> 400, no silent downgrade); descriptor sync-test asserts modelId matches the
YAML and stays on the allow-list.
LOW — CLI repo-less submission. --repo is no longer a hard requiredOption:
required unless --workflow selects a repo-less workflow (still required with
--pr/--review-pr). Lets bgagent reach knowledge/web-research-v1.
Rule 10 (one production version per lineage) remains intentionally unenforced
at the single-file layer — it is a registry/promotion-time property.
* fix(agent,api,cli): address PR-review #296 resolution-ladder + deliverer findings (#248)
Resolve the remaining findings from isadeks's review (findings #2/#4 were
already fixed in the prior commit):
#1 — bare submit silently stopped opening PRs. The server ladder resolves an
absent workflow_ref to the repo-less default/agent-v1 (by design), so a CLI
`submit --repo X --task Y` regressed from "opens a PR" to "S3 artifact". The CLI
now sends coding/new-task-v1 explicitly when a repo is present and no workflow/
pr flag is given, preserving the old new_task default.
#3 — repo-optional + repo disagreed across the boundary. The agent took the
repo-less path off config.requires_repo alone, while the orchestrator built a
repo-bound prompt off repo presence. pipeline.run_task now takes the repo-less
path only when requires_repo is false AND no repo was supplied; a repo-optional
workflow given a repo runs the repo-bound path (clone/build/PR), matching CDK
admission and the prompt.
#6 — resolveWorkflowRef silently discarded a @Version constraint. It now honors
the constraint (Phase 1: must match the single shipped version) and returns
null otherwise; create-task-core 400s with a distinct "version not available"
message via resolveWorkflowRefError, instead of quietly running the shipped
version.
#7 — deliver_artifact unset-target default disagreed between validator (lenient
full set) and runtime (s3). Introduce DEFAULT_DELIVER_TARGET as the single
source of truth; produced_outcomes(None) now models that exact default, so a
primary:comment workflow that omits target is correctly flagged by rule 11.
#10 — descriptor/YAML drift had no parity test. The sync test now asserts
required_inputs (allOf/oneOf) parity in addition to id/version/requires_repo/
read_only/model, and a new test cross-checks the agent-side
_KNOWN_WRITEABLE_WORKFLOW_IDS (the third hand-maintained copy) against the
writeable repo-bound descriptors so drift fails CI.
* fix(agent): close PR-review #296 follow-ups — post-hook fail-soft, prompt, cap, parity (#248)
#5 (High) — post-hook load_workflow had no fail-soft fallback, so a reload
failure AFTER run_agent mutated/committed the tree stranded the work as FAILED
with no PR. read_only now comes from config (already fallback-computed and the
verdict that drove Cedar); the reload — needed only for the ensure_pr strategy —
is wrapped in the same WorkflowValidationError fallback build_config uses,
defaulting to "create" so the PR still opens.
#8 — knowledge/web-research-v1 had no registered prompt and silently degraded to
the generic default-agent prompt. Add a research-specialized, repo-less prompt
(prompts/web_research.py) and register it.
#9 — the 5 MiB artifact cap was checked AFTER encoding to UTF-8 (a second full
copy). Reject on the character count up front (chars ≤ bytes in UTF-8) so the cap
actually bounds memory on the MicroVM before materializing the bytes.
#10 residual — add a parity test for the agent-side REPO_LESS_DEFAULT_WORKFLOW_ID
/ DEFAULT_WORKFLOW_ID constants: the repo-less default must match the CDK platform
default and be requires_repo:false in the YAML; the coding default must be
repo-bound. Closes the last hand-maintained-copy axis #10 named.
Perf — memoize load_workflow (@cache): files are image-baked and immutable per
process and Workflow is frozen, so the 3-4 loads per task now parse once.
Leaves the coding-lane decorative-`gate` item (run_workflow only drives run_agent
while verify_build/ensure_pr stay inline) as a tracked follow-up — it is a
larger half-migrated-runner refactor both reviewers flagged as non-blocking.
---------
Co-authored-by: bgagent <bgagent@noreply.github.com>
1 parent 1a1b6cc commit 7d10254
128 files changed
Lines changed: 10042 additions & 700 deletions
File tree
- agent
- policies
- src
- prompts
- workflow
- tests
- workflows
- coding
- default
- knowledge
- schema
- cdk
- src
- constructs
- handlers
- stacks
- test
- constructs
- handlers
- cli
- src
- commands
- test
- commands
- contracts
- cedar-parity
- workflow-validation
- docs
- decisions
- design
- guides
- scripts
- src/content/docs
- architecture
- decisions
- getting-started
- roadmap
- using
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
74 | 74 | | |
75 | 75 | | |
76 | 76 | | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
77 | 83 | | |
78 | 84 | | |
79 | 85 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
15 | | - | |
16 | | - | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
17 | 21 | | |
18 | | - | |
| 22 | + | |
19 | 23 | | |
20 | | - | |
| 24 | + | |
21 | 25 | | |
22 | 26 | | |
23 | | - | |
| 27 | + | |
| 28 | + | |
24 | 29 | | |
25 | | - | |
| 30 | + | |
26 | 31 | | |
27 | | - | |
| 32 | + | |
28 | 33 | | |
29 | | - | |
| 34 | + | |
30 | 35 | | |
31 | 36 | | |
32 | | - | |
| 37 | + | |
| 38 | + | |
33 | 39 | | |
34 | 40 | | |
35 | 41 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
37 | 43 | | |
38 | 44 | | |
39 | 45 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
8 | | - | |
| 8 | + | |
9 | 9 | | |
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
13 | | - | |
14 | | - | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
15 | 28 | | |
16 | 29 | | |
17 | 30 | | |
| |||
314 | 327 | | |
315 | 328 | | |
316 | 329 | | |
317 | | - | |
| 330 | + | |
318 | 331 | | |
319 | 332 | | |
320 | 333 | | |
| |||
325 | 338 | | |
326 | 339 | | |
327 | 340 | | |
328 | | - | |
| 341 | + | |
329 | 342 | | |
330 | 343 | | |
331 | 344 | | |
| |||
351 | 364 | | |
352 | 365 | | |
353 | 366 | | |
| 367 | + | |
| 368 | + | |
| 369 | + | |
| 370 | + | |
| 371 | + | |
| 372 | + | |
| 373 | + | |
| 374 | + | |
| 375 | + | |
| 376 | + | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
| 384 | + | |
| 385 | + | |
| 386 | + | |
| 387 | + | |
| 388 | + | |
| 389 | + | |
| 390 | + | |
| 391 | + | |
| 392 | + | |
| 393 | + | |
| 394 | + | |
| 395 | + | |
| 396 | + | |
| 397 | + | |
| 398 | + | |
| 399 | + | |
| 400 | + | |
| 401 | + | |
| 402 | + | |
| 403 | + | |
| 404 | + | |
| 405 | + | |
354 | 406 | | |
355 | | - | |
356 | | - | |
357 | | - | |
358 | | - | |
| 407 | + | |
| 408 | + | |
| 409 | + | |
| 410 | + | |
| 411 | + | |
| 412 | + | |
| 413 | + | |
359 | 414 | | |
360 | 415 | | |
361 | | - | |
362 | | - | |
363 | | - | |
364 | | - | |
365 | | - | |
366 | | - | |
| 416 | + | |
367 | 417 | | |
368 | | - | |
369 | | - | |
| 418 | + | |
| 419 | + | |
370 | 420 | | |
371 | 421 | | |
372 | 422 | | |
| |||
394 | 444 | | |
395 | 445 | | |
396 | 446 | | |
397 | | - | |
| 447 | + | |
| 448 | + | |
| 449 | + | |
| 450 | + | |
| 451 | + | |
| 452 | + | |
398 | 453 | | |
399 | 454 | | |
400 | 455 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
17 | | - | |
| 17 | + | |
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| |||
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
32 | | - | |
33 | 32 | | |
34 | 33 | | |
35 | 34 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
53 | 53 | | |
54 | 54 | | |
55 | 55 | | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
56 | 70 | | |
57 | 71 | | |
58 | 72 | | |
| |||
67 | 81 | | |
68 | 82 | | |
69 | 83 | | |
70 | | - | |
| 84 | + | |
71 | 85 | | |
72 | 86 | | |
73 | 87 | | |
| |||
81 | 95 | | |
82 | 96 | | |
83 | 97 | | |
84 | | - | |
85 | | - | |
86 | | - | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
87 | 102 | | |
88 | 103 | | |
89 | 104 | | |
90 | 105 | | |
91 | 106 | | |
92 | 107 | | |
93 | 108 | | |
94 | | - | |
| 109 | + | |
95 | 110 | | |
96 | 111 | | |
97 | 112 | | |
| |||
124 | 139 | | |
125 | 140 | | |
126 | 141 | | |
127 | | - | |
| 142 | + | |
128 | 143 | | |
129 | 144 | | |
130 | 145 | | |
| |||
0 commit comments