Skip to content

feat(tasks): workflow-driven tasks — declarative steps + repo-less workflows (#248)#296

Merged
krokoko merged 27 commits into
mainfrom
feat/248-workflow-driven-tasks
Jun 10, 2026
Merged

feat(tasks): workflow-driven tasks — declarative steps + repo-less workflows (#248)#296
krokoko merged 27 commits into
mainfrom
feat/248-workflow-driven-tasks

Conversation

@krokoko

@krokoko krokoko commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Workflow-driven tasks (#248): replace the hardcoded task_type enum with declarative, versioned workflow files that describe how the agent runs one kind of task — and unlock repo-less (knowledge) workflows that run end-to-end with no GitHub repo. Implements ADR-014.

Area

  • cdk — infrastructure, handlers, constructs
  • agent — Python runtime / Docker image
  • clibgagent client
  • docs — guides or design sources (docs/guides/, docs/design/)
  • tooling — root mise.toml, scripts, CI workflows

Related

Changes

Workflow model + runner (agent). A versioned YAML workflow declares ordered steps, domain, requires_repo, read_only, prompt, agent_config, hydration, and terminal outcomes. A step runner (agent/src/workflow/) interprets workflow.steps in-container; the scattered task_type branches become step handlers. The three shipped task types are now first-party workflow files (coding/new-task-v1, coding/pr-iteration-v1, coding/pr-review-v1) plus the platform default default/agent-v1. A cross-field validator + contracts/workflow-validation/ golden corpus gate authoring.

task_typeworkflow_ref (breaking, pre-1.0). The enum is removed end-to-end (API/CLI/agent); the request field is workflow_ref, recorded metadata is resolved_workflow {id, version}. Legacy callers migrate via the published one-to-one map (new_task → coding/new-task-v1, etc.). CDK↔CLI types kept in sync.

Cedar read-only migration (security, ADR-014 §Phase 2a). Read-only enforcement moved from a literal Agent::TaskAgent::"pr_review" principal match to a context.read_only == true rule, applied uniformly to every read-only workflow. Kept byte-for-byte identical across both Cedar bindings (agent/policies/hard_deny.cedar + cdk/.../builtin-policies.ts, drift-guard enforced) and locked by new contracts/cedar-parity/ fixtures verified on both the cedarpy (Python) and cedar-wasm (TypeScript) engines. Fails closed (missing attribute → DENY).

Repo-optional / repo-less tasks (Phase 3, acceptance criterion #4).

  • Admission: repo is optional across CDK + CLI types; required only when the resolved workflow's requires_repo is true (requires_repo:false = repo-optional, not forbidden). Onboarding/blueprint lookup runs only when a repo is present.
  • Pipeline: a repo-less task branches to _run_repoless_task, driving hydrate_context → run_agent → deliver_artifact through the workflow runner with no clone/build/PR. The coding path is unchanged.
  • deliver_artifact: a registered-deliverer dispatcher (agent/src/workflow/deliverers.py); the s3 deliverer uploads the agent's result text to artifacts/{task_id}/result.md (new SessionRole s3:PutObject grant scoped to artifacts/${task_id}/*, 5 MiB cap, ARTIFACTS_BUCKET_NAME env), surfaced on TaskDetail.artifact_uri; the comment deliverer records a delivered_comment milestone (event-stream visible; external-channel routing is a tracked follow-up).
  • Memory: repo-less tasks key long-term memory on user:{user_id} instead of repo.
  • A runnable reference knowledge workflow ships: agent/workflows/knowledge/web-research-v1.yaml.

Verification. agent 954 / cdk 1833 / cli 300 green; ty + ruff + ESLint + CDK↔CLI type-sync clean; mise run build PASS; cdk synth + cdk-nag clean. mise run security clean for this branch's changes (the 2 findings it surfaces — semgrep scripts/check-types-sync.ts, zizmor build.yml — are pre-existing on main, not introduced here; the semgrep one is tracked separately). Reviewed across four passes (correctness, silent-failure/fail-open, test coverage, type design, comments).

Reviewer notes / call-outs

  • Breaking change: removing task_type contradicts the issue's "legacy task_type alias" acceptance line — this is an intentional, ADR-documented hard cutover (no dual-field grace period). Please confirm this deviation is acceptable, or I'll restore an alias.
  • Out of scope / follow-ups: Phase 4 registry resolution (feat(registry): central agent asset registry for capabilities, skills, plugins, and MCP servers #246); rendering the delivered_comment milestone to an external channel (Slack/email/GitHub); post_review handler (no shipped workflow uses it).

Draft pending the breaking-change sign-off and a green CI run.

Some tests:

A1 — Coding (new task → PR) — the most common path; verifies #1 (bare submit still opens a PR)
example:

node lib/bin/bgagent.js submit --repo krokoko/agent-plugins \                                                                               
    --task "Update the rfc template to add a new codeowner file" \
    --workflow coding/new-task-v1"
Task 01KTQ49FYJYXB3SHGNY4GN21W0 — COMPLETED (4m 49s total)
  Repo:          krokoko/agent-plugins
  Channel:       api
  Description:   Update the rfc template to add a new codeowner file
  Turn:          25 / ~—
  Last milestone: pr_created (5s ago)
  Current:       task completed
  Cost:          $0.23 / budget —
  PR:            https://github.com/krokoko/agent-plugins/pull/101
  Last event:    2026-06-09T21:28:16.832506+00:00

A2 — Repo-less knowledge task — verifies #2 (CLI reachable) + #8 (research prompt) + repo-less lane

node lib/bin/bgagent.js submit --workflow knowledge/web-research-v1 --task "Compare gRPC vs REST for internal microservices; cite sources"
Task 01KTQBHRXGF7K6PNQ2T3411XYY — COMPLETED (2m 37s total)
  Repo:          — (repo-less)
  Channel:       api
  Workflow:      knowledge/web-research-v1
  Description:   Compare gRPC vs REST for internal microservices; cite
                 sources
  Turn:          49 / ~—
  Last milestone: agent_execution_complete (14s ago)
  Current:       task completed
  Cost:          $0.32 / budget —
  Artifact:      s3://XXXXXXX/artifacts/01KTQBHRXGF7K6PNQ2T3411XYY/result.md
  Last event:    2026-06-09T23:32:53.977286+00:00

The markdown file is generated and contains the result of the research
image

A3 — PR iteration & review (PR flags still work)

node lib/bin/bgagent.js submit --repo krokoko/agent-plugins --review-pr 102 # read-only review

Result: comments correctly added to krokoko/agent-plugins#102

node lib/bin/bgagent.js submit --repo krokoko/agent-plugins --pr 102 # address feedback

Result:

Task 01KTQCDSXD4ENRKSHCB8G6DFRK — COMPLETED (5m 37s total)
  Repo:          krokoko/agent-plugins
  Channel:       api
  Workflow:      coding/pr-iteration-v1 (PR #102)
  Turn:          16 / ~—
  Last milestone: pr_created (2m 06s ago)
  Current:       task completed
  Cost:          $0.18 / budget —
  PR:            https://github.com/krokoko/agent-plugins/pull/102
  Last event:    2026-06-09T23:51:37.508Z

A4 - Resolution-ladder / repo-optionality

node lib/bin/bgagent.js submit --workflow default/agent-v1 --task "Draft a 200-word summary of CQRS"
Task 01KTQCHW04E5GZ189SMDAWCRVC — COMPLETED (11s total)
  Repo:          — (repo-less)
  Channel:       api
  Workflow:      default/agent-v1
  Description:   Draft a 200-word summary of CQRS
  Turn:          2 / ~—
  Last milestone: agent_execution_complete (1m 37s ago)
  Current:       task completed
  Cost:          $0.05 / budget —
  Artifact:      s3://XXXXXXXX/artifacts/01KTQCHW04E5GZ189SMDAWCRVC/result.md
  Last event:    2026-06-09T23:48:17.547Z

Document created:
image

A5 - guardrails and validation (expected failures)

node lib/bin/bgagent.js submit --task "do something"                                                                                              ✔ 
Error: --repo is required (omit it only with --workflow for a repo-less workflow).
node lib/bin/bgagent.js submit --workflow default/agent-v123 --task "Draft a 200-word summary of CQRS"                                         
Error: Unknown workflow_ref "default/agent-v123". (VALIDATION_ERROR)
node lib/bin/bgagent.js submit --repo krokoko/agent-plugins --task 'xxx' --workflow coding/does-not-exist-v1                                  
Error: Unknown workflow_ref "coding/does-not-exist-v1". (VALIDATION_ERROR)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

bgagent and others added 24 commits June 8, 2026 16:41
- Validator parity: declare JSON Schema as the single canonical shape
  contract (consumed, not re-implemented), keep exactly one cross-field
  validator impl in Phases 1-3, and add a contracts/workflow-validation/
  golden corpus to lock any Phase-4 second impl to parity (mirrors the
  cedar-parity mechanism).
- Cedar principal migration: split into its own isolated PR (Phase 2a)
  reviewed alone with regenerated parity fixtures, landing ahead of the
  pr_iteration/pr_review workflow migrations (Phase 2b) that depend on it.
- Repo-optionality: reframe web_research as a not-yet-runnable schema
  fixture (not an acceptance proof) and require the two blocking open
  questions (memory actorId, artifact delivery) to be resolved as a
  recorded ADR addendum before the Phase-0 schema is frozen.

Regenerated Starlight mirrors.
Add the agent/src/workflow/ package: Pydantic models mirroring
workflow.schema.json and a loader that resolves a first-party workflow
id, parses YAML, and shape-validates against the canonical JSON Schema.
Additive only — does not yet touch task_type (the breaking cutover lands
later in Phase 1).

- models.py: Workflow + sub-models; resolved_requires_repo applies the
  domain-derived default (coding=true, knowledge/hybrid=false).
- loader.py: JSON-Schema shape validation (one canonical contract,
  consumed not re-implemented), YAML load, id/path agreement, and
  path-traversal defense. Cross-field rules deliberately live in the
  separate validator (Task #2), so there is one cross-field impl.
- Promote pyyaml + jsonschema to direct deps (were only transitive); the
  loader must not depend on another package's transitive pin.
- Fix a latent schema bug found by the new tests: the requires_repo/
  read_only allOf conditionals fired vacuously when the property was
  absent, wrongly applying repo-less constraints to a coding workflow
  relying on the domain default. Guard each `if` with `required`.

17 new tests; agent ruff + ty + full suite (836) green.
…se 1)

The single CI-time implementation of the WORKFLOWS.md cross-field rules
the JSON Schema cannot express (rules 1-9, 11, 12, 14). The runtime loader
stays shape-only and trusts this verdict, so there is exactly one cross-field
implementation in Phases 1-3 (avoids the cedar-style two-language drift).

contracts/workflow-validation/ ships the golden parity corpus from Phase 1 so
the expected-verdict contract is fixed before #246 adds a second validator,
mirroring contracts/cedar-parity/. test_workflow_tree_valid.py gates every
first-party workflow file through the validator at CI time.
The agent-side step runner that interprets workflow.steps inside the container
(ADR-014): a StepKind→handler registry, in-order execution honouring each step's
on_failure policy (fail / continue / skip_remaining), per-step
step:<name>:start/<status> progress milestones, and resume-aware checkpointing
(deterministic side-effect-free steps recorded to workflow_state.json on the
persistent mount are skipped on resume; agentic/side-effecting steps re-run
idempotently per WORKFLOWS.md §Step execution semantics).

Handlers are thin wrappers over the existing helpers (setup_repo, run_agent,
verify_build/lint, ensure_pr) so this is maximal structural change, minimal
logic change. post_review/deliver_artifact are registered but raise
NotImplementedError (Phase 2b / Phase 3) — fail-loud, not silent no-op, keeping
validator rule-8 handler parity honest. Orchestration core is unit-tested with
fakes; wiring into pipeline.run_task is task 5.
…t/agent-v1 (#248 Phase 1)

Migrates the new_task task type to coding/new-task-v1.yaml (the heavyweight
clone→build→open-PR coding path) and ships the platform default workflow
default/agent-v1.yaml — the conservative last rung of the resolution ladder
used when no workflow_ref and no Blueprint default apply (requires_repo:false,
read-leaning tool set, deliver-as-comment). pr_iteration/pr_review migrations
are Phase 2b (depend on the Cedar principal migration).

Both pass the cross-field validator and the test_workflow_tree_valid CI gate
(file path agrees with declared id; zero violations).
…er (#248 Phase 1)

Fixes the group-A bugs surfaced by the high-recall code review of the Phase 1
commits:

runner.py
- Resume-skip product loss: clone_repo/hydrate_context populate in-memory
  products (ctx.setup, ctx.user_prompt) that can't be rebuilt from the JSON
  checkpoint, so skipping them on resume left ctx.setup None (breaking ensure_pr)
  and ctx.user_prompt empty (unguided agent). Narrow _RESUMABLE_SKIP_KINDS to
  verify_build/verify_lint — steps whose ENTIRE product is the checkpointed
  boolean re-applied via ctx.artifacts. clone_repo/hydrate_context now re-run on
  resume (handler-level idempotency), matching WORKFLOWS.md's intent.
- Skipped steps now emit step:<name>:start / :skipped milestones so a watcher
  sees them accounted for instead of a gap.

validator.py
- _HANDLER_KINDS (rule 8) is now derived from runner.STEP_HANDLERS — the single
  source of truth — instead of a hand-copied list that could silently drift.
- rule-6 tier ceiling now applies the elevated-only-fields check to read-only
  (the strictest tier) as well as standard; previously read-only could declare
  mcp_servers/plugins/skills unchecked.
- rule-11 now verifies a deliver_artifact step's target actually produces the
  declared comment/artifact outcome, not merely that the step kind is present.

Tests + corpus updated; rule3/rule7 fixtures switched to s3_and_comment target
to stay isolated to their intended rule, plus a new rule11 target-mismatch
fixture. Full agent suite green (909 passed), ruff + ty clean.
…hase 1, task 5)

Builds the runner→pipeline seam and gates it off (default production behavior
unchanged). pipeline.run_task's agent invocation now goes through
_execute_agent_step: when WORKFLOW_RUNNER_ENABLED is set AND task_type maps to a
shipped workflow (only new_task→coding/new-task-v1 in Phase 1), the single
run_agent step is dispatched through workflow.run_workflow — exercising the real
handler registry, step milestones, and result threading — while clone, context
assembly, and post-hooks stay on the proven inline path. Otherwise it is exactly
the legacy asyncio.run(run_agent(...)) call. pr_iteration/pr_review have no
workflow file until Phase 2b, so they fall back to inline.

run_workflow gains an only_kinds filter so the seam drives just the run_agent
step (no double clone / double PR against the inline post-hooks). The flag is
flipped to default-on in the tasks 6-8 cutover.

Folds in the group-B code-review fixes (effective when the seam is enabled):
- system prompt: hydrate_context builds ctx.system_prompt via the existing
  build_system_prompt (was empty), reusing pipeline's logic rather than
  reimplementing it.
- verify gate: new shared _gate_status gives verify_build/verify_lint matching
  semantics — informational/read_only never gate, regression_only consults
  build_before/lint_before (mirrors pipeline's build_ok = passed or not
  build_before), strict always gates. Fixes the regression_only build bug and
  the verify_lint read_only asymmetry.
- clone_repo is idempotent (reuses a pre-populated ctx.setup).
- StepContext threads the --trace trajectory into run_agent (was dropped).

ensure_pr strategy reconciliation stays deferred to task 8 (it requires removing
the task_type branch inside ensure_pr); recorded in local-docs. Full agent suite
green (930 passed), ruff + ty clean.
…#248 Phase 1)

- Error-handling contract (highest severity): when the run_agent step's handler
  raises, run_workflow captures it into a failed StepOutcome rather than
  propagating. _execute_agent_step now RE-RAISES in that case so run_task's
  except block restores full fidelity — the log_error_cw APPLICATION_LOGS mirror
  (read by TaskDashboard / bgagent status), the span error, and the real
  exception text — instead of silently downgrading to a generic AgentResult.
  This also makes the previously-dead agent_result-is-None branch live.

- _gate_status: an unset gate now defaults to regression_only semantics (gate
  only a regression; a pre-existing failure does not gate), matching pipeline.py
  which is unconditionally 'build_ok = passed or not build_before'. Previously an
  unset gate fell through to strict, contradicting the docstring's 'mirrors
  pipeline.py' claim and would wrongly fail a broken-before repo.

- run_agent handler fails loud on an empty system prompt rather than running an
  unguided agent loop (repo-less prompt assembly is Phase 3; this turns the gap
  into an attributable failed step instead of a silent context-free run).

- _workflow_id_for_task_type: documented as a transitional bridge tied to the
  canonical task_type→workflow table in WORKFLOWS.md / API_CONTRACT.md, to be
  kept in sync until tasks 6-8 delete it.

Tests added for each behavior change. Full agent suite green (934 passed),
ruff + ty clean.
…248 tasks 6-7)

Breaking API change (no alias): task_type is removed end-to-end; workflow_ref
selects a workflow and resolves to a pinned {id, version} at the create-task
boundary.

CDK/CLI types (task 6, in lockstep — check:types-sync):
- Remove TaskType + isPrTaskType; add ResolvedWorkflow.
- CreateTaskRequest.workflow_ref?; TaskRecord/TaskDetail/TaskSummary
  resolved_workflow (+ mappers).
- CLI: --pr/--review-pr now set workflow_ref (coding/pr-iteration-v1,
  coding/pr-review-v1); new --workflow flag; format reads resolved_workflow.id.

CDK resolution boundary (task 7):
- New workflows.ts: CDK-side descriptor mirror of agent/workflows/** +
  resolveWorkflowRef (ladder: explicit ref → default/agent-v1), isValidWorkflowRef,
  requires_repo/read_only/uses_pr accessors. Drift-guard test keeps the table in
  sync with the shipped YAML.
- validation.ts: drop VALID_TASK_TYPES/isValidTaskType; hasTaskSpec now takes the
  resolved workflow's required_inputs contract.
- create-task-core.ts: resolve workflow, validate inputs against the contract,
  persist workflow_ref + resolved_workflow. Repo stays required for ALL workflows
  in Phase 1 (repo-less admission is Phase 3).
- preflight.ts: taskType param → readOnly + requiresRepo; repo-less short-circuits
  to passed (seam for Phase 3).
- orchestrate-task/context-hydration/orchestrator/ecs-strategy: derive
  read_only/requires_repo/PR-ness from resolved_workflow; payload swaps task_type
  → resolved_workflow. ecs bootCommand passes resolved_workflow (lockstep with
  agent run_task in task 8).

CLI 294 tests + CDK suite green; check:types-sync OK. Agent cutover (task 8) lands
next to consume resolved_workflow.
…ask 8)

Removes the Python TaskType enum / PR_TASK_TYPES / _PROMPTS-by-task_type and the
gated seam; the workflow runner is now the sole agentic path, driven by the
resolved_workflow {id, version} threaded from the create-task boundary.

- models.py: delete TaskType enum; TaskConfig gains resolved_workflow,
  policy_principal, is_pr_workflow (drops task_type).
- config.py: build_config takes resolved_workflow (not task_type); derives
  policy_principal via workflow.policy_principal_for and is_pr_workflow; PR_TASK_TYPES
  → PR_WORKFLOW_IDS.
- Cedar UNCHANGED (Phase 2a owns the principal migration): the
  Agent::TaskAgent::"<id>" scheme and contracts/cedar-parity/ are untouched.
  policy_principal_for maps read_only⇒pr_review, else id→legacy {new_task,
  pr_iteration, pr_review}. Only policy.py edit: drop the WARN-only TaskType()
  validation. runner.py passes config.policy_principal as the principal. All
  test_policy* pass unchanged.
- prompts rekeyed by workflow id (get_system_prompt falls back to the default
  coding prompt for an unknown id rather than raising).
- pipeline.py: _execute_agent_step loads from resolved_workflow, always-on;
  the 3 task_type=='pr_review' branches → workflow.read_only; ensure_pr takes an
  explicit strategy (create/push_resolve/resolve) from the workflow step,
  resolving the deferred code-review item.
- post_hooks.ensure_pr: strategy param replaces task_type self-inspection.
  repo.py: PR-branch resume keys off config.is_pr_workflow.
- server.py/entrypoint.py: thread resolved_workflow; validation keyed on PR
  workflow ids.
- New faithful Phase-1 workflow files coding/pr-iteration-v1 (push_resolve) +
  coding/pr-review-v1 (read_only, resolve, no Write/Edit per rule 4). All 4
  first-party files validate; CDK descriptor drift-guard green.
- docs: API_CONTRACT.md migration table + resolved_workflow responses; Starlight
  mirror synced.

Agent suite 922 passed, ruff + ty clean. With tasks 6-7 (e829c4f) this completes
the breaking cutover across API + CLI + agent.
)

The full build was never completed on this branch; ty type-checking and ruff
format gates were red. No src logic changes — type/format hygiene only.

- Tests used mypy-style `# type: ignore[arg-type]`, but the project type
  checker is ty (`# ty: ignore[rule]`), so the suppression silently no-op'd.
  - test_workflow_runner.py: annotate inline step handlers as
    StepHandler-compatible `(step: Step, ctx: StepContext) -> StepOutcome`;
    type the `handlers` dicts as `dict[str, StepHandler]` (ty treats the alias
    params as positional-only → dict-value invariance rejects named-param
    funcs); convert the stale ignore to `# ty: ignore[invalid-argument-type]`.
  - test_entrypoint.py: assert `resolved_workflow is not None` before
    subscripting (TaskConfig.resolved_workflow is `dict | None`).
- Apply ruff format reflows (runner.py, validator.py,
  test_workflow_tree_valid.py) the branch was committed without.
…248 Phase 2a)

Migrate read-only enforcement from the literal `Agent::TaskAgent::"pr_review"`
principal match to the `context.read_only` attribute, so the Write/Edit
hard-deny attaches to the *property* and fires for every read-only workflow
uniformly — not just coding/pr-review. Removes the Phase-2b principal bridge.

This is the security-load-bearing step: an error here *silently weakens*
enforcement (the rule stops matching) rather than failing loudly. Guarded by
new contracts/cedar-parity/ golden fixtures verified on BOTH the cedarpy
(Python) and cedar-wasm (TypeScript) engines.

Policy (both bindings, kept byte-for-byte identical — drift guard enforces):
- agent/policies/hard_deny.cedar + cdk/.../builtin-policies.ts:
  pr_review_forbid_write/edit (principal == "pr_review") →
  read_only_forbid_write/edit (when context.read_only == true, any principal).

Wiring:
- policy.py: PolicyEngine gains read_only kwarg; threads read_only into every
  Cedar request context (probe + base_context).
- models.py: TaskConfig.read_only; config.py derives it from the resolved
  workflow; runner.py passes config.read_only to PolicyEngine.
- workflow/loader.py: remove the read_only ⇒ "pr_review" bridge in
  policy_principal_for — the principal is now an identity/audit tag only;
  pr-review keeps its own id-derived principal.

Parity fixtures (new): read-only-forbid-write, read-only-forbid-edit
(read_only=true ⇒ deny), read-only-false-permits-write (read_only=false ⇒
base_permit — proves the deny is gated on the property, not always-on).

Tests: TestPrReviewPermissions → TestReadOnlyPermissions, now asserting the
deny fires for any read-only principal AND that read_only=false permits Write;
test_hooks + test_config updated. agent 927 / cdk 1822 green; ty + ruff clean.

Docs: ADR-014 addendum (2026-06-08) records dropping the isolated-PR/ordering
requirement (2b shipped first behind the bridge, so read-only was never
unprotected); WORKFLOWS.md §"Replacing the Cedar principal" + phasing table
updated to past tense.
…#248)

ADR-014 addendum (2026-06-08) settles the two open questions that gated the
Phase-0 schema freeze (WORKFLOWS.md open questions #1, #2). With the one implied
schema reshape applied, the schema is frozen — later phases add handlers and
plumbing, not schema fields.

Decision 1 — Memory actorId for repo-less tasks: per-user user:{cognito_sub}.
Caller-scoped (no cross-tenant bleed; mirrors the per-user trace prefix); cross-
workflow pooling explicitly not adopted. NO schema field — a fixed platform
fallback, applied in memory.py in Phase 3 when repo is absent.

Decision 2 — Artifact delivery: named Python deliverers (open target string),
shared S3 plumbing pinned now. deliver_artifact.target resolves to a registered
deliverer (workflow/deliverers.py → DELIVERERS), same pattern as STEP_HANDLERS —
a new delivery method is a registered deliverer, not a schema change. Plumbing
frozen: task-scoped key artifacts/{task_id}/, prefix-scoped SessionRole grant,
size limit, TaskDetail URL surfacing, workflow:{id} repo-less tenant tag.

Schema reshape applied:
- workflow.schema.json: steps[].target drops its enum (stays type:string,
  minLength:1); the closed set moves into the deliverer registry.
- models.py: DeliverTarget Literal → str.
- New workflow/deliverers.py: Deliverer dataclass + DELIVERERS registry +
  produced_outcomes(); single source of truth for "what each target produces."
- validator.py rule 11: consults the registry (_deliverer_produces) instead of
  the hardcoded _DELIVER_TARGET_OUTCOMES map. The three first-party names keep
  their exact produced-outcome sets, so NO existing workflow / fixture / golden
  vector changes verdict; an unknown deliverer name now produces nothing (new
  guard, tested).

Tests: test_workflow_deliverers.py (registry contract) + a rule-11
unknown-deliverer case. agent 932 green; ty + ruff clean. deliver_artifact
remains a NotImplementedError stub until Phase 3 — only the contract is frozen.
…Phase 3)

First slice of Phase 3 (repo-optional tasks): the admission path now accepts a
submission with no repo when the resolved workflow does not require one, so a
repo-less knowledge workflow (e.g. default/agent-v1) can be admitted, hydrated,
and run from task_description alone. The deliverer/S3 + agent-runtime clone-skip
slices follow separately.

Semantics: requires_repo decides whether a repo is MANDATORY; requires_repo:false
means repo-OPTIONAL (a repo-less workflow may still target a repo). The repo-less
behavior keys off repo *absence*, not the workflow flag — a repo-optional workflow
given a repo still takes the repo-bound path.

Types (CDK + CLI mirror, kept in sync — check:types-sync green):
- CreateTaskRequest.repo, TaskRecord.repo → optional; TaskDetail.repo,
  TaskSummary.repo → string | null. Mappers null-coalesce.

Admission (create-task-core.ts): resolve the workflow FIRST, then gate repo on
workflow.requiresRepo (missing repo on a repo-bound workflow → 400). Onboarding +
blueprint-cap lookup runs only when a repo is present; repo-less takes the platform
default cap. Record omits repo when absent; replay-required-fields drops 'repo';
event metadata null-coalesces.

Hydration (context-hydration.ts): new repo-less branch (keyed on !task.repo) that
assembles from task_description + per-user memory, returning before the repo-bound
fetch path; assembleUserPrompt drops the Repository: line when repo is absent.

Memory (memory.ts + orchestrator.ts): loadMemoryContext/writeMinimalEpisode take
an actorNamespace (repo for coding, user:{user_id} for repo-less — ADR-014 addendum).

Preflight already honored requiresRepo (Phase-1 seam) — widened its repo param to
string|undefined. fanout-task-events skips the GitHub channel for a repo-less task.
loadBlueprintConfig skips the RepoTable lookup when repo is absent.

Agent (config.py): build_config loads the workflow up-front and requires
repo_url/github_token only when requires_repo; repo-less runs from
task_description. CLI shows '—' for a repo-less task.

Tests: repo-less acceptance (create-task-core, create-task, context-hydration,
CLI format) + repo-bound-missing-repo rejection; memory test log key repo →
actor_namespace. agent 932 / cdk 1825 / cli 296 green; ty + ruff + eslint clean.

deliver_artifact is still a NotImplementedError stub — a repo-less task can be
admitted and hydrated but not yet deliver an artifact (next slice).
…ase 3)

Second Phase-3 slice: a repo-less workflow now runs the agent loop in-container
with no clone / build / PR. When config.requires_repo is false, run_task delegates
to _run_repoless_task, which drives the workflow's steps (hydrate_context →
run_agent) through the workflow step runner — coding vs knowledge now differ by
the workflow's steps list, per ADR-014's end-state — and assembles a terminal
TaskResult with pr_url=None.

Scope boundary (deliberate): only hydrate_context + run_agent are driven here.
deliver_artifact is the workflow's terminal step but its handler is a stub until
the artifacts-bucket S3/IAM contract ships (next slice), so it is excluded via
only_kinds; delivery is deferred, the agent run is not. The coding path is left
byte-identical — its hard-won inline cancel-safety (skip ensure_pr on a cancelled
task) is NOT yet runner-driven, so routing the full coding step list through the
runner is a separate cancel-aware change, intentionally not done here.

- models.py: TaskConfig.requires_repo (defaults True; coding unaffected).
  config.py build_config sets it from the resolved workflow.
- pipeline.py: requires_repo branch before the repo-coupled segment;
  _run_repoless_task runs the runner + assembles/persists the terminal result.
  run_task repo_url now defaults to "" (repo-less callers omit it).
- prompts/default_agent.py: repo-less system prompt (no git/branch/PR
  placeholders); registered for default/agent-v1.
- prompt_builder.build_repoless_system_prompt + shared _render_memory_context.

Tests: repo-less pipeline run (no setup_repo, no PR, success, repo-less prompt);
repo-less prompt has no repo placeholders. agent 934 green; ty + ruff clean.
Findings from the plugin review (code-reviewer, silent-failure-hunter,
pr-test-analyzer, type-design-analyzer) on 4c34c8b..HEAD.

HIGH — repo-less task reported COMPLETED while delivering nothing
(silent-failure-hunter). _run_repoless_task skips deliver_artifact (stub until
the artifacts contract ships), but the workflow's terminal outcome (e.g.
`comment`) was never produced, yet status was success. Now: if the primary
terminal outcome is delivery-backed and delivery was skipped, the task is a loud
FAILED with an explicit error + delivery_deferred milestone naming the gap.

MEDIUM — config.py load-failure fallback bug + fail-open (code-reviewer,
silent-failure-hunter). The WorkflowValidationError branch compared workflow_id
against DEFAULT_WORKFLOW_ID (the *coding* default) for requires_repo, contradicting
its comment, and defaulted read_only=False (fail-open) for unknown ids. Now:
log the failure (was silent); fail CLOSED — read_only=True for any id not in
_KNOWN_WRITEABLE_WORKFLOW_IDS; requires_repo keyed off the real repo-less default
(REPO_LESS_DEFAULT_WORKFLOW_ID = default/agent-v1).

MEDIUM — TaskConfig missing cross-field invariant (type-design-analyzer). Added
_validate_requires_repo_has_repo (mirrors _validate_trace_requires_user_id):
requires_repo=True ⇒ non-empty repo_url, enforced at construction. repo_url and
github_token now default to "" so a repo-less TaskConfig is constructible; the
validator preserves the repo-bound invariant.

LOW/secondary — rule-8 now also checks deliver_artifact.target resolves in
DELIVERERS (type-design-analyzer), so an unknown target is caught universally,
not only when it collides with the primary outcome.

Test gaps closed (pr-test-analyzer): repo-OPTIONAL workflow given a valid repo
(admission + hydration); repo-less memory keyed user:{user_id} (asserted);
malformed-repo-on-repo-bound rejection; preflight repo-less short-circuit +
repo-bound-no-repo invariant; repo-less agent_result=None → FAILED; config
load-failure fail-closed; TaskConfig validator; rule-8 deliver target.

agent 941 / cdk 1831 / cli 296 green; ty + ruff + type-sync clean.
…#4 (#248 Phase 3)

Final Phase-3 slice: a repo-less knowledge task now runs end-to-end — admitted,
hydrated, agent loop, AND delivers its output. Closes acceptance criterion #4
(a non-coding task runs without a repo) for the s3/comment deliverers.

Infra (IAM + env):
- agent-session-role.ts: s3:PutObject grant scoped to
  artifacts/${aws:PrincipalTag/task_id}/* on the trace-artifacts bucket (mirrors
  the traces/ pattern; task_id-scoped per ADR-014 addendum). cdk-nag reason updated.
- agent.ts: ARTIFACTS_BUCKET_NAME runtime env (same bucket as traces).

Deliverers (agent/src/workflow/deliverers.py):
- DELIVERERS gains a deliver(target, ctx) dispatcher + DeliveryResult. The s3
  deliverer uploads the agent's result text to artifacts/{task_id}/result.md via
  the tenant-scoped S3 client (5 MiB cap); the comment deliverer records a
  delivered_comment milestone for the channel fanout; s3_and_comment does both.
- AgentResult.result_text captures ResultMessage.result (the knowledge-task
  deliverable); runner.py populates it on success.
- runner._handle_deliver_artifact: now implemented (was NotImplementedError),
  routes through deliver(); a delivery failure is a failed step (terminal FAILED),
  never a silent skip.

Pipeline (pipeline.py):
- _run_repoless_task drives the FULL step list (deliver_artifact included);
  replaces the delivery-deferred FAILED gate with the real runner verdict (agent
  succeeded but workflow failed ⇒ attributed FAILED). Sets TaskResult.artifact_uri.
- Repo-less episodic memory write keyed user:{user_id} (fail-open).

Memory (memory.py): write_task_episode repo→actor param; _validate_actor accepts
owner/repo OR user:{id} so the same write path serves coding + knowledge tasks.

Types: TaskResult/TaskRecord/TaskDetail + CLI mirror gain artifact_uri (mirrors
trace_s3_uri); CLI detail view shows an Artifact: line. check:types-sync green.

Tests: deliverer dispatch (s3 upload + key/body, comment milestone, s3_and_comment,
missing bucket, empty text, oversize); repo-less pipeline now succeeds + sets
artifact_uri=null for comment-only; session-role artifacts grant; deliver_artifact
dropped from the fail-loud stub list. agent 947 / cdk 1832 / cli 296 green; synth +
cdk-nag clean; ty + ruff + type-sync clean.
…orkflow (#248 Phase 3)

Three should-fix items closing the gap where "criterion #4" was met by machinery
but the default repo-less path delivered to a channel that isn't wired.

1. Stale docstring: _run_repoless_task said deliver_artifact was a deferred stub
   — it ships now. Updated to describe the real full-step-list run.

2. Default workflow delivers retrievably. The channel fan-out only knows
   slack/email/github; an api-origin repo-less task (the common default case) has
   no channel, so a `comment`-only deliverer surfaced nothing the user could get.
   default/agent-v1 now uses `target: s3_and_comment` (primary outcome `artifact`):
   the S3 upload to artifacts/{task_id}/ is always retrievable, the comment
   milestone is still rendered by the fan-out when a channel exists.

3. Ship a runnable knowledge workflow: agent/workflows/knowledge/web-research-v1.yaml
   — the concrete repo-less demonstration (research → S3 artifact, no repo).
   Built-in capabilities only (registry MCP/skills are Phase 4); CDK descriptor
   mirror + drift-guard + resolver tests updated.

Also fixes a latent bug the knowledge workflow surfaced: a repo-less id with no
registered prompt fell back to the CODING prompt (leaking unsubstitutable
{repo_url}/{branch_name}). get_system_prompt gains repo_less= so repo-less
workflows fall back to the repo-less default-agent prompt instead.

Tests: repo-less pipeline now asserts artifact_uri set (s3_and_comment);
repo-less prompt fallback; knowledge workflow validates/resolves. agent 949 /
cdk 1833 green; ty + ruff + type-sync clean; docs synced.
…ups (#248)

Findings from the four-reviewer full-branch pass. Two were blockers that would
have shipped the repo-less feature broken despite green unit tests (the tests
didn't cross the server/task_state boundaries).

BLOCKER 1 — repo-less task rejected on the AgentCore backend. server.py
_validate_required_params required repo_url unconditionally, so /invocations
returned 400 for every repo-less task before the pipeline ran (ECS backend
dodged it by calling run_task directly). Now gates repo_url on the workflow's
requires_repo (resolved via load_workflow; fails SAFE — assume required — on a
load error). Test: repo-less payload accepted, repo-bound still requires repo.

BLOCKER 2 — artifact_uri computed but never persisted. task_state.write_terminal's
field allowlist had no artifact_uri branch, so the URI was dropped before
DynamoDB and TaskDetail.artifact_uri was always null (delivery not retrievable
despite the S3 object existing). Added the persist branch + tests.

Secondary:
- comment deliverer no longer overstates: _post_comment docstring + WORKFLOWS.md
  state plainly that delivered_comment is recorded for the event stream (visible
  in `bgagent watch`) but is NOT yet routed to an external channel (not in the
  fan-out ROUTABLE_MILESTONES). comment_posted=True only when actually recorded.
- test pins the config→engine read_only seam (runner.py PolicyEngine(read_only=)):
  dropping it now fails a test instead of silently disabling the Write/Edit deny.
- stale phase-boundary comments corrected (post_review docstring, runner module
  docstring + only_kinds + run_agent guard, pr-review/pr-iteration YAML headers)
  — they described Phase 2a/2b/3 as pending when those shipped on this branch.
- ADR-014: replaced the dangling pre-rebase commit hash 838c72e (unreachable
  after rebase) with a descriptive reference.

agent 954 / cdk 1833 green; ty + ruff + type-sync clean; docs synced.
The Artifact: display lines added to formatTaskDetail / formatStatusSnapshot
introduced uncovered truthy branches that dropped CLI branch coverage below the
84% gate (full `mise run build` caught it; the per-package run had passed on the
prior content). Adds non-null + null cases for both renderers, mirroring the
existing trace_s3_uri tests. cli 300 passed; full build green.
CI: ruff-format the 4 files that tripped the build self-mutation
(config.py, deliverers.py, test_pipeline.py, test_task_state.py) so
"Fail build on mutation" passes.

Review findings:
- prompts/__init__.py: get_system_prompt now logs WARN when an id has no
  registered prompt and falls back, restoring the signal the old
  build_system_prompt emitted (silent-failure-hunter).
- pipeline._run_repoless_task: chain wf_result.failed_step.error into the
  synthesized AgentResult so a run_agent failure on the repo-less path is
  diagnosable instead of a generic message (code-reviewer).
- server._validate_required_params: narrow the broad `except Exception`
  around load_workflow to WorkflowValidationError/ImportError so a genuine
  programming error surfaces loudly instead of being masked as "could not
  resolve requires_repo" (still fails safe — repo required).
- workflows.ts workflowIsReadOnly: document why the unknown-id default is
  `false` (conservative admission: needsWrite=!readOnly) and must NOT be
  "aligned" to the agent-side fail-closed read-only default.
- runner.py / pr-review-v1.yaml: comment fixes — artifact_key→artifact_uri,
  drop never-written review_posted, rule 4→6, post_review handler is
  registered-but-unimplemented not "no shipped handler" (comment-analyzer).

Tests (pr-test-analyzer gaps):
- test_memory.py: cover _validate_actor (user:{id} namespace + rejects) and
  a write_task_episode(actor="user:...") reaching create_event.
- test_pipeline.py: cover the repo-less delivery-gate-failure branch (agent
  succeeds but deliver_artifact fails → terminal FAILED naming the step).

Docs: ROADMAP.md "Task types" now reflects workflow-driven tasks + repo-less
knowledge workflows; Starlight mirror regenerated.

agent 962 passed, ruff+ty clean, cdk compile+workflows tests green.
…docs (#248)

The Dockerfile copied agent/src, agent/policies, and contracts but never
agent/workflows, so the runtime loader (which resolves /app/workflows/<domain>/
<name>-vN.yaml) failed every workflow load with WorkflowValidationError. Add
COPY agent/workflows/ /app/workflows/ so the five shipped workflows + the JSON
schema land in the image. Verified in a local build: load_workflow(
'coding/new-task-v1') resolves to 1.0.0.

Also bring the user-facing guides in line with the workflow-driven model:
USER_GUIDE/QUICK_START now document workflow_ref, the --workflow flag, and
resolved_workflow responses (the retired task_type is shown only as a migration
mapping). Rename the "Task types" section to "Workflows", update the Starlight
sync map + sidebar slug, and regenerate the mirrors.
@isadeks

isadeks commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Scope: 113 files, +9029/−596. Replaces the hardcoded task_type enum with declarative YAML workflows, adds repo-less tasks, an S3 deliverer, and migrates Cedar read-only enforcement to a context.read_only rule.

This is a well-structured PR with strong test coverage and thoughtful docs. The findings below are concentrated in the resolution ladder / repo-optionality seam (where CDK admission, agent pipeline, and CLI disagree about what a default or repo-less task means) and the half-migrated workflow runner (declarative steps/gate/allowed_tools partly ignored on the coding path).


Findings (ranked, most severe first)

1. ⚠️ Bare bgagent submit --repo --task silently stops producing PRs

cli/src/commands/submit.ts:183 + cdk/src/handlers/shared/workflows.ts:131

The old task_type default was new_task (clone → build → open PR). The CLI now only maps --prpr-iteration and --review-prpr-review; a bare submit leaves workflow_ref undefined. The server's resolution ladder (resolveWorkflowRef(undefined)) falls back to default/agent-v1requires_repo:false, repo-less, Read/Glob/Grep/WebFetch only, S3 artifact — not coding/new-task-v1. The documented new_task → coding/new-task-v1 mapping exists only in a comment; the Blueprint-default rung is "Phase 4, not yet wired."

Failure: bgagent submit --repo owner/repo --task "implement X" — previously opened a PR, now produces an S3 markdown artifact with no clone/build/PR. The most common invocation regresses silently. Either map a repo-present default to coding/new-task-v1, or block bare submit.

2. ⚠️ Repo-less workflows are unreachable from the CLI (--repo is required)

cli/src/commands/submit.ts:62

--repo is a .requiredOption(...), but this PR makes repo optional server-side for requires_repo:false workflows. bgagent submit --workflow knowledge/web-research-v1 --task "..." is rejected by commander before a request is built, even though create-task-core.ts:128 would accept it. The headline repo-less feature can't be exercised through the CLI.

3. ⚠️ Repo-optional workflow given a repo: the agent silently drops it

agent/src/pipeline.py:645 vs cdk/src/handlers/shared/context-hydration.ts

Repo-optionality is decided on two disagreeing axes. The orchestrator's hydration keys off task.repo presence (repo present ⇒ repo-bound prompt, issue/PR fetch), but pipeline.run_task keys off config.requires_repo (the workflow flag) to choose _run_repoless_task. create-task-core.ts:128 explicitly allows requires_repo:false + a supplied repo.

Failure: submit default/agent-v1 --repo owner/repo. CDK accepts and persists the repo; the orchestrator assembles a repo-bound prompt; the agent sees requires_repo == False, takes _run_repoless_task (no clone, no build, no PR), and the prompt promises a repo the agent never clones. The two halves disagree on whether the repo is in play.

4. ⚠️ Per-workflow tool ceiling is never enforced — allowed_tools is hardcoded

agent/src/runner.py:323

allowed_tools = ["Bash", "Read", "Write", "Edit", "Glob", "Grep", "WebFetch"] is a fixed list; the workflow's agent_config.allowed_tools is ignored. The schema/YAML claim this list "Replaces the hardcoded list in runner.py" and that excluding Write/Edit/Bash is the "defense in depth" second layer.

Failure: default/agent-v1 declares [Read, Glob, Grep, WebFetch] (no mutation, by design) and is read_only:false, so the Cedar hard-deny (context.read_only == true) does not fire. A default-workflow task that should be incapable of mutation can Write/Edit/Bash, gated only by soft-deny HITL. The intended hard tool-restriction is gone. (Read-only workflows are still protected by the context.read_only hard-deny — that invariant survives; the per-workflow ceiling does not.)

5. Post-hook load_workflow lacks the fail-soft fallback build_config uses → task FAILED after the tree is already mutated

agent/src/pipeline.py:853

build_config (config.py:382) wraps load_workflow in a WorkflowValidationError fallback (id-derived read_only/requires_repo) so a load failure still runs the task. The post-hook reload at pipeline.py:853 is not wrapped — it's only caught by the broad terminal except Exception.

Failure: a workflow file that build_config couldn't load took the fallback and the agent ran the full loop (possibly committing). At finalize, load_workflow raises again; instead of falling back, the task is marked FAILED with the work stranded (no PR/commit resolution) — inconsistent with build_config's identical-failure handling.

6. resolveWorkflowRef silently discards a @version constraint

cdk/src/handlers/shared/workflows.ts:147

isValidWorkflowRef accepts @<constraint> syntactically (regex (@[^\s]+)?), but resolveWorkflowRef does ref.split('@', 1)[0] (JS limit=1 returns only the first element) and always pins to the single shipped version.

Failure: workflow_ref: "coding/new-task-v1@2.0.0" passes validation, @2.0.0 is dropped, the task runs 1.0.0 with no warning or error. When a v2 ships with different behavior, callers pinned to @2.0.0 silently keep getting v1. An unsatisfiable pin should 400, not be discarded.

7. deliver_artifact with an unset target: validator is lenient but runtime defaults to s3 → declared comment outcome never delivered, yet task reports success

agent/src/workflow/runner.py:625 vs agent/src/workflow/deliverers.py:170

produced_outcomes(None) returns the full DELIVER_OUTCOMES set (incl. comment), so a workflow declaring terminal_outcomes.primary: comment with a deliver_artifact step that omits target passes validation (rule 11). At runtime _handle_deliver_artifact does target = step.target or "s3" → only _upload_to_s3 runs (produces={'artifact'}), never _post_comment.

Failure: the declared terminal deliverable (comment) is never produced, but the S3 upload succeeds so succeeded=True — a silent "succeeded with the declared product never delivered." The lenient validator default and the runtime s3 default must agree.

8. knowledge/web-research-v1 has no registered prompt → silent fallback to the generic default-agent prompt

agent/src/prompts/__init__.py:24

CDK accepts knowledge/web-research-v1 as valid and the YAML ships in the image, but _PROMPTS has no entry for that id. get_system_prompt logs a WARN and falls back to default/agent-v1's prompt.

Failure: a task submitted with the reference knowledge workflow runs with the generic default-agent system prompt instead of a research-specialized one — the workflow silently degrades to generic behavior with no error surfaced to the user. (CDK admission and agent capability diverge.)

9. S3 artifact 5 MiB cap is checked after the full result is encoded → cap never bounds memory

agent/src/workflow/deliverers.py:77

_artifact_body does body = text.encode("utf-8") (a second full copy of ctx.agent_result.result_text, already resident) and only then checks len(body) > MAX_ARTIFACT_BYTES.

Failure: a knowledge task that ingests a large page and emits a multi-hundred-MB result has both the str and its UTF-8 bytes resident before the cap fires — an OOM risk on the constrained agent MicroVM, which is exactly what the cap was meant to prevent. Check len(text) (or character budget) up front; for s3_and_comment, _artifact_body is also re-encoded twice.

10. read_only / requires_repo are two hand-maintained sources (CDK DESCRIPTORS vs agent YAML) with no parity test

cdk/src/handlers/shared/workflows.ts:64 + agent/src/config.py:27

CDK admission/preflight derives readOnly/requiresRepo from the TS DESCRIPTORS table; the agent derives the same from agent/workflows/*.yaml. The drift test (builtin-policies.test.ts) covers only Cedar policy text, not these descriptors, and the workflows fixture only checks id/version. _KNOWN_WRITEABLE_WORKFLOW_IDS is a third hand-maintained copy of the writeable set.

Failure: edit a YAML to read_only:true but leave TS readOnly:false and preflight scopes a contents:write token for a task the agent runs read-only (over-grant); or add a writeable workflow and forget _KNOWN_WRITEABLE_WORKFLOW_IDS and a load-failure fallback mis-denies its writes. Nothing in CI catches the divergence. Generate the descriptor table from the YAML (codegen/JSON sidecar), or add a parity test over all four fields + requiredInputs.


Lower-priority (noted, not in the top 10)

  • Dead post_review plumbing (runner.py:613): a registered handler that raises NotImplementedError, plus the post_review/review_posted vocabulary threaded through the schema, models, and validator special-cases — no shipped workflow uses it. Inviting a runtime error where a validation error belongs.
  • Workflow YAML loaded/parsed 3–4× per task (server.py, config.py:382, pipeline.py:172, pipeline.py:853): load_workflow isn't memoized though the file is immutable per process — add @lru_cache or thread the resolved Workflow forward.
  • Half-migrated runner: on the repo-bound path run_workflow(only_kinds={"run_agent"}) drives only the agent step; verify_build/ensure_pr stay inline and the inline gate logic ignores the per-step gate field declared in the coding YAMLs. The declarative steps/gate are effectively decorative for coding workflows — worth flagging as an altitude concern for follow-up.
  • Cedar context.read_only == true is structurally fail-open on a missing/non-bool attribute. Not currently reachable (policy.py always threads a real bool), but a latent fragility — consider a fail-closed default in the policy or an explicit assertion.

Bottom line

The architecture (declarative workflows, registry-driven deliverers, Cedar-by-property) is sound and the right direction. The blocking issues are the resolution-ladder/repo-optionality seam (#1#3) — a default coding submit silently stops opening PRs, and repo-optional + repo disagree across the boundary — and the unenforced per-workflow tool ceiling (#4). I'd hold the breaking cutover until #1#4 are resolved; #5#10 are fixable in follow-up but #7/#10 risk silent divergence.

bgagent added 2 commits June 9, 2026 17:02
…248)

Resolve five verified findings from the #248 review:

HIGH — thread agent_config.allowed_tools to the SDK. runner.py hardcoded the
full tool surface, so a workflow's declared allowed_tools never reached
ClaudeAgentOptions and the read_only:false read-leaning lanes (default/agent-v1,
web-research-v1) could still get Write/Bash. Add TaskConfig.allowed_tools
(populated from the resolved workflow), resolve it in run_agent via
_resolve_allowed_tools (drops Write/Edit when read_only), and add SDK-layer
tests.

Cedar missing-read_only fail-open — add the ADR-014-promised parity vector.
With context omitting read_only, both engines fail OPEN (Allow + base_permit);
safety rests on policy.py::_eval_tier re-raising on diagnostics.errors. Add
contracts/cedar-parity/read-only-missing-attribute.json (verified on cedarpy
AND cedar-wasm) plus test_policy.py tests pinning the re-raise + always-inject
discipline.

MEDIUM — repo-less artifact success gate. pipeline.py now fails the task when
the primary outcome is `artifact` but no artifact_uri was produced (matches the
WORKFLOWS.md "agent-success AND S3 key present" contract), not just when the
deliver step raised.

MEDIUM — rule 13 model allow-list admission. Add WORKFLOW_MODEL_ALLOWLIST +
disallowedWorkflowModel() and enforce it in create-task-core (unpermitted model
=> 400, no silent downgrade); descriptor sync-test asserts modelId matches the
YAML and stays on the allow-list.

LOW — CLI repo-less submission. --repo is no longer a hard requiredOption:
required unless --workflow selects a repo-less workflow (still required with
--pr/--review-pr). Lets bgagent reach knowledge/web-research-v1.

Rule 10 (one production version per lineage) remains intentionally unenforced
at the single-file layer — it is a registry/promotion-time property.
…rer findings (#248)

Resolve the remaining findings from isadeks's review (findings #2/#4 were
already fixed in the prior commit):

#1 — bare submit silently stopped opening PRs. The server ladder resolves an
absent workflow_ref to the repo-less default/agent-v1 (by design), so a CLI
`submit --repo X --task Y` regressed from "opens a PR" to "S3 artifact". The CLI
now sends coding/new-task-v1 explicitly when a repo is present and no workflow/
pr flag is given, preserving the old new_task default.

#3 — repo-optional + repo disagreed across the boundary. The agent took the
repo-less path off config.requires_repo alone, while the orchestrator built a
repo-bound prompt off repo presence. pipeline.run_task now takes the repo-less
path only when requires_repo is false AND no repo was supplied; a repo-optional
workflow given a repo runs the repo-bound path (clone/build/PR), matching CDK
admission and the prompt.

#6 — resolveWorkflowRef silently discarded a @Version constraint. It now honors
the constraint (Phase 1: must match the single shipped version) and returns
null otherwise; create-task-core 400s with a distinct "version not available"
message via resolveWorkflowRefError, instead of quietly running the shipped
version.

#7 — deliver_artifact unset-target default disagreed between validator (lenient
full set) and runtime (s3). Introduce DEFAULT_DELIVER_TARGET as the single
source of truth; produced_outcomes(None) now models that exact default, so a
primary:comment workflow that omits target is correctly flagged by rule 11.

#10 — descriptor/YAML drift had no parity test. The sync test now asserts
required_inputs (allOf/oneOf) parity in addition to id/version/requires_repo/
read_only/model, and a new test cross-checks the agent-side
_KNOWN_WRITEABLE_WORKFLOW_IDS (the third hand-maintained copy) against the
writeable repo-bound descriptors so drift fails CI.
@theagenticguy

theagenticguy commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Reviewed at HEAD 3e3a66f, re-fetched after d26860c and 3e3a66f. Thanks @krokoko. I read this through the lens of where ABCA is heading: one engine, with coding and knowledge lanes. So the framing leans architectural. Sophia's review is the line-by-line pass and is already in flight, so I've tried not to duplicate it.

My two findings are verified closed, at the code level rather than the commit message.

First, allowed_tools now reaches the SDK. _resolve_allowed_tools (agent/src/runner.py:285) reads config.allowed_tools. build_config populates that from workflow_obj.agent_config.allowed_tools (agent/src/config.py:386). It then flows to ClaudeAgentOptions verbatim (runner.py:387). The empty-list fallback to the full surface fires only on the WorkflowValidationError path. On that path read_only is also fail-closed to true (config.py:402), so Write and Edit drop anyway. The defense-in-depth layer the design promised is real now.

Second, the Cedar missing-attribute path is pinned. contracts/cedar-parity/read-only-missing-attribute.json is honest in the best way. It asserts both engines fail open identically on a missing read_only. It also documents in-fixture that safety rests on policy.py::_eval_tier re-raising on diagnostics.errors, which maps to Outcome.DENY, plus the always-inject discipline. It does not rest on Cedar's native semantics. test_eval_tier_reraises_on_missing_read_only_attribute locks the re-raise. This is the ADR-014 Phase-2a "silently weakens" tripwire. Resolved.

Both of mine are done. The rest is architectural color on the work Sophia already surfaced.

The architectural read: this PR ships the lane-router, and the lane-router is the bet.

Strip away the YAML and #248 ships a routing primitive. A submission arrives, and one decision picks the lane: coding, which clones, builds, and opens a PR, versus knowledge, which hydrates, runs the agent, and delivers an artifact. That decision is the control point for everything ABCA is growing into. So the findings worth holding the cutover for are not the ones with the scariest blast radius. They are the ones where the lane decision is computed in more than one place, and the places disagree. Today it is computed in four: CLI flag inference, server resolveWorkflowRef, the orchestrate-task missing-id default, and the agent's config.requires_repo. Two of Sophia's findings are symptoms of those four not agreeing.

I'd gate the cutover on Finding #1, where a bare submit --repo --task stops producing a PR. I want to underline it, because the fix commit is titled for the workflow seam but this regression is still live. Traced at HEAD: the CLI bare submit sends no workflow_ref (cli/src/commands/submit.ts). The server then resolves default/agent-v1 (cdk/src/handlers/shared/workflows.ts:173), which sets requiresRepo:false. The agent takes _run_repoless_task (agent/src/pipeline.py:660) and writes an S3 markdown artifact. No clone, no build, no PR.

The coding/new-task-v1 mapping lives only in a comment. The orchestrate-task.ts default to coding/new-task-v1 fires only when resolved_workflow is wholly absent, which means legacy records, not this case. So the most common historical invocation, "here is my repo, do this task," silently changes from "opens a PR" to "writes a markdown file." For a pre-1.0 breaking cutover, that burns trust on day one. The cheapest correct fix: when a repo is present and no workflow_ref was given, resolve to coding/new-task-v1. Repo-presence is the signal. That keeps the ladder honest and puts the lane decision on one axis.

Finding #3, a repo-optional workflow given a repo, is the same root cause. Hydration keys the lane off task.repo presence (context-hydration.ts) and builds a repo-bound prompt. The agent keys it off config.requires_repo (pipeline.py:660) and runs repo-less. Submit default/agent-v1 --repo owner/repo and the orchestrator promises the agent a repo it never clones. Repo-presence and requires_repo are two independent booleans that must be one decision. This is invisible today because nobody submits that combination. But once knowledge workflows are real and a user attaches a repo for context, the prompt and the runtime diverge. Better to collapse it now, while there are two lanes, than after there are nine.

I'd treat #1 and #3 as one fix. Make "is this task repo-bound?" a single value, resolved at the create-task boundary and threaded everywhere. Then the CLI, server, orchestrator, and agent cannot hold four opinions.

Independent corroboration on three of Sophia's open items. These are being worked, so this is a severity read, not new findings.

Finding #5 I'd rank High. The post-hook load_workflow (pipeline.py:868) is a bare call with no WorkflowValidationError fallback, unlike build_config (config.py:381). It is the only spot that can fail a task after run_agent already mutated the tree. The work gets stranded with no commit and no PR. This is distinct from the "no PR" findings: here a PR should exist and does not. Wrap it in the same fallback build_config uses.

Finding #10 is mostly closed, with one axis left. The new workflows.test.ts drift guard over id, version, requires_repo, read_only, and model genuinely closes the TS-to-YAML axis. But the third copy Sophia named, the Python _KNOWN_WRITEABLE_WORKFLOW_IDS and REPO_LESS_DEFAULT_WORKFLOW_ID (config.py:19,27), still has no parity test against the YAML. That set is the fail-closed fallback's source of truth. If it drifts, the load-failure path mis-classifies read-only. One more assertion closes it.

Finding #6 is Low, but make it loud. The @version constraint is discarded at workflows.ts:177 via split('@',1)[0], which pins to the single shipped version. A test now codifies this as intended, a reasonable Phase-1 stance. But an unsatisfiable pin like ...-v1@2.0.0 should 400, not silently run v1. The day a v2 ships, anyone pinned to @2.0.0 keeps getting v1 with no signal. Cheap to reject unknown versions now.

Two smaller items I'd not block on but would log so they don't get lost. load_workflow is parsed three times per repo-bound task with no @lru_cache (loader.py). And the coding path still runs run_workflow(only_kinds={"run_agent"}) with verify_build and ensure_pr inline, ignoring each step's declared gate field. So steps and gate are decorative for coding workflows today. Both are fine as follow-ups. They are worth a tracking note so "declarative steps" doesn't read as fully wired on the coding lane.

Bottom line. The direction is right and the security core is solid. My two findings are genuinely closed, and the byte-equality and cedar-parity guards are real; the full agent suite was green when I ran it. I'd hold the breaking task_type removal only until #1 and #3 collapse onto a single repo-bound decision, and #5 gets the fail-soft wrapper. Everything else is good follow-up material. Nice work on the convergence groundwork. The repo-less lane is the part that matters most long-term, and it is structurally sound.

Reviewed via background-agent tooling. Happy to pair on the #1 and #3 single-axis refactor if useful.

Comment thread cdk/src/handlers/shared/workflows.ts
Comment thread agent/src/pipeline.py Outdated
@theagenticguy

Copy link
Copy Markdown
Contributor

Follow-up — re-verified at 67d71ec (you pushed it while I was typing the comment above 😄, so updating to keep my comments honest against the current tree).

67d71ec lands fixes for most of what I'd flagged. Re-checked each against the code:

  • Installation: docker image inspect fails #1 (bare submit --repo → no PR)cli/src/commands/submit.ts:199-208 now sends workflow_ref: 'coding/new-task-v1' when a repo is present and no --workflow/--pr/--review-pr is given. That's the repo-present-is-the-signal fix; the old new_task default is preserved. My inline note on workflows.ts:174 is now resolved — leaving it for thread continuity only.
  • Docs: user guide hard-codes us-east-1 #3 (repo-optional + repo disagreed)pipeline.py:670 now takes the repo-less branch only when not requires_repo AND not repo_url, so a repo-optional workflow given a repo runs the repo-bound path (clone/build/PR), matching CDK admission and the hydrated prompt. The two axes are one decision now. 👍
  • chore(deps): bump defu from 6.1.4 to 6.1.6 in /docs #6 (@version silently discarded)parseWorkflowRef + resolveWorkflowRefError (workflows.ts:162,186) now honor the constraint and 400 on an unsatisfiable pin instead of quietly running the shipped version.
  • chore(project): update structure #7 (deliver_artifact unset-target)DEFAULT_DELIVER_TARGET is now the single source of truth shared by validator and runtime, so a primary: comment workflow that omits target is correctly flagged by rule 11.

Two still open after 67d71ec:

  1. feat: add Iteration 3e for memory security and integrity (OWASP ASI06) #5 — post-hook load_workflow is still a bare call (pipeline.py:878), no WorkflowValidationError fallback. Not mentioned in the commit message and confirmed unchanged. This is the one I'd still treat as High: it's the only load that can raise after run_agent has mutated/committed the tree, stranding the work with no ensure_pr. My inline comment on it stands — it just needs the same id-derived fallback build_config (config.py:381-404) already uses.
  2. chore(deps): bump cryptography from 46.0.6 to 46.0.7 in /agent #10 — residual third copy. The new workflows.test.ts drift guard over id/version/requires_repo/read_only/model/required_inputs is a genuine close of the TS↔YAML axis 👏. But the Python constants _KNOWN_WRITEABLE_WORKFLOW_IDS / REPO_LESS_DEFAULT_WORKFLOW_ID (agent/src/config.py:19,27) — the source of truth for the fail-closed load-failure path — still have no parity test against the YAML (agent/tests/ only checks resolved_requires_repo per file). If that set drifts, the fallback mis-classifies read-only silently. One assertion closes it.

Net: the resolution-ladder seam I'd have gated on is now collapsed onto a single axis — nice turnaround. I'd hold only on #5 (cheap, but it's a post-mutation failure) and treat #10's third-copy as a should-fix-before-merge. Everything else from my pass is closed. 🚢

@krokoko

krokoko commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

@isadeks thanks for the thorough, accurate pass — this was exactly the right place to concentrate scrutiny (the resolution-ladder/repo-optionality seam). Status against the current tree, by finding:

Fixed

  • Installation: docker image inspect fails #1 — bare submit --repo --task stopped opening PRs67d71ec. The CLI now sends workflow_ref: 'coding/new-task-v1' when a repo is present and no --workflow/--pr/--review-pr is given (cli/src/commands/submit.ts). Repo-presence is the signal; the server ladder stays repo-agnostic by design.
  • Docs: specify that you can't use the agent with the canned repo #2 — repo-less unreachable from CLI3e3a66f (before your re-fetch). --repo is no longer a hard requiredOption; it's required unless --workflow selects a repo-less workflow (still required with --pr/--review-pr).
  • Docs: user guide hard-codes us-east-1 #3 — repo-optional + repo disagreed67d71ec. pipeline.run_task takes the repo-less branch only when not requires_repo AND not repo_url; a repo-optional workflow given a repo runs the repo-bound path. The two axes are one decision now.
  • feat: add FargateAgentStack as alternative compute backend #4allowed_tools hardcoded3e3a66f. _resolve_allowed_tools(config) threads the workflow's list to the SDK and drops Write/Edit when read_only; the empty-list fallback only fires on the WorkflowValidationError path, where read_only is already fail-closed.
  • chore(deps): bump defu from 6.1.4 to 6.1.6 in /docs #6@version silently discarded67d71ec. parseWorkflowRef + resolveWorkflowRefError honor the constraint; an unsatisfiable pin 400s ("version not available") instead of running the shipped version.
  • chore(project): update structure #7deliver_artifact unset-target mismatch67d71ec. DEFAULT_DELIVER_TARGET is now the single source of truth shared by validator and runtime, so a primary: comment workflow that omits target is correctly flagged by rule 11.

Still open / partial — agree with your severity

Lower-priority noted: load_workflow parsed 3–4×/task (no @lru_cache), and the coding path's inline verify_build/ensure_pr ignore the per-step gate field (declarative steps/gate are decorative on the coding lane today). Both logged as follow-ups so "declarative steps" isn't read as fully wired on coding.

Net: the resolution-ladder seam (#1#3) is collapsed onto a single repo-bound decision, and the tool-ceiling (#4) is enforced. Remaining before I'd call the cutover clean: #5 and #10's REPO_LESS_DEFAULT_WORKFLOW_ID residual. Thanks again.

…ompt, cap, parity (#248)

#5 (High) — post-hook load_workflow had no fail-soft fallback, so a reload
failure AFTER run_agent mutated/committed the tree stranded the work as FAILED
with no PR. read_only now comes from config (already fallback-computed and the
verdict that drove Cedar); the reload — needed only for the ensure_pr strategy —
is wrapped in the same WorkflowValidationError fallback build_config uses,
defaulting to "create" so the PR still opens.

#8 — knowledge/web-research-v1 had no registered prompt and silently degraded to
the generic default-agent prompt. Add a research-specialized, repo-less prompt
(prompts/web_research.py) and register it.

#9 — the 5 MiB artifact cap was checked AFTER encoding to UTF-8 (a second full
copy). Reject on the character count up front (chars ≤ bytes in UTF-8) so the cap
actually bounds memory on the MicroVM before materializing the bytes.

#10 residual — add a parity test for the agent-side REPO_LESS_DEFAULT_WORKFLOW_ID
/ DEFAULT_WORKFLOW_ID constants: the repo-less default must match the CDK platform
default and be requires_repo:false in the YAML; the coding default must be
repo-bound. Closes the last hand-maintained-copy axis #10 named.

Perf — memoize load_workflow (@cache): files are image-baked and immutable per
process and Workflow is frozen, so the 3-4 loads per task now parse once.

Leaves the coding-lane decorative-`gate` item (run_workflow only drives run_agent
while verify_build/ensure_pr stay inline) as a tracked follow-up — it is a
larger half-migrated-runner refactor both reviewers flagged as non-blocking.
@krokoko

krokoko commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Follow-up — pushed 9eacdd8 closing the remaining items from both passes. Thanks @isadeks and @theagenticguy for the thorough reviews.

Now fixed:

  • feat: add Iteration 3e for memory security and integrity (OWASP ASI06) #5 (High) — post-hook load_workflow fail-soft ✅ The reload at pipeline.py runs after run_agent has mutated/committed, so a failure there no longer strands the work. read_only now comes from config (already fallback-computed by build_config, and the verdict that drove Cedar during the run — so the post-hook can't diverge); the reload — needed only for the ensure_pr strategy — is wrapped in the same WorkflowValidationError fallback build_config uses, defaulting to "create" so the PR still opens. Test: test_post_hook_workflow_reload_failure_still_opens_pr.
  • feat(compute): add ComputeStrategy interface with ECS Fargate backend #8knowledge/web-research-v1 prompt ✅ Added a research-specialized, repo-less prompt (prompts/web_research.py) and registered it, so it no longer silently inherits the generic default-agent prompt.
  • chore(deps): bump basic-ftp from 5.2.0 to 5.2.1 #9 — artifact cap before encode_artifact_body now rejects on the character count up front (chars ≤ bytes in UTF-8) before materializing the UTF-8 copy, so the 5 MiB cap actually bounds memory on the MicroVM. Test proves encode() is never reached on the oversize path.
  • chore(deps): bump cryptography from 46.0.6 to 46.0.7 in /agent #10 residual — REPO_LESS_DEFAULT_WORKFLOW_ID parity@theagenticguy — you were right that a residual remained. For the record, _KNOWN_WRITEABLE_WORKFLOW_IDS was already cross-checked in 67d71ec's workflows.test.ts (it reads config.py); 9eacdd8 adds the missing assertion for REPO_LESS_DEFAULT_WORKFLOW_ID / DEFAULT_WORKFLOW_ID — the repo-less default must match the CDK platform default and be requires_repo:false in the YAML; the coding default must be repo-bound. That closes the last hand-maintained-copy axis.

Also addressed (perf): load_workflow is now memoized (@cache) — files are image-baked and immutable per process and Workflow is frozen=True, so the 3–4 loads per task parse once.

Tracked follow-up (not in this PR): the coding-lane decorative-gate observation — run_workflow(only_kinds={"run_agent"}) drives only the agent step while verify_build/ensure_pr stay inline and ignore each step's declared gate. That's a larger half-migrated-runner refactor both of you flagged as non-blocking; logging it so "declarative steps" isn't read as fully wired on the coding lane, to be picked up separately.

Full suite green before push: agent 975, CLI, CDK 1844 — lint/typecheck/types-sync clean. Deploying to dev now for manual verification.

@krokoko krokoko marked this pull request as ready for review June 9, 2026 23:55
@krokoko krokoko requested a review from a team as a code owner June 9, 2026 23:55
@krokoko krokoko requested a review from theagenticguy June 9, 2026 23:56

@theagenticguy theagenticguy left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved ✅

Re-verified at 9eacdd8 against the code, not the commit messages. Every item I gated on is closed, and I ran the suite locally rather than trust the green claim.

My two findings: both closed. allowed_tools reaches the SDK and drops Write/Edit on read-only (runner.py:285, config.py:386). The Cedar missing-attribute path is pinned by read-only-missing-attribute.json, with the re-raise locked by test_eval_tier_reraises_on_missing_read_only_attribute.

The one I held the cutover on, #5: fixed well in 9eacdd8. The post-hook now reads read_only from config. That is the same verdict that drove Cedar during the run, so it cannot diverge. The strategy-only load_workflow reload is wrapped in the WorkflowValidationError fallback and defaults to "create". So a load failure after the tree is mutated still opens the PR instead of stranding the work. test_post_hook_workflow_reload_failure_still_opens_pr covers it. Reusing the config verdict is cleaner than the fix I suggested.

The rest of Sophia's pass: #1, #2, #3, #4, #6, #7 confirmed closed. #8 now ships a registered research prompt. #9 checks the character count before encoding, so the 5 MiB cap actually bounds MicroVM memory. #10's residual parity assertion for REPO_LESS_DEFAULT_WORKFLOW_ID is in workflows.test.ts.

Local verification: full agent suite ran green for me, 975 passed in 5.4s. The workflow, policy, cedar-parity, pipeline, deliverer, and prompt paths pass; all three new named tests exist and pass.

One tracked follow-up, not a blocker: the coding lane still runs run_workflow(only_kinds={"run_agent"}) with verify_build and ensure_pr inline, ignoring each step's declared gate. So steps and gate are declarative only on the knowledge lane today, not on coding. Both reviewers flagged this as a separate half-migrated-runner refactor, and it is logged for follow-up. Worth a short tracking issue so "declarative steps" is not read as fully wired on the coding lane.

The convergence groundwork is solid. The repo-less lane is the part that matters most for where ABCA is heading, and it is structurally sound. Nice work turning all of this around fast. 🚢

Reviewed via background-agent tooling.

@krokoko krokoko enabled auto-merge June 10, 2026 00:13
@krokoko krokoko added this pull request to the merge queue Jun 10, 2026
Merged via the queue into main with commit 7d10254 Jun 10, 2026
9 checks passed
@krokoko krokoko deleted the feat/248-workflow-driven-tasks branch June 10, 2026 00:14
@theagenticguy

Copy link
Copy Markdown
Contributor

Opened #301 to track the coding-lane decorative-gate follow-up we agreed was non-blocking (per-step gate is honored on the repo-less lane via _gate_status but not on the coding lane, where verify_build/ensure_pr run inline). Scoped to making gate honored without the larger half-migrated-runner unification. cc @krokoko @isadeks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(tasks): capability-driven tasks — declarative steps, non-coding workflows, migrate existing task types

3 participants