Merged
27 commits
fd31772
Initial design docs
Jun 5, 2026
6ee5309
docs(workflows): address principal-review findings on ADR-014
Jun 5, 2026
2647d3c
feat(agent): workflow models + loader (#248 Phase 1)
Jun 5, 2026
c7042ea
feat(agent): cross-field workflow validator + golden corpus (#248 Pha…
Jun 8, 2026
093971d
feat(agent): workflow step runner + handler registry (#248 Phase 1)
Jun 8, 2026
9daf614
feat(agent): first-party workflow files — coding/new-task-v1 + defaul…
Jun 8, 2026
4bca53f
fix(agent): address code-review findings in workflow validator + runn…
Jun 8, 2026
b5744e2
feat(agent): wire workflow runner into pipeline behind a gate (#248 P…
Jun 8, 2026
5488bde
fix(agent): address second-round review findings on the workflow seam…
Jun 8, 2026
4fb6ad9
feat(api,cli): replace task_type with workflow_ref/resolved_workflow …
Jun 8, 2026
72fddbe
feat(agent): complete task_type→workflow cutover in the agent (#248 t…
Jun 8, 2026
4c34c8b
fix(agent): use ty ignore syntax + annotate workflow test handlers (#…
Jun 8, 2026
5570e5d
feat(agent): Cedar read-only enforcement keys off context.read_only (…
Jun 8, 2026
eafc2ee
docs(workflows): resolve repo-optional open questions + freeze schema…
Jun 8, 2026
97f390f
feat(api,agent): repo-optional admission for repo-less workflows (#24…
Jun 9, 2026
311a4dc
feat(agent): repo-less pipeline path via the workflow runner (#248 Ph…
Jun 9, 2026
6a6716b
fix(agent): address PR-review findings on Phase 2a/3 (#248)
Jun 9, 2026
0b2e8e1
feat(agent,api): deliver_artifact + repo-less memory — close criterio…
Jun 9, 2026
9c5b45d
feat(agent): make repo-less delivery retrievable + ship a knowledge w…
Jun 9, 2026
292019f
fix(agent): address full-branch review — 2 repo-less blockers + clean…
Jun 9, 2026
3d2b239
test(cli): cover artifact_uri render lines (#248 Phase 3)
Jun 9, 2026
ab69e80
fix(agent): address PR review findings + green CI (#248)
Jun 9, 2026
c26cd8e
Merge branch 'main' into feat/248-workflow-driven-tasks
krokoko Jun 9, 2026
d26860c
fix(agent): ship first-party workflow files in the image + sync user …
Jun 9, 2026
3e3a66f
fix(agent,api,cli): address PR-review findings on the workflow seam (…
Jun 9, 2026
67d71ec
fix(agent,api,cli): address PR-review #296 resolution-ladder + delive…
Jun 9, 2026
9eacdd8
fix(agent): close PR-review #296 follow-ups — post-hook fail-soft, pr…
Jun 9, 2026
6 changes: 6 additions & 0 deletions agent/Dockerfile
@@ -74,6 +74,12 @@ COPY agent/policies/ /app/policies/
 # by ``cdk/src/handlers/shared/types.ts`` at synth time. See
 # ``contracts/README.md`` for the contract.
 COPY contracts/ /app/contracts/
+# First-party workflow files (#248). ``agent/src/workflow/loader.py`` resolves
+# ``_WORKFLOWS_ROOT`` to ``/app/workflows`` (parents[2] of /app/src/workflow/)
+# and loads ``<domain>/<name>-vN.yaml`` plus ``schema/workflow.schema.json`` at
+# task time; without these files every workflow load fails with
+# ``WorkflowValidationError: workflow '...' not found at /app/workflows/...``.
+COPY agent/workflows/ /app/workflows/
 COPY agent/prepare-commit-msg.sh /app/
 COPY agent/test_sdk_smoke.py agent/test_subprocess_threading.py /app/
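
The path arithmetic described in the Dockerfile comment can be checked with a small sketch. The loader's real code is not shown in this diff, so `LOADER_FILE` and `workflow_path` below are hypothetical stand-ins, and the id-to-filename mapping is an assumption based on the `<domain>/<name>-vN.yaml` pattern in the comment.

```python
from pathlib import PurePosixPath

# Assumed in-image location of the loader module, taken from the comment above.
LOADER_FILE = PurePosixPath("/app/src/workflow/loader.py")

# parents[0] = /app/src/workflow, parents[1] = /app/src, parents[2] = /app
WORKFLOWS_ROOT = LOADER_FILE.parents[2] / "workflows"


def workflow_path(workflow_id: str) -> PurePosixPath:
    """Map an id such as 'coding/new-task-v1' to a YAML file under the root.

    Hypothetical helper: assumes the id is '<domain>/<name>-vN' and that the
    file name is the part after the slash plus '.yaml'.
    """
    domain, name = workflow_id.split("/", 1)
    return WORKFLOWS_ROOT / domain / f"{name}.yaml"


print(WORKFLOWS_ROOT)
print(workflow_path("coding/new-task-v1"))
```

Under these assumptions the root resolves to `/app/workflows`, which is the directory the new `COPY` line populates. Note that `parents[2]` is counted from the module file, not from its directory.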

24 changes: 15 additions & 9 deletions agent/policies/hard_deny.cedar
@@ -12,24 +12,30 @@
 @rule_id("base_permit")
 permit (principal, action, resource);

-// pr_review tasks may never invoke Write. Absolute; cannot be overridden
-// by per-blueprint customization or --pre-approve.
+// Read-only workflows may never invoke Write. Absolute; cannot be overridden
+// by per-blueprint customization or --pre-approve. Keyed on the read_only
+// context attribute (not a principal literal) so the deny attaches to the
+// *property* and fires for every read-only workflow uniformly — not just
+// coding/pr-review. (#248 Phase 2a — replaces the literal
+// Agent::TaskAgent::"pr_review" match; see ADR-014 addendum 2026-06-08.)
 @tier("hard")
-@rule_id("pr_review_forbid_write")
+@rule_id("read_only_forbid_write")
 forbid (
-    principal == Agent::TaskAgent::"pr_review",
+    principal,
     action == Agent::Action::"invoke_tool",
     resource == Agent::Tool::"Write"
-);
+)
+when { context.read_only == true };

-// pr_review tasks may never invoke Edit.
+// Read-only workflows may never invoke Edit.
 @tier("hard")
-@rule_id("pr_review_forbid_edit")
+@rule_id("read_only_forbid_edit")
 forbid (
-    principal == Agent::TaskAgent::"pr_review",
+    principal,
     action == Agent::Action::"invoke_tool",
     resource == Agent::Tool::"Edit"
-);
+)
+when { context.read_only == true };

 // Reject `rm -rf /` and similar absolute-root destructive commands.
 @tier("hard")
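
The effect of keying the two `forbid` rules on `context.read_only` can be illustrated with a plain-Python model. This is a sketch of the decision logic only, not a call into the Cedar engine or `cedarpy`; the `Request` type and `decide` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Tools covered by read_only_forbid_write and read_only_forbid_edit.
READ_ONLY_DENIED_TOOLS = frozenset({"Write", "Edit"})


@dataclass(frozen=True)
class Request:
    principal: str   # e.g. "new_task" or "pr_review"; no longer consulted by the deny
    tool: str        # the resource of an invoke_tool action
    read_only: bool  # the context.read_only attribute


def decide(request: Request) -> str:
    """Model of base_permit plus the two hard forbids (a forbid overrides a permit)."""
    decision = "Allow"  # base_permit: permit (principal, action, resource)
    if request.read_only and request.tool in READ_ONLY_DENIED_TOOLS:
        decision = "Deny"  # read_only_forbid_write / read_only_forbid_edit
    return decision


print(decide(Request("new_task", "Write", read_only=True)))
print(decide(Request("pr_review", "Write", read_only=False)))
```

In this model the deny follows the context flag rather than the principal, so a `pr_review` principal is only protected when the caller sets `read_only` correctly. That is why the fail-closed handling of `read_only` in `agent/src/config.py` matters.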
6 changes: 6 additions & 0 deletions agent/pyproject.toml
@@ -34,6 +34,12 @@ dependencies = [
     # commit. See docs/design/CEDAR_HITL_GATES.md §15.6 (decision #23) and
     # the parity-contract banner in mise.toml.
     "cedarpy==4.8.4", #https://github.com/k9securityio/cedar-py — EXACT pin (no ^/~), parity with @cedar-policy/cedar-wasm@4.8.2 (both Cedar Rust 4.8.2)
+    # Workflow-driven tasks (#248): the step runner loads YAML workflow files
+    # and validates them against agent/workflows/schema/workflow.schema.json.
+    # Both were previously only transitively present; declared directly so the
+    # workflow loader does not depend on another package's transitive pin.
+    "pyyaml==6.0.3", #https://pypi.org/project/PyYAML/
+    "jsonschema==4.26.0", #https://pypi.org/project/jsonschema/
 ]

 [tool.uv]
91 changes: 73 additions & 18 deletions agent/src/config.py
@@ -5,13 +5,26 @@
 import uuid
 from datetime import UTC

-from models import AttachmentConfig, TaskConfig, TaskType
+from models import AttachmentConfig, TaskConfig
 from shell import log

 AGENT_WORKSPACE = os.environ.get("AGENT_WORKSPACE", "/workspace")

-# Task types that operate on an existing pull request.
-PR_TASK_TYPES = frozenset(("pr_iteration", "pr_review"))
+# The platform default workflow id used when a payload omits resolved_workflow
+# (local/batch runs). Mirrors the create-task boundary's coding default.
+DEFAULT_WORKFLOW_ID = "coding/new-task-v1"
+# The repo-less platform default workflow (#248 Phase 3) — the one first-party
+# id whose ``requires_repo`` is false. Used by the load-failure fallback to
+# decide repo-optionality without loading the file.
+REPO_LESS_DEFAULT_WORKFLOW_ID = "default/agent-v1"
+# First-party workflow ids that operate on an existing pull request.
+PR_WORKFLOW_IDS = frozenset(("coding/pr-iteration-v1", "coding/pr-review-v1"))
+# First-party workflow ids that are writeable (NOT read-only). Used only by the
+# load-failure fallback to bias an unrecognised id toward read-only (fail closed
+# on the write-deny invariant). pr-review-v1 is intentionally excluded (it is
+# read-only); default/agent-v1 is excluded because its conservative posture
+# should fail closed too.
+_KNOWN_WRITEABLE_WORKFLOW_IDS = frozenset(("coding/new-task-v1", "coding/pr-iteration-v1"))


 def resolve_github_token() -> str:
@@ -314,7 +327,7 @@ def _refresh(current: dict) -> dict | None:


 def build_config(
-    repo_url: str,
+    repo_url: str = "",
     task_description: str = "",
     issue_number: str = "",
     github_token: str = "",
Expand All @@ -325,7 +338,7 @@ def build_config(
dry_run: bool = False,
task_id: str = "",
system_prompt_overrides: str = "",
task_type: str = "new_task",
resolved_workflow: dict | None = None,
branch_name: str = "",
pr_number: str = "",
channel_source: str = "",
Expand All @@ -351,22 +364,59 @@ def build_config(
"ANTHROPIC_MODEL", "us.anthropic.claude-sonnet-4-6"
)

# Resolve the workflow id (the create-task boundary already pinned it; local
# batch runs default to the coding workflow). Required-input validation is
# owned by the create-task boundary now; the agent re-checks only the
# pr_number/issue/description shape needed to run.
workflow = resolved_workflow or {"id": DEFAULT_WORKFLOW_ID, "version": "1.0.0"}
workflow_id = workflow.get("id", DEFAULT_WORKFLOW_ID)
is_pr_workflow = workflow_id in PR_WORKFLOW_IDS

# Load the workflow up-front: it drives the Cedar principal, the read_only
# flag, AND whether a repo is required (#248 Phase 3). Fall back to id-based
# mapping when the file can't be loaded (e.g. a registry-only id in a future
# phase) — a repo-less default is the safe assumption only for non-coding.
from workflow import WorkflowValidationError, load_workflow, policy_principal_for

try:
workflow_obj = load_workflow(workflow_id)
policy_principal = policy_principal_for(workflow_obj)
workflow_read_only = workflow_obj.read_only
workflow_requires_repo = workflow_obj.resolved_requires_repo
workflow_allowed_tools = list(workflow_obj.agent_config.allowed_tools)
except WorkflowValidationError as exc:
# The pinned workflow file failed to load (corrupt YAML, schema drift, a
# future registry-only id). This is the one place read_only/requires_repo
# can be wrong without a loud failure, so: (1) log it, and (2) fail
# *closed* — assume read-only (deny writes) for any id we don't recognise
# as a known writeable coding workflow, rather than fail-open to writeable.
log("ERROR", f"workflow {workflow_id!r} failed to load ({exc}); using fallback policy")
policy_principal = "pr_review" if workflow_id == "coding/pr-review-v1" else "new_task"
# Known writeable coding workflows are the only ids that fall back to
# writeable; everything else (incl. an unrecognised id) is read-only.
workflow_read_only = workflow_id not in _KNOWN_WRITEABLE_WORKFLOW_IDS
# requires_repo: the repo-less platform default is the only id that does
# NOT require a repo; every other id (coding or unknown) requires one.
workflow_requires_repo = workflow_id != REPO_LESS_DEFAULT_WORKFLOW_ID
# Tool surface is unknown without the file; empty = the runner falls back
# to its built-in full surface. read_only (above, fail-closed) still drops
# Write/Edit, so the write-deny invariant holds even on this path.
workflow_allowed_tools = []

errors = []
if not resolved_repo_url:
errors.append("repo_url is required (e.g., 'owner/repo')")
if not resolved_github_token:
errors.append("github_token is required")
# Repo + GitHub token are required only for repo-bound workflows; a repo-less
# workflow (requires_repo:false) runs from task_description/attachments alone.
if workflow_requires_repo:
if not resolved_repo_url:
errors.append("repo_url is required (e.g., 'owner/repo')")
if not resolved_github_token:
errors.append("github_token is required")
if not resolved_aws_region:
errors.append("aws_region is required for Bedrock")
try:
task = TaskType(task_type)
except ValueError:
errors.append(f"Invalid task_type: '{task_type}'")
task = None
if task and task.is_pr_task:
if is_pr_workflow:
if not pr_number:
errors.append("pr_number is required for pr_iteration/pr_review task type")
elif task and not resolved_issue_number and not resolved_task_description:
errors.append(f"pr_number is required for the {workflow_id!r} workflow")
elif not resolved_issue_number and not resolved_task_description:
errors.append("Either issue_number or task_description is required")

if errors:
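
The reworked input validation in this hunk can be exercised in isolation with a small sketch. `collect_errors` is a hypothetical extraction for illustration, not a function in the PR; it takes the already-resolved values as keyword arguments and mirrors the branch structure and messages of the added lines.

```python
def collect_errors(
    *,
    workflow_id: str,
    requires_repo: bool,
    is_pr_workflow: bool,
    repo_url: str = "",
    github_token: str = "",
    aws_region: str = "",
    pr_number: str = "",
    issue_number: str = "",
    task_description: str = "",
) -> list[str]:
    """Return validation errors in the same order as the hunk above."""
    errors: list[str] = []
    # Repo and token are only demanded for repo-bound workflows.
    if requires_repo:
        if not repo_url:
            errors.append("repo_url is required (e.g., 'owner/repo')")
        if not github_token:
            errors.append("github_token is required")
    if not aws_region:
        errors.append("aws_region is required for Bedrock")
    if is_pr_workflow:
        if not pr_number:
            errors.append(f"pr_number is required for the {workflow_id!r} workflow")
    elif not issue_number and not task_description:
        errors.append("Either issue_number or task_description is required")
    return errors


# A repo-less task needs only a region and a description.
print(collect_errors(
    workflow_id="default/agent-v1",
    requires_repo=False,
    is_pr_workflow=False,
    aws_region="us-east-1",
    task_description="Summarise the attached notes",
))
```

The region value and task description in the usage line are made-up inputs. The point of the sketch is that a repo-less workflow passes without `repo_url` or `github_token`, while a coding workflow with the same inputs would report both as missing.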
@@ -394,7 +444,12 @@ def build_config(
         max_turns=max_turns,
         max_budget_usd=max_budget_usd,
         system_prompt_overrides=system_prompt_overrides,
-        task_type=task_type,
+        resolved_workflow=workflow,
+        policy_principal=policy_principal,
+        read_only=workflow_read_only,
+        allowed_tools=workflow_allowed_tools,
+        requires_repo=workflow_requires_repo,
+        is_pr_workflow=is_pr_workflow,
         branch_name=branch_name,
         pr_number=pr_number,
         task_id=task_id or uuid.uuid4().hex[:12],
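
The load-failure fallback in `build_config` is the part of this file where a wrong default would silently weaken policy, so its mapping is worth tabulating. The sketch below restates the `except WorkflowValidationError` branch as a pure function. `FallbackPolicy` and `fallback_policy` are hypothetical names, while the constants and the per-field rules are copied from the diff.

```python
from dataclasses import dataclass, field

REPO_LESS_DEFAULT_WORKFLOW_ID = "default/agent-v1"
_KNOWN_WRITEABLE_WORKFLOW_IDS = frozenset(("coding/new-task-v1", "coding/pr-iteration-v1"))


@dataclass(frozen=True)
class FallbackPolicy:
    policy_principal: str
    read_only: bool
    requires_repo: bool
    allowed_tools: list[str] = field(default_factory=list)


def fallback_policy(workflow_id: str) -> FallbackPolicy:
    """Policy used when the workflow file cannot be loaded (fail closed)."""
    return FallbackPolicy(
        policy_principal="pr_review" if workflow_id == "coding/pr-review-v1" else "new_task",
        # Only the known writeable coding workflows stay writeable.
        read_only=workflow_id not in _KNOWN_WRITEABLE_WORKFLOW_IDS,
        # Only the repo-less platform default may run without a repo.
        requires_repo=workflow_id != REPO_LESS_DEFAULT_WORKFLOW_ID,
        # Unknown tool surface: empty list, the runner applies its own default.
        allowed_tools=[],
    )


for wid in ("coding/new-task-v1", "coding/pr-review-v1", "default/agent-v1", "team/unknown-v9"):
    print(wid, fallback_policy(wid))
```

The id `team/unknown-v9` is an invented example of an unrecognised workflow. Under this mapping it comes out read-only and repo-bound, which is the conservative result the comments in the hunk describe.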
3 changes: 1 addition & 2 deletions agent/src/entrypoint.py
@@ -14,7 +14,7 @@

 from config import ( # noqa: F401
     AGENT_WORKSPACE,
-    PR_TASK_TYPES,
+    PR_WORKFLOW_IDS,
     build_config,
     get_config,
     resolve_github_token,
Expand All @@ -29,7 +29,6 @@
RepoSetup,
TaskConfig,
TaskResult,
TaskType,
TokenUsage,
)
from pipeline import main, run_task # noqa: F401
27 changes: 21 additions & 6 deletions agent/src/memory.py
@@ -53,6 +53,20 @@ def _validate_repo(repo: str) -> None:
         )


+def _validate_actor(actor: str) -> None:
+    """Validate a memory actorId: an ``owner/repo`` OR a ``user:{id}`` namespace.
+
+    Repo-less knowledge tasks (#248 Phase 3) key memory on ``user:{user_id}``
+    instead of a repo (ADR-014 addendum). Accept both forms so the same write
+    path serves coding and knowledge tasks; reject anything else.
+    """
+    if actor.startswith("user:"):
+        if not actor[len("user:") :]:
+            raise ValueError("actor 'user:' namespace requires a non-empty user id")
+        return
+    _validate_repo(actor)
+
+
 def _log_error(func_name: str, err: Exception, memory_id: str, task_id: str) -> None:
     """Log memory write failure with severity based on exception type."""
     is_programming_error = isinstance(err, (TypeError, ValueError, AttributeError, KeyError))
Expand All @@ -67,7 +81,7 @@ def _log_error(func_name: str, err: Exception, memory_id: str, task_id: str) ->

def write_task_episode(
memory_id: str,
repo: str,
actor: str,
task_id: str,
status: str,
pr_url: str | None = None,
Expand All @@ -81,17 +95,18 @@ def write_task_episode(
status, PR URL, cost, duration, and any self-feedback from the
agent's "## Agent notes" section in the PR body.

Uses actorId=repo and sessionId=task_id so the extraction strategy
namespace templates (/{actorId}/episodes/{sessionId}/) place records
into the correct per-repo, per-task namespace.
Uses actorId=``actor`` and sessionId=task_id so the extraction strategy
namespace templates (/{actorId}/episodes/{sessionId}/) place records into
the correct namespace. ``actor`` is an ``owner/repo`` for coding tasks or a
``user:{user_id}`` namespace for repo-less knowledge tasks (#248 Phase 3).

Metadata includes source_type='agent_episode' for provenance tracking
and content_sha256 for integrity auditing on read (schema v3).

Returns True on success, False on failure (fail-open).
"""
try:
_validate_repo(repo)
_validate_actor(actor)
client = _get_client()

parts = [
@@ -124,7 +139,7 @@ def write_task_episode(

             client.create_event(
                 memoryId=memory_id,
-                actorId=repo,
+                actorId=actor,
                 sessionId=task_id,
                 eventTimestamp=_iso_now(),
                 payload=[
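
How the two actor forms flow into the episode namespace can be sketched as follows. The body of `_validate_repo` is not visible in this diff, so the `owner/repo` regular expression below is an assumption for illustration only, and `episode_namespace` is a hypothetical helper that simply fills the `/{actorId}/episodes/{sessionId}/` template quoted in the docstring.

```python
import re

# ASSUMPTION: stand-in for the real _validate_repo check, which this diff does not show.
_REPO_RE = re.compile(r"[A-Za-z0-9._-]+/[A-Za-z0-9._-]+")


def validate_repo(repo: str) -> None:
    if not _REPO_RE.fullmatch(repo):
        raise ValueError(f"invalid repo {repo!r}; expected 'owner/repo'")


def validate_actor(actor: str) -> None:
    """Accept 'user:{id}' with a non-empty id, otherwise require 'owner/repo'."""
    if actor.startswith("user:"):
        if not actor[len("user:"):]:
            raise ValueError("actor 'user:' namespace requires a non-empty user id")
        return
    validate_repo(actor)


def episode_namespace(actor: str, task_id: str) -> str:
    """Fill the /{actorId}/episodes/{sessionId}/ template after validating the actor."""
    validate_actor(actor)
    return f"/{actor}/episodes/{task_id}/"


print(episode_namespace("owner/repo", "abc123"))
print(episode_namespace("user:42", "abc123"))
```

With these stand-ins, coding tasks and repo-less knowledge tasks share one write path and land in separate namespaces, while an empty `user:` id or a string that is neither form is rejected before any event is written.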