diff --git a/AGENTS.md b/AGENTS.md index 60d1c66..ca7861f 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -24,7 +24,7 @@ To get started and understand the developer flow, follow the [Developer guide](. - **`scripts/`** (root) — Optional cross-package helpers; **`scripts/ci-build.sh`** runs the full monorepo build (same as CI). - **`cdk/`** — CDK app package (`@abca/cdk`): `cdk/src/`, `cdk/test/`, `cdk/cdk.json`, `cdk/tsconfig.json`, `cdk/tsconfig.dev.json`, and `cdk/.eslintrc.json`. - **`cli/`** — `@backgroundagent/cli` — CLI tool for interacting with the deployed REST API (see below). -- **`agent/`** — Python code that runs inside the agent compute environment (entrypoint, server, system prompt, Dockerfile, requirements). +- **`agent/`** — Python code that runs inside the agent compute environment (entrypoint, server, system prompt, Dockerfile, requirements). The system prompt is refactored into `agent/prompts/` with a shared base template and per-task-type workflow variants (`new_task`, `pr_iteration`, `pr_review`). - **`docs/`** — Authoritative Markdown in `guides/` (developer, user, roadmap, prompt) and `design/`; assets in `diagrams/`, `imgs/`. The Starlight docs site lives here (`astro.config.mjs`, `package.json`); `src/content/docs/` is refreshed via `docs/scripts/sync-starlight.mjs`. - **`CONTRIBUTING.md`** — Contribution guidelines at the repository root. - **`package.json`** (root), **`yarn.lock`** — Yarn workspace root (minimal manifest); dependencies live in **`cdk/`**, **`cli/`**, and **`docs/`** package manifests. 
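The `agent/prompts/` note above describes a shared base template with per-task-type workflow variants. A minimal sketch of that composition pattern, with hypothetical template text (the real templates live in `agent/prompts/`):

```python
# Illustrative only: a base template carries a {workflow} slot that each
# task type fills differently, and assembled prompts are selected by key.
BASE_TEMPLATE = """You are a background coding agent.

{workflow}

## Rules
- Work unattended; never pause for input.
"""

WORKFLOWS = {
    "new_task": "## Workflow\n1. Implement\n2. Test\n3. Open a PR",
    "pr_iteration": "## Workflow\n1. Read feedback\n2. Address it\n3. Push",
    "pr_review": "## Workflow\n1. Analyze (read-only)\n2. Post a structured review",
}

# Pre-assemble one full prompt per task type, as the registry pattern does.
PROMPTS = {name: BASE_TEMPLATE.replace("{workflow}", wf) for name, wf in WORKFLOWS.items()}


def get_prompt(task_type: str = "new_task") -> str:
    """Return the assembled prompt, rejecting unknown task types."""
    if task_type not in PROMPTS:
        raise ValueError(f"Unknown task_type: {task_type!r}")
    return PROMPTS[task_type]
```

The pre-assembly keeps prompt selection a dictionary lookup at runtime, with one shared place for environment and rules text.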
@@ -40,7 +40,7 @@ The `@backgroundagent/cli` package provides the `bgagent` executable for submitt - `src/api-client.ts` — HTTP client wrapping `fetch` with auth header injection - `src/auth.ts` — Cognito login, token caching (`~/.bgagent/credentials.json`), auto-refresh - `src/config.ts` — Read/write `~/.bgagent/config.json` -- `src/types.ts` — API request/response types (mirrored from `cdk/src/handlers/shared/types.ts`) +- `src/types.ts` — API request/response types (mirrored from `cdk/src/handlers/shared/types.ts`), including `TaskType` (`new_task` | `pr_iteration` | `pr_review`) - `src/format.ts` — Output formatting (table, detail view, JSON) - `src/debug.ts` — Verbose/debug logging (`--verbose` flag) - `src/errors.ts` — `CliError` and `ApiError` classes diff --git a/agent/Dockerfile b/agent/Dockerfile index d0d52e8..d0a052f 100644 --- a/agent/Dockerfile +++ b/agent/Dockerfile @@ -51,6 +51,7 @@ RUN uv sync --frozen --no-dev --directory /app # Copy agent code (ARG busts cache so file edits are always picked up) ARG CACHE_BUST=0 COPY entrypoint.py system_prompt.py server.py task_state.py observability.py memory.py /app/ +COPY prompts/ /app/prompts/ COPY prepare-commit-msg.sh /app/ COPY test_sdk_smoke.py test_subprocess_threading.py /app/ diff --git a/agent/README.md b/agent/README.md index 26719db..88dade0 100644 --- a/agent/README.md +++ b/agent/README.md @@ -1,6 +1,6 @@ # Agent Runtime -The agent runtime container for ABCA. Each agent instance clones a GitHub repo, works on a task using Claude, and opens a pull request. Runs as a Docker container with two modes: +The agent runtime container for ABCA. Each agent instance clones a GitHub repo, works on a task using Claude, and delivers a result — a new pull request (`new_task`), updates to an existing PR (`pr_iteration`), or structured review comments on a PR (`pr_review`).
Runs as a Docker container with two modes: - **Local mode** — batch execution via `run.sh` with AgentCore-matching constraints (2 vCPU, 8 GB RAM) - **AgentCore mode** — FastAPI server on port 8080 with `/invocations` and `/ping` endpoints, deployable to AWS Bedrock AgentCore Runtime @@ -224,6 +224,12 @@ bgagent submit --repo owner/repo --task "update the rfc issue template" # Submit with a GitHub issue bgagent submit --repo owner/repo --issue 42 +# Iterate on a PR (address review feedback) +bgagent submit --repo owner/repo --pr 42 + +# Review a PR (read-only — posts structured review comments) +bgagent submit --repo owner/repo --review-pr 55 + # Submit and wait for completion bgagent submit --repo owner/repo --issue 42 --wait ``` @@ -252,18 +258,18 @@ The `run.sh` script prints these commands when it starts. ## What It Does -The agent pipeline (shared by both modes): +The agent pipeline (shared by both modes). Behavior varies by task type (`new_task`, `pr_iteration`, `pr_review`): 1. **Config validation** — checks required parameters -2. **Context hydration** — fetches the GitHub issue (title, body, comments) if an issue number is provided -3. **Prompt assembly** — combines the system prompt (behavioral contract) with the issue context and task description -4. **Deterministic pre-hooks** — clones repo, creates branch, configures git auth, runs `mise trust`, `mise install`, `mise run build`, and `mise run lint` +2. **Context hydration** — fetches the GitHub issue (title, body, comments) if an issue number is provided; for `pr_iteration` and `pr_review`, fetches PR context (diff, description, review comments) +3. **Prompt assembly** — combines the system prompt (behavioral contract, selected by task type from `prompts/`) with the issue/PR context and task description +4. **Deterministic pre-hooks** — clones repo, creates or checks out branch, configures git auth, runs `mise trust`, `mise install`, `mise run build`, and `mise run lint` 5. 
**Agent execution** — invokes the Claude Agent SDK via the `ClaudeSDKClient` class (connect/query/receive_response pattern) in unattended mode. The agent: - Understands the codebase - - Makes changes, runs tests and linters - - Commits and pushes after each unit of work - - Creates a pull request with summary, testing notes, and decisions -6. **Deterministic post-hooks** — verifies `mise run build` and `mise run lint`, ensures a PR exists (creates one if the agent did not) + - **`new_task`**: Makes changes, runs tests and linters, commits and pushes after each unit of work, creates a pull request + - **`pr_iteration`**: Reads review feedback, addresses it with focused changes, commits and pushes, posts a summary comment on the PR + - **`pr_review`**: Analyzes changes read-only (no `Write` or `Edit` tools available), composes structured review findings, posts a batch review via the GitHub Reviews API +6. **Deterministic post-hooks** — verifies `mise run build` and `mise run lint`, ensures a PR exists (creates one if the agent did not). For `pr_review`, build status is informational only and the commit/push steps are skipped. 7. **Metrics** — returns duration, disk usage, turn count, cost, and PR URL ## Metrics @@ -322,9 +328,16 @@ agent/ ├── task_state.py Best-effort DynamoDB task status (no-op if TASK_TABLE_NAME unset) ├── observability.py OpenTelemetry helpers (e.g. 
AgentCore session id) ├── memory.py Optional memory / episode integration for the agent +├── prompts/ Per-task-type system prompt workflows +│ ├── __init__.py Prompt registry — assembles base template + workflow for each task type +│ ├── base.py Shared base template (environment, rules, placeholders) +│ ├── new_task.py Workflow for new_task (create branch, implement, open PR) +│ ├── pr_iteration.py Workflow for pr_iteration (read feedback, address, push) +│ └── pr_review.py Workflow for pr_review (read-only analysis, structured review comments) ├── system_prompt.py Behavioral contract (PRD Section 11) ├── prepare-commit-msg.sh Git hook (Task-Id / Prompt-Version trailers on commits) ├── run.sh Build + run helper for local/server mode with AgentCore constraints +├── tests/ pytest unit tests for pure functions and prompt assembly ├── test_sdk_smoke.py Diagnostic: minimal SDK smoke test (ClaudeSDKClient → CLI → Bedrock) └── test_subprocess_threading.py Diagnostic: subprocess-in-background-thread verification ``` diff --git a/agent/entrypoint.py b/agent/entrypoint.py index 2c8938c..42e06cc 100644 --- a/agent/entrypoint.py +++ b/agent/entrypoint.py @@ -31,6 +31,7 @@ import memory as agent_memory import task_state from observability import task_span +from prompts import get_system_prompt from system_prompt import SYSTEM_PROMPT # --------------------------------------------------------------------------- @@ -39,6 +40,9 @@ AGENT_WORKSPACE = os.environ.get("AGENT_WORKSPACE", "/workspace") +# Task types that operate on an existing pull request. +PR_TASK_TYPES = frozenset(("pr_iteration", "pr_review")) + def resolve_github_token() -> str: """Resolve GitHub token from Secrets Manager or environment variable. @@ -77,6 +81,9 @@ def build_config( dry_run: bool = False, task_id: str = "", system_prompt_overrides: str = "", + task_type: str = "new_task", + branch_name: str = "", + pr_number: str = "", ) -> dict: """Build and validate configuration from explicit parameters. 
@@ -94,6 +101,9 @@ def build_config( "max_turns": max_turns, "max_budget_usd": max_budget_usd, "system_prompt_overrides": system_prompt_overrides, + "task_type": task_type, + "branch_name": branch_name, + "pr_number": pr_number, } errors = [] @@ -103,7 +113,10 @@ def build_config( errors.append("github_token is required") if not config["aws_region"]: errors.append("aws_region is required for Bedrock") - if not config["issue_number"] and not config["task_description"]: + if config["task_type"] in PR_TASK_TYPES: + if not config["pr_number"]: + errors.append("pr_number is required for pr_iteration/pr_review task type") + elif not config["issue_number"] and not config["task_description"]: errors.append("Either issue_number or task_description is required") if errors: @@ -303,15 +316,19 @@ def setup_repo(config: dict) -> dict: repo_dir = f"{AGENT_WORKSPACE}/{config['task_id']}" setup: dict[str, str | list[str] | bool] = {"repo_dir": repo_dir, "notes": []} - # Derive branch slug from issue title or task description - title = "" - if config.get("issue"): - title = config["issue"]["title"] - if not title: - title = config["task_description"] - slug = slugify(title) - branch = f"bgagent/{config['task_id']}/{slug}" - setup["branch"] = branch + if config.get("task_type") in PR_TASK_TYPES and config.get("branch_name"): + branch = config["branch_name"] + setup["branch"] = branch + else: + # Derive branch slug from issue title or task description + title = "" + if config.get("issue"): + title = config["issue"]["title"] + if not title: + title = config["task_description"] + slug = slugify(title) + branch = f"bgagent/{config['task_id']}/{slug}" + setup["branch"] = branch # Mark the repo directory as safe for git. 
On persistent session storage # the mount may be owned by a different UID than the container user, @@ -343,9 +360,22 @@ def setup_repo(config: dict) -> dict: cwd=repo_dir, ) - # Create branch - log("SETUP", f"Creating branch: {branch}") - run_cmd(["git", "checkout", "-b", branch], label="create-branch", cwd=repo_dir) + # Branch setup + if config.get("task_type") in PR_TASK_TYPES and config.get("branch_name"): + log("SETUP", f"Checking out existing PR branch: {branch}") + run_cmd( + ["git", "fetch", "origin", branch], + label="fetch-pr-branch", + cwd=repo_dir, + ) + run_cmd( + ["git", "checkout", "-b", branch, f"origin/{branch}"], + label="checkout-pr-branch", + cwd=repo_dir, + ) + else: + log("SETUP", f"Creating branch: {branch}") + run_cmd(["git", "checkout", "-b", branch], label="create-branch", cwd=repo_dir) # Trust mise config files in the cloned repo (required before mise install) run_cmd( @@ -402,7 +432,11 @@ def setup_repo(config: dict) -> dict: setup["lint_before"] = True # Detect default branch - setup["default_branch"] = detect_default_branch(config["repo_url"], repo_dir) + # For PR tasks (pr_iteration, pr_review): use base_branch from orchestrator if available + if config.get("task_type") in PR_TASK_TYPES and config.get("base_branch"): + setup["default_branch"] = config["base_branch"] + else: + setup["default_branch"] = detect_default_branch(config["repo_url"], repo_dir) # Install prepare-commit-msg hook for code attribution _install_commit_hook(repo_dir) @@ -620,6 +654,10 @@ def ensure_pr( ) -> str | None: """Check if a PR exists for the branch; if not, create one. + For ``new_task``: creates a new PR if needed. + For ``pr_iteration``: pushes commits, then resolves the existing PR URL. + For ``pr_review``: resolves the existing PR URL without pushing (read-only). + Returns the PR URL, or None if there are no commits beyond the default branch or PR creation failed. ``build_passed`` and ``lint_passed`` control the verification status shown in the PR body. 
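The branch handling in `setup_repo` now splits on task type: PR tasks reuse the PR's existing branch, while new tasks derive a fresh `bgagent/{task_id}/{slug}` branch. A simplified sketch (the `slugify` here is a hypothetical stand-in for the entrypoint's own helper):

```python
import re

PR_TASK_TYPES = ("pr_iteration", "pr_review")


def slugify(text: str, max_len: int = 40) -> str:
    # Hypothetical stand-in: lowercase, collapse non-alphanumerics to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def pick_branch(config: dict) -> str:
    # PR tasks reuse the PR's existing branch; new tasks get a fresh
    # bgagent/<task_id>/<slug> branch derived from the issue title or
    # the task description.
    if config.get("task_type") in PR_TASK_TYPES and config.get("branch_name"):
        return config["branch_name"]
    title = (config.get("issue") or {}).get("title") or config["task_description"]
    return f"bgagent/{config['task_id']}/{slugify(title)}"
```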
@@ -628,6 +666,40 @@ def ensure_pr( branch = setup["branch"] default_branch = setup.get("default_branch", "main") + # PR iteration/review: skip PR creation — just resolve existing PR URL + if config.get("task_type") in PR_TASK_TYPES: + if config.get("task_type") == "pr_iteration": + if not ensure_pushed(repo_dir, branch): + log("WARN", "Failed to push commits before resolving PR URL") + else: + log("POST", "pr_review task — skipping push (read-only)") + log("POST", f"{config.get('task_type')} — returning existing PR URL") + result = subprocess.run( + [ + "gh", + "pr", + "view", + branch, + "--repo", + config["repo_url"], + "--json", + "url", + "-q", + ".url", + ], + cwd=repo_dir, + capture_output=True, + text=True, + timeout=60, + ) + if result.returncode == 0 and result.stdout.strip(): + pr_url = result.stdout.strip() + log("POST", f"Existing PR: {pr_url}") + return pr_url + stderr_msg = result.stderr.strip() if result.stderr else "(no stderr)" + log("WARN", f"Could not resolve existing PR URL (rc={result.returncode}): {stderr_msg}") + return None + # Check if the agent already created a PR for this branch log("POST", "Checking for existing PR...") result = subprocess.run( @@ -1274,10 +1346,15 @@ def _on_stderr(line: str) -> None: else: log("WARN", "claude CLI not found on PATH") + if config.get("task_type") == "pr_review": + allowed_tools = ["Bash", "Read", "Glob", "Grep", "WebFetch"] + else: + allowed_tools = ["Bash", "Read", "Write", "Edit", "Glob", "Grep", "WebFetch"] + options = ClaudeAgentOptions( model=config["anthropic_model"], system_prompt=system_prompt, - allowed_tools=["Bash", "Read", "Write", "Edit", "Glob", "Grep", "WebFetch"], + allowed_tools=allowed_tools, permission_mode="bypassPermissions", cwd=cwd, max_turns=config["max_turns"], @@ -1482,7 +1559,13 @@ def _build_system_prompt( overrides: str, ) -> str: """Assemble the system prompt with task-specific values and memory context.""" - system_prompt = SYSTEM_PROMPT.replace("{repo_url}", 
config["repo_url"]) + task_type = config.get("task_type", "new_task") + try: + system_prompt = get_system_prompt(task_type) + except ValueError: + log("ERROR", f"Unknown task_type {task_type!r} — falling back to default system prompt") + system_prompt = SYSTEM_PROMPT + system_prompt = system_prompt.replace("{repo_url}", config["repo_url"]) system_prompt = system_prompt.replace("{task_id}", config["task_id"]) system_prompt = system_prompt.replace("{workspace}", AGENT_WORKSPACE) system_prompt = system_prompt.replace("{branch_name}", setup["branch"]) @@ -1513,6 +1596,14 @@ def _build_system_prompt( memory_context_text = "\n".join(mc_parts) system_prompt = system_prompt.replace("{memory_context}", memory_context_text) + # Substitute PR-specific placeholders + pr_number_val = config.get("pr_number", "") + if pr_number_val: + system_prompt = system_prompt.replace("{pr_number}", str(pr_number_val)) + elif "{pr_number}" in system_prompt: + log("WARN", "System prompt contains {pr_number} placeholder but no pr_number in config") + system_prompt = system_prompt.replace("{pr_number}", "(unknown)") + # Append Blueprint system_prompt_overrides after all placeholder # substitutions (avoids double-substitution if overrides contain # template placeholders like {repo_url}). @@ -1628,6 +1719,9 @@ def run_task( system_prompt_overrides: str = "", prompt_version: str = "", memory_id: str = "", + task_type: str = "new_task", + branch_name: str = "", + pr_number: str = "", ) -> dict: """Run the full agent pipeline and return a result dict. 
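`_build_system_prompt` fills template placeholders with plain string replacement and substitutes a visible `(unknown)` marker when a `{pr_number}` placeholder has no value. A condensed sketch of that behavior:

```python
def fill_placeholders(template: str, values: dict[str, str]) -> str:
    # Plain str.replace per placeholder, as _build_system_prompt does.
    for key, val in values.items():
        template = template.replace("{" + key + "}", val)
    # PR-specific placeholder with no value: substitute a visible marker
    # instead of leaving the raw {pr_number} token in the prompt.
    if "{pr_number}" in template:
        template = template.replace("{pr_number}", "(unknown)")
    return template
```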
@@ -1652,6 +1746,9 @@ def run_task( aws_region=aws_region, task_id=task_id, system_prompt_overrides=system_prompt_overrides, + task_type=task_type, + branch_name=branch_name, + pr_number=pr_number, ) log("TASK", f"Task ID: {config['task_id']}") @@ -1678,6 +1775,8 @@ def run_task( prompt = hydrated_context["user_prompt"] if hydrated_context.get("issue"): config["issue"] = hydrated_context["issue"] + if hydrated_context.get("resolved_base_branch"): + config["base_branch"] = hydrated_context["resolved_base_branch"] if hydrated_context.get("truncated"): log("WARN", "Context was truncated by orchestrator token budget") else: @@ -1765,8 +1864,11 @@ def run_task( # Post-hooks with task_span("task.post_hooks") as post_span: - # Safety net: commit any uncommitted tracked changes - safety_committed = ensure_committed(setup["repo_dir"]) + # Safety net: commit any uncommitted tracked changes (skip for read-only tasks) + if config.get("task_type") == "pr_review": + safety_committed = False + else: + safety_committed = ensure_committed(setup["repo_dir"]) post_span.set_attribute("safety_net.committed", safety_committed) build_passed = verify_build(setup["repo_dir"]) @@ -1810,8 +1912,13 @@ def run_task( # Default True = assume build was green before, so a post-agent # failure IS counted as a regression (conservative). 
build_before = setup.get("build_before", True) - build_ok = build_passed or not build_before - if not build_passed and not build_before: + if config.get("task_type") == "pr_review": + build_ok = True # Review task — build status is informational only + if not build_passed: + log("INFO", "pr_review: build failed — informational only, not gating") + else: + build_ok = build_passed or not build_before + if not build_passed and not build_before and config.get("task_type") != "pr_review": log( "WARN", "Post-agent build failed, but build was already failing before " diff --git a/agent/prompts/__init__.py b/agent/prompts/__init__.py new file mode 100644 index 0000000..a864e01 --- /dev/null +++ b/agent/prompts/__init__.py @@ -0,0 +1,22 @@ +"""Prompt module — selects the system prompt template by task type.""" + +from .base import BASE_PROMPT +from .new_task import NEW_TASK_WORKFLOW +from .pr_iteration import PR_ITERATION_WORKFLOW +from .pr_review import PR_REVIEW_WORKFLOW + +_PROMPTS = { + "new_task": BASE_PROMPT.replace("{workflow}", NEW_TASK_WORKFLOW), + "pr_iteration": BASE_PROMPT.replace("{workflow}", PR_ITERATION_WORKFLOW), + "pr_review": BASE_PROMPT.replace("{workflow}", PR_REVIEW_WORKFLOW), +} + + +def get_system_prompt(task_type: str = "new_task") -> str: + """Return the system prompt template for the given task type. + + Raises ValueError for unknown task types. + """ + if task_type not in _PROMPTS: + raise ValueError(f"Unknown task_type: {task_type!r}. Valid types: {list(_PROMPTS.keys())}") + return _PROMPTS[task_type] diff --git a/agent/prompts/base.py b/agent/prompts/base.py new file mode 100644 index 0000000..cc8e30e --- /dev/null +++ b/agent/prompts/base.py @@ -0,0 +1,62 @@ +"""Shared base prompt — identity, environment, and rules.
+ +Placeholders replaced at runtime by entrypoint.py: + {repo_url} — GitHub repo (owner/repo) + {task_id} — Unique task identifier + {workspace} — Workspace root (AGENT_WORKSPACE env var, default /workspace) + {branch_name} — Git branch created by the entrypoint + {default_branch} — Repository default branch (e.g. main, master) + {max_turns} — Maximum agent turns for this task + {setup_notes} — Results of mise install and initial build + {memory_context} — Cross-task memory (repo knowledge + past episodes) + {workflow} — Task-type-specific workflow steps (injected by __init__.py) +""" + +BASE_PROMPT = """\ +You are a background coding agent. You work fully unattended — no human will \ +interact with you during execution. You must make all decisions autonomously. + +## Environment + +- You are running inside an isolated container with shell access. +- The repository `{repo_url}` is already cloned at `{workspace}/{task_id}`. +- You are on branch `{branch_name}`. +- The repository default branch is `{default_branch}`. +- Git is configured and authenticated — `git push` works without extra setup. +- The `gh` CLI is pre-installed and authenticated via GH_TOKEN. +- Dependencies have been installed via `mise install`. +- An initial build (`mise run build`) has already been run. +- An initial lint (`mise run lint`) has already been run. +- You have a maximum of **{max_turns} turns**. Prioritize the most impactful \ +changes first and work efficiently. Avoid spending excessive turns exploring — \ +understand what you need, then act. + +### Setup results + +{setup_notes} + +### Previous knowledge about this repository + +{memory_context} + +{workflow} + +## Rules + +- **Full permissions**: Execute any shell commands, modify any files, install \ +any dependencies. The container is isolated — no blast radius. +- **No confirmation**: Never pause or ask for input. Make reasonable decisions \ +and document them. +- **No skipping steps**: Step 3 (test) is mandatory. 
Even if the change seems \ +trivial, you must run `mise run build` and report the result. The PR description \ +must include evidence that build and tests were run. +- **Error handling**: If a step fails twice, commit whatever work you have, \ +document the error in the PR description, and create the PR with partial results. \ +Do not loop indefinitely. +- **Lint before commit**: Run available linters and type-checks before each commit. +- **Commit conventions**: Follow the repo's commit style if discoverable. \ +Otherwise use conventional commit format: `(): description` where \ +type is feat/fix/chore/docs/refactor/test and module is the area of the codebase \ +(e.g., `auth`, `api`, `github`, `ci`). +- **Branch naming**: Already set — push to `{branch_name}`. +""" diff --git a/agent/prompts/new_task.py b/agent/prompts/new_task.py new file mode 100644 index 0000000..451ad27 --- /dev/null +++ b/agent/prompts/new_task.py @@ -0,0 +1,69 @@ +"""Workflow section for new_task (create a new PR).""" + +NEW_TASK_WORKFLOW = """\ +## Workflow + +Follow these steps in order: + +1. **Understand the codebase** + Read relevant files, check the project structure, look at existing tests, \ +build scripts, and CI configuration. Understand the project before changing it. + +2. **Work on the task** + Make the necessary code changes. Be thorough but focused — only change what \ +the task requires. Do not refactor unrelated code. + +3. **Test your changes** + This step is MANDATORY — do NOT skip it. + - Run the project build: `mise run build` + - Run linters and type-checkers if available. + - If the project has tests, run them (e.g., `npm test`, `pytest`, `make test`). + - If the project has no tests, validate your changes manually (e.g., syntax \ +check, dry-run) and note this in the PR. + - Report test and build results in the PR description — both passes and failures. + +4. 
**Commit and push frequently** + After each logical unit of work, commit and push: + ``` + git add <files> + git commit -m "<type>(<module>): <description>" + git push -u origin {branch_name} + ``` + Follow the repo's commit conventions if specified in CONTRIBUTING.md, \ +CLAUDE.md, or prior commits. If no convention is apparent, default to \ +conventional commit format (`<type>(<module>): description`). \ +Do NOT accumulate large uncommitted changes — pushing frequently is your \ +durability mechanism. + +5. **Create a Pull Request** + When the work is complete (or after exhausting attempts), you MUST create a PR. \ +Do NOT skip this step or tell the user to do it manually. + + The PR body must include a section titled "## Agent notes" with: + - What went well and what was difficult + - Any patterns or conventions you discovered about this repo + - Suggestions for future tasks on this repo + + Run: + ``` + gh pr create --repo {repo_url} --head {branch_name} --base {default_branch} --title "<type>(<module>): <title>" --body "<body>" + ``` + Follow the repo's PR title conventions if specified. If no convention is \ +apparent, use conventional commit format: `<type>(<module>): description`. \ +Examples: + - `feat(auth): add OAuth2 login flow` + - `fix(api): handle null response from payments endpoint` + - `chore(github): update RFC issue template` + - `docs(readme): add deployment instructions` + The `<module>` is a short identifier for the area of the codebase being changed \ +(e.g., `auth`, `api`, `github`, `ci`, `docs`). Never omit the module scope.
+ + The PR body must include: + - Summary of changes + - Link to the issue (if provided) + - Build and test results (what commands were run, output summary, pass/fail) + - Decisions made (if the task was ambiguous, explain your choices) + - The following sentence: "By submitting this pull request, I confirm that you \ +can use, modify, copy, and redistribute this contribution, under the terms of \ +the [project license](https://github.com/krokoko/agent-plugins/blob/main/LICENSE)."\ +""" diff --git a/agent/prompts/pr_iteration.py b/agent/prompts/pr_iteration.py new file mode 100644 index 0000000..8289b2c --- /dev/null +++ b/agent/prompts/pr_iteration.py @@ -0,0 +1,74 @@ +"""Workflow section for pr_iteration (iterate on an existing PR).""" + +PR_ITERATION_WORKFLOW = """\ +## Workflow + +You are iterating on an existing pull request (PR #{pr_number}). Your goal is to \ +address review feedback and push updates to the same branch. + +Follow these steps in order: + +1. **Understand and triage the review feedback** + Read all review comment threads and conversation comments on the PR carefully. \ +For each review thread, classify it as: + - **Actionable** — the request is clear and you can address it directly. + - **Needs clarification** — the request is ambiguous, contradictory, or you \ +cannot determine what change is being asked for. Do not guess — ask for clarification. + - **Won't address** — the request is out of scope for this iteration, or you \ +disagree with the suggestion (explain why in your reply). + +2. **Address the actionable feedback** + Make focused changes to address the review feedback you classified as actionable. \ +Only change what the reviewers requested — do not refactor unrelated code or add \ +unrequested features. + +3. **Test your changes** + This step is MANDATORY — do NOT skip it. + - Run the project build: `mise run build` + - Run linters and type-checkers if available.
+ - If the project has tests, run them (e.g., `npm test`, `pytest`, `make test`). + +4. **Commit and push to `{branch_name}`** + After each logical unit of work, commit and push: + ``` + git add <files> + git commit -m "<type>(<module>): <description>" + git push origin {branch_name} + ``` + Follow the repo's commit conventions if specified in CONTRIBUTING.md, \ +CLAUDE.md, or prior commits. If no convention is apparent, default to \ +conventional commit format (`<type>(<module>): description`). \ +Do NOT accumulate large uncommitted changes — pushing frequently is your \ +durability mechanism. + +5. **Reply to each review comment thread** + For each review comment thread listed in the Review Comments section above, \ +reply directly to that thread using the GitHub API: + ``` + gh api repos/{repo_url}/pulls/{pr_number}/comments/<comment_id>/replies \\ + -f body="<reply>" + ``` + Replace `<comment_id>` with the top-level comment ID shown next to each \ +thread (e.g. `reply with comment_id: 12345678`). + + Your reply MUST follow these guidelines: + - **If you addressed the comment**: Briefly explain what you changed, the rationale, and in \ +which commit. Example: "Fixed in abc1234 — added the null check as suggested." + - **If you need clarification**: Explain what is unclear and ask a specific \ +question. Example: "I'm not sure what validation you'd like here — should this \ +reject empty strings, or only null/undefined values?" + - **If you won't address it**: Explain why (out of scope, disagree, etc.). \ +Example: "This is outside the scope of this PR — I'll open a separate issue for it." + - Keep replies concise and factual. + +6.
**Post a summary comment on the PR** + When done, add a top-level summary comment to the PR: + ``` + gh pr comment {pr_number} --repo {repo_url} --body "<summary>" + ``` + The summary must include: + - What was changed to address feedback + - Which comments were addressed (brief list) + - Which comments need clarification from reviewers + - Build and test results (what commands were run, pass/fail)\ +""" diff --git a/agent/prompts/pr_review.py b/agent/prompts/pr_review.py new file mode 100644 index 0000000..d69f822 --- /dev/null +++ b/agent/prompts/pr_review.py @@ -0,0 +1,104 @@ +"""Workflow section for pr_review (review an existing PR without modifying code).""" + +PR_REVIEW_WORKFLOW = """\ +## Rules override + +**This is a READ-ONLY review task.** The base prompt rules about "Full permissions", \ +"modify any files", and "install any dependencies" do NOT apply to this task. You \ +must NOT modify any source code files, configuration files, or project dependencies. \ +Your only outputs are GitHub review comments and a summary comment on the PR. \ +Your tool permissions enforce this: you have access to Bash, Read, Glob, Grep, \ +and WebFetch only — Write and Edit are not available. + +## Workflow + +You are reviewing pull request #{pr_number} on `{repo_url}`. Your goal is to \ +analyze the changes and post a structured code review using the GitHub Reviews API. \ +You must NOT modify any files — this is a read-only task. + +Follow these steps in order: + +1. **Understand the PR context** + Read the PR title, body, and any existing review or conversation comments. \ +Understand what the PR is trying to achieve and any constraints or requirements \ +mentioned by the author or reviewers. + +2. **Analyze the changes** + - Read the full source files for every file changed in the PR (not just the diff hunks). \ +Context matters — you need to understand how the changed code fits into the broader \ +file and module.
+ - Check for correctness, edge cases, error handling, security issues, test coverage, \ +performance concerns, and adherence to project conventions. + - Run `mise run build` to check whether the PR builds and tests pass. This is for \ +your analysis — the result does NOT gate the review. + - If the repository has a CLAUDE.md, CONTRIBUTING.md, or style guide, check \ +adherence to those guidelines. + +3. **Leverage repository memory context** + If previous knowledge about this repository is available (see "Previous knowledge \ +about this repository" above), use it to inform your review. Reference specific \ +repository conventions, past issues, or known patterns when relevant. When a finding \ +is informed by repository memory, note it in the description. + +4. **Compose findings using the structured comment format** + For each finding, use this format: + + ``` + **Type**: <comment | question | issue | good_point> + **Severity**: <minor | medium | major | critical> + **Title**: <short title> + + **Description**: <detailed explanation (if informed by repository memory, append "(based on repository memory)")> + + **Proposed fix**: <suggested change> + + **AI prompt**: <prompt a coding agent could use to implement the fix> + ``` + + Classification guidelines: + - `comment` — An observation, suggestion, or non-blocking recommendation. + - `question` — Something that needs clarification from the author before \ +the reviewer can form an opinion. Always phrase as a clear question. + - `issue` — A defect, bug, or problem that should be fixed. Severity: + - `minor` — Style, naming, minor readability concern. + - `medium` — Logic issue, missing validation, or test gap that could cause \ +problems in some scenarios. + - `major` — Significant bug, security vulnerability, or correctness issue \ +that will likely cause production problems. + - `critical` — Data loss, security breach, or crash affecting all users. \ +Must be fixed before merge. + - `good_point` — Something done well that is worth highlighting. No severity, \ +proposed fix, or AI prompt needed. + + The `Severity` line should ONLY be present for `issue` type findings. + +5.
**Post the review via the GitHub Reviews API** + Batch ALL findings into a single review submission using the GitHub Reviews API: + ``` + gh api repos/{repo_url}/pulls/{pr_number}/reviews \\ + --method POST \\ + -f event="COMMENT" \\ + -f body="<review summary>" \\ + -f 'comments[]={{"path":"<file path>","line":<line>,"body":"<finding>"}}' + ``` + - Use `event: "COMMENT"` — do NOT approve or request changes. + - Place comments on the specific lines they refer to. Use the diff hunks and \ +file contents to determine the correct line numbers. + - For findings that are not file-specific (e.g. architecture concerns, missing \ +tests), include them in the review body rather than as line comments. + +6. **Post a summary comment on the PR** + After submitting the review, add a top-level summary comment: + ``` + gh pr comment {pr_number} --repo {repo_url} --body "<summary>" + ``` + The summary should include: + - Total number of findings by type (issues, comments, questions, good points) + - A brief assessment of overall PR quality + - Key areas that need attention before merge + - Build/test results from step 2\ +""" diff --git a/agent/pyproject.toml b/agent/pyproject.toml index 4f8621a..41d71f1 100644 --- a/agent/pyproject.toml +++ b/agent/pyproject.toml @@ -56,6 +56,7 @@ ignore = [ [tool.ruff.lint.per-file-ignores] "system_prompt.py" = ["E501"] # long prompt strings +"prompts/*.py" = ["E501"] # long prompt strings "tests/**" = ["S101", "S106"] # assert in tests is fine; hardcoded test tokens are not secrets [tool.pytest.ini_options] diff --git a/agent/server.py b/agent/server.py index 1fe8cab..a65e934 100644 --- a/agent/server.py +++ b/agent/server.py @@ -103,6 +103,9 @@ def _run_task_background( system_prompt_overrides: str = "", prompt_version: str = "", memory_id: str = "", + task_type: str = "new_task", + branch_name: str = "", + pr_number: str = "", ) -> None: """Run the agent task in a background thread.""" try: @@ -125,6 +128,9 @@ def _run_task_background( system_prompt_overrides=system_prompt_overrides,
prompt_version=prompt_version, memory_id=memory_id, + task_type=task_type, + branch_name=branch_name, + pr_number=pr_number, ) except Exception as e: print(f"Background task {task_id} failed: {type(e).__name__}: {e}") @@ -161,6 +167,9 @@ def invoke_agent(request: Request, body: InvocationRequest): hydrated_context = inp.get("hydrated_context") prompt_version = inp.get("prompt_version", "") memory_id = inp.get("memory_id") or os.environ.get("MEMORY_ID", "") + task_type = inp.get("task_type", "new_task") + branch_name = inp.get("branch_name", "") + pr_number = str(inp.get("pr_number", "")) # Extract AgentCore session ID from request headers for OTEL correlation session_id = request.headers.get("x-amzn-bedrock-agentcore-runtime-session-id", "") @@ -182,6 +191,9 @@ def invoke_agent(request: Request, body: InvocationRequest): system_prompt_overrides, prompt_version, memory_id, + task_type, + branch_name, + pr_number, ), ) # Track the thread for graceful shutdown BEFORE starting it so diff --git a/agent/tests/test_entrypoint.py b/agent/tests/test_entrypoint.py index 42501d6..8f527dd 100644 --- a/agent/tests/test_entrypoint.py +++ b/agent/tests/test_entrypoint.py @@ -329,3 +329,117 @@ def test_overrides_appended(self): result = _build_system_prompt(config, setup, None, "Always use tabs") assert "Always use tabs" in result assert "Additional instructions" in result + + +# --------------------------------------------------------------------------- +# build_config — task_type handling +# --------------------------------------------------------------------------- + + +class TestBuildConfigTaskType: + def test_pr_iteration_with_pr_number(self): + config = build_config( + repo_url="owner/repo", + github_token="ghp_test", + aws_region="us-east-1", + task_type="pr_iteration", + pr_number="42", + ) + assert config["task_type"] == "pr_iteration" + assert config["pr_number"] == "42" + + def test_pr_iteration_without_pr_number_raises(self): + with pytest.raises(ValueError, 
match="pr_number is required"): + build_config( + repo_url="owner/repo", + github_token="ghp_test", + aws_region="us-east-1", + task_type="pr_iteration", + ) + + def test_new_task_default(self): + config = build_config( + repo_url="owner/repo", + task_description="Fix it", + github_token="ghp_test", + aws_region="us-east-1", + ) + assert config["task_type"] == "new_task" + + def test_pr_review_with_pr_number(self): + config = build_config( + repo_url="owner/repo", + github_token="ghp_test", + aws_region="us-east-1", + task_type="pr_review", + pr_number="55", + ) + assert config["task_type"] == "pr_review" + assert config["pr_number"] == "55" + + def test_pr_review_without_pr_number_raises(self): + with pytest.raises(ValueError, match="pr_number is required"): + build_config( + repo_url="owner/repo", + github_token="ghp_test", + aws_region="us-east-1", + task_type="pr_review", + ) + + +# --------------------------------------------------------------------------- +# _build_system_prompt — task_type handling +# --------------------------------------------------------------------------- + + +class TestBuildSystemPromptTaskType: + def test_selects_new_task_prompt(self): + config = { + "repo_url": "owner/repo", + "task_id": "test-123", + "max_turns": 100, + "task_type": "new_task", + } + setup = { + "branch": "bgagent/test-123/fix", + "default_branch": "main", + "notes": ["All OK"], + } + prompt = _build_system_prompt(config, setup, None, "") + assert "Create a Pull Request" in prompt + + def test_selects_pr_iteration_prompt(self): + config = { + "repo_url": "owner/repo", + "task_id": "test-123", + "max_turns": 100, + "task_type": "pr_iteration", + "pr_number": "42", + } + setup = { + "branch": "feature/fix", + "default_branch": "main", + "notes": ["All OK"], + } + prompt = _build_system_prompt(config, setup, None, "") + assert "Post a summary comment on the PR" in prompt + assert "Reply to each review comment thread" in prompt + assert "42" in prompt + + def 
test_selects_pr_review_prompt(self): + config = { + "repo_url": "owner/repo", + "task_id": "test-123", + "max_turns": 100, + "task_type": "pr_review", + "pr_number": "55", + } + setup = { + "branch": "feature/review", + "default_branch": "main", + "notes": ["All OK"], + } + prompt = _build_system_prompt(config, setup, None, "") + assert "READ-ONLY" in prompt + assert "must NOT modify" in prompt + assert "55" in prompt diff --git a/agent/tests/test_prompts.py b/agent/tests/test_prompts.py new file mode 100644 index 0000000..6e1a19e --- /dev/null +++ b/agent/tests/test_prompts.py @@ -0,0 +1,46 @@ +"""Unit tests for the prompts module.""" + +import pytest + +from prompts import get_system_prompt + + +class TestGetSystemPrompt: + def test_new_task_returns_prompt_with_create_pr(self): + prompt = get_system_prompt("new_task") + assert "Create a Pull Request" in prompt + assert "{repo_url}" in prompt + assert "{branch_name}" in prompt + assert "{workflow}" not in prompt + + def test_pr_iteration_returns_prompt_with_update_pr(self): + prompt = get_system_prompt("pr_iteration") + assert "Post a summary comment on the PR" in prompt + assert "Reply to each review comment thread" in prompt + assert "gh api" in prompt + assert "comments//replies" in prompt + assert "{pr_number}" in prompt + assert "{repo_url}" in prompt + assert "{branch_name}" in prompt + assert "{workflow}" not in prompt + + def test_pr_review_returns_prompt_with_review_workflow(self): + prompt = get_system_prompt("pr_review") + assert "READ-ONLY" in prompt + assert "must NOT modify" in prompt + assert "gh api" in prompt + assert "{pr_number}" in prompt + assert "{repo_url}" in prompt + assert "Write and Edit are not available" in prompt + assert "{workflow}" not in prompt + + def test_all_types_contain_shared_base_sections(self): + for task_type in ("new_task", "pr_iteration", "pr_review"): + prompt = get_system_prompt(task_type) + assert "## Environment" in prompt, f"Missing Environment in {task_type}" + 
has_rules = "## Rules" in prompt or "## Rules override" in prompt + assert has_rules, f"Missing Rules in {task_type}" + + def test_unknown_task_type_raises(self): + with pytest.raises(ValueError, match="Unknown task_type"): + get_system_prompt("invalid_type") diff --git a/agent/uv.lock b/agent/uv.lock index 7ab2d63..5e1a0f5 100644 --- a/agent/uv.lock +++ b/agent/uv.lock @@ -334,55 +334,55 @@ wheels = [ [[package]] name = "cryptography" -version = "46.0.6" +version = "46.0.7" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "cffi", marker = "platform_python_implementation != 'PyPy'" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/a4/ba/04b1bd4218cbc58dc90ce967106d51582371b898690f3ae0402876cc4f34/cryptography-46.0.6.tar.gz", hash = "sha256:27550628a518c5c6c903d84f637fbecf287f6cb9ced3804838a1295dc1fd0759", size = 750542 } -wheels = [ - { url = "https://files.pythonhosted.org/packages/47/23/9285e15e3bc57325b0a72e592921983a701efc1ee8f91c06c5f0235d86d9/cryptography-46.0.6-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:64235194bad039a10bb6d2d930ab3323baaec67e2ce36215fd0952fad0930ca8", size = 7176401 }, - { url = "https://files.pythonhosted.org/packages/60/f8/e61f8f13950ab6195b31913b42d39f0f9afc7d93f76710f299b5ec286ae6/cryptography-46.0.6-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:26031f1e5ca62fcb9d1fcb34b2b60b390d1aacaa15dc8b895a9ed00968b97b30", size = 4275275 }, - { url = "https://files.pythonhosted.org/packages/19/69/732a736d12c2631e140be2348b4ad3d226302df63ef64d30dfdb8db7ad1c/cryptography-46.0.6-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:9a693028b9cbe51b5a1136232ee8f2bc242e4e19d456ded3fa7c86e43c713b4a", size = 4425320 }, - { url = "https://files.pythonhosted.org/packages/d4/12/123be7292674abf76b21ac1fc0e1af50661f0e5b8f0ec8285faac18eb99e/cryptography-46.0.6-cp311-abi3-manylinux_2_28_aarch64.whl", hash = 
"sha256:67177e8a9f421aa2d3a170c3e56eca4e0128883cf52a071a7cbf53297f18b175", size = 4278082 }, - { url = "https://files.pythonhosted.org/packages/5b/ba/d5e27f8d68c24951b0a484924a84c7cdaed7502bac9f18601cd357f8b1d2/cryptography-46.0.6-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:d9528b535a6c4f8ff37847144b8986a9a143585f0540fbcb1a98115b543aa463", size = 4926514 }, - { url = "https://files.pythonhosted.org/packages/34/71/1ea5a7352ae516d5512d17babe7e1b87d9db5150b21f794b1377eac1edc0/cryptography-46.0.6-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:22259338084d6ae497a19bae5d4c66b7ca1387d3264d1c2c0e72d9e9b6a77b97", size = 4457766 }, - { url = "https://files.pythonhosted.org/packages/01/59/562be1e653accee4fdad92c7a2e88fced26b3fdfce144047519bbebc299e/cryptography-46.0.6-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:760997a4b950ff00d418398ad73fbc91aa2894b5c1db7ccb45b4f68b42a63b3c", size = 3986535 }, - { url = "https://files.pythonhosted.org/packages/d6/8b/b1ebfeb788bf4624d36e45ed2662b8bd43a05ff62157093c1539c1288a18/cryptography-46.0.6-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:3dfa6567f2e9e4c5dceb8ccb5a708158a2a871052fa75c8b78cb0977063f1507", size = 4277618 }, - { url = "https://files.pythonhosted.org/packages/dd/52/a005f8eabdb28df57c20f84c44d397a755782d6ff6d455f05baa2785bd91/cryptography-46.0.6-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:cdcd3edcbc5d55757e5f5f3d330dd00007ae463a7e7aa5bf132d1f22a4b62b19", size = 4890802 }, - { url = "https://files.pythonhosted.org/packages/ec/4d/8e7d7245c79c617d08724e2efa397737715ca0ec830ecb3c91e547302555/cryptography-46.0.6-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:d4e4aadb7fc1f88687f47ca20bb7227981b03afaae69287029da08096853b738", size = 4457425 }, - { url = "https://files.pythonhosted.org/packages/1d/5c/f6c3596a1430cec6f949085f0e1a970638d76f81c3ea56d93d564d04c340/cryptography-46.0.6-cp311-abi3-musllinux_1_2_aarch64.whl", hash = 
"sha256:2b417edbe8877cda9022dde3a008e2deb50be9c407eef034aeeb3a8b11d9db3c", size = 4405530 }, - { url = "https://files.pythonhosted.org/packages/7e/c9/9f9cea13ee2dbde070424e0c4f621c091a91ffcc504ffea5e74f0e1daeff/cryptography-46.0.6-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:380343e0653b1c9d7e1f55b52aaa2dbb2fdf2730088d48c43ca1c7c0abb7cc2f", size = 4667896 }, - { url = "https://files.pythonhosted.org/packages/ad/b5/1895bc0821226f129bc74d00eccfc6a5969e2028f8617c09790bf89c185e/cryptography-46.0.6-cp311-abi3-win32.whl", hash = "sha256:bcb87663e1f7b075e48c3be3ecb5f0b46c8fc50b50a97cf264e7f60242dca3f2", size = 3026348 }, - { url = "https://files.pythonhosted.org/packages/c3/f8/c9bcbf0d3e6ad288b9d9aa0b1dee04b063d19e8c4f871855a03ab3a297ab/cryptography-46.0.6-cp311-abi3-win_amd64.whl", hash = "sha256:6739d56300662c468fddb0e5e291f9b4d084bead381667b9e654c7dd81705124", size = 3483896 }, - { url = "https://files.pythonhosted.org/packages/01/41/3a578f7fd5c70611c0aacba52cd13cb364a5dee895a5c1d467208a9380b0/cryptography-46.0.6-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:2ef9e69886cbb137c2aef9772c2e7138dc581fad4fcbcf13cc181eb5a3ab6275", size = 7117147 }, - { url = "https://files.pythonhosted.org/packages/fa/87/887f35a6fca9dde90cad08e0de0c89263a8e59b2d2ff904fd9fcd8025b6f/cryptography-46.0.6-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:7f417f034f91dcec1cb6c5c35b07cdbb2ef262557f701b4ecd803ee8cefed4f4", size = 4266221 }, - { url = "https://files.pythonhosted.org/packages/aa/a8/0a90c4f0b0871e0e3d1ed126aed101328a8a57fd9fd17f00fb67e82a51ca/cryptography-46.0.6-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:d24c13369e856b94892a89ddf70b332e0b70ad4a5c43cf3e9cb71d6d7ffa1f7b", size = 4408952 }, - { url = "https://files.pythonhosted.org/packages/16/0b/b239701eb946523e4e9f329336e4ff32b1247e109cbab32d1a7b61da8ed7/cryptography-46.0.6-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = 
"sha256:aad75154a7ac9039936d50cf431719a2f8d4ed3d3c277ac03f3339ded1a5e707", size = 4270141 }, - { url = "https://files.pythonhosted.org/packages/0f/a8/976acdd4f0f30df7b25605f4b9d3d89295351665c2091d18224f7ad5cdbf/cryptography-46.0.6-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:3c21d92ed15e9cfc6eb64c1f5a0326db22ca9c2566ca46d845119b45b4400361", size = 4904178 }, - { url = "https://files.pythonhosted.org/packages/b1/1b/bf0e01a88efd0e59679b69f42d4afd5bced8700bb5e80617b2d63a3741af/cryptography-46.0.6-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:4668298aef7cddeaf5c6ecc244c2302a2b8e40f384255505c22875eebb47888b", size = 4441812 }, - { url = "https://files.pythonhosted.org/packages/bb/8b/11df86de2ea389c65aa1806f331cae145f2ed18011f30234cc10ca253de8/cryptography-46.0.6-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:8ce35b77aaf02f3b59c90b2c8a05c73bac12cea5b4e8f3fbece1f5fddea5f0ca", size = 3963923 }, - { url = "https://files.pythonhosted.org/packages/91/e0/207fb177c3a9ef6a8108f234208c3e9e76a6aa8cf20d51932916bd43bda0/cryptography-46.0.6-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:c89eb37fae9216985d8734c1afd172ba4927f5a05cfd9bf0e4863c6d5465b013", size = 4269695 }, - { url = "https://files.pythonhosted.org/packages/21/5e/19f3260ed1e95bced52ace7501fabcd266df67077eeb382b79c81729d2d3/cryptography-46.0.6-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:ed418c37d095aeddf5336898a132fba01091f0ac5844e3e8018506f014b6d2c4", size = 4869785 }, - { url = "https://files.pythonhosted.org/packages/10/38/cd7864d79aa1d92ef6f1a584281433419b955ad5a5ba8d1eb6c872165bcb/cryptography-46.0.6-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:69cf0056d6947edc6e6760e5f17afe4bea06b56a9ac8a06de9d2bd6b532d4f3a", size = 4441404 }, - { url = "https://files.pythonhosted.org/packages/09/0a/4fe7a8d25fed74419f91835cf5829ade6408fd1963c9eae9c4bce390ecbb/cryptography-46.0.6-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = 
"sha256:8e7304c4f4e9490e11efe56af6713983460ee0780f16c63f219984dab3af9d2d", size = 4397549 }, - { url = "https://files.pythonhosted.org/packages/5f/a0/7d738944eac6513cd60a8da98b65951f4a3b279b93479a7e8926d9cd730b/cryptography-46.0.6-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:b928a3ca837c77a10e81a814a693f2295200adb3352395fad024559b7be7a736", size = 4651874 }, - { url = "https://files.pythonhosted.org/packages/cb/f1/c2326781ca05208845efca38bf714f76939ae446cd492d7613808badedf1/cryptography-46.0.6-cp314-cp314t-win32.whl", hash = "sha256:97c8115b27e19e592a05c45d0dd89c57f81f841cc9880e353e0d3bf25b2139ed", size = 3001511 }, - { url = "https://files.pythonhosted.org/packages/c9/57/fe4a23eb549ac9d903bd4698ffda13383808ef0876cc912bcb2838799ece/cryptography-46.0.6-cp314-cp314t-win_amd64.whl", hash = "sha256:c797e2517cb7880f8297e2c0f43bb910e91381339336f75d2c1c2cbf811b70b4", size = 3471692 }, - { url = "https://files.pythonhosted.org/packages/c4/cc/f330e982852403da79008552de9906804568ae9230da8432f7496ce02b71/cryptography-46.0.6-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:12cae594e9473bca1a7aceb90536060643128bb274fcea0fc459ab90f7d1ae7a", size = 7162776 }, - { url = "https://files.pythonhosted.org/packages/49/b3/dc27efd8dcc4bff583b3f01d4a3943cd8b5821777a58b3a6a5f054d61b79/cryptography-46.0.6-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:639301950939d844a9e1c4464d7e07f902fe9a7f6b215bb0d4f28584729935d8", size = 4270529 }, - { url = "https://files.pythonhosted.org/packages/e6/05/e8d0e6eb4f0d83365b3cb0e00eb3c484f7348db0266652ccd84632a3d58d/cryptography-46.0.6-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:ed3775295fb91f70b4027aeba878d79b3e55c0b3e97eaa4de71f8f23a9f2eb77", size = 4414827 }, - { url = "https://files.pythonhosted.org/packages/2f/97/daba0f5d2dc6d855e2dcb70733c812558a7977a55dd4a6722756628c44d1/cryptography-46.0.6-cp38-abi3-manylinux_2_28_aarch64.whl", hash = 
"sha256:8927ccfbe967c7df312ade694f987e7e9e22b2425976ddbf28271d7e58845290", size = 4271265 }, - { url = "https://files.pythonhosted.org/packages/89/06/fe1fce39a37ac452e58d04b43b0855261dac320a2ebf8f5260dd55b201a9/cryptography-46.0.6-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:b12c6b1e1651e42ab5de8b1e00dc3b6354fdfd778e7fa60541ddacc27cd21410", size = 4916800 }, - { url = "https://files.pythonhosted.org/packages/ff/8a/b14f3101fe9c3592603339eb5d94046c3ce5f7fc76d6512a2d40efd9724e/cryptography-46.0.6-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:063b67749f338ca9c5a0b7fe438a52c25f9526b851e24e6c9310e7195aad3b4d", size = 4448771 }, - { url = "https://files.pythonhosted.org/packages/01/b3/0796998056a66d1973fd52ee89dc1bb3b6581960a91ad4ac705f182d398f/cryptography-46.0.6-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:02fad249cb0e090b574e30b276a3da6a149e04ee2f049725b1f69e7b8351ec70", size = 3978333 }, - { url = "https://files.pythonhosted.org/packages/c5/3d/db200af5a4ffd08918cd55c08399dc6c9c50b0bc72c00a3246e099d3a849/cryptography-46.0.6-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:7e6142674f2a9291463e5e150090b95a8519b2fb6e6aaec8917dd8d094ce750d", size = 4271069 }, - { url = "https://files.pythonhosted.org/packages/d7/18/61acfd5b414309d74ee838be321c636fe71815436f53c9f0334bf19064fa/cryptography-46.0.6-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:456b3215172aeefb9284550b162801d62f5f264a081049a3e94307fe20792cfa", size = 4878358 }, - { url = "https://files.pythonhosted.org/packages/8b/65/5bf43286d566f8171917cae23ac6add941654ccf085d739195a4eacf1674/cryptography-46.0.6-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:341359d6c9e68834e204ceaf25936dffeafea3829ab80e9503860dcc4f4dac58", size = 4448061 }, - { url = "https://files.pythonhosted.org/packages/e0/25/7e49c0fa7205cf3597e525d156a6bce5b5c9de1fd7e8cb01120e459f205a/cryptography-46.0.6-cp38-abi3-musllinux_1_2_aarch64.whl", hash = 
"sha256:9a9c42a2723999a710445bc0d974e345c32adfd8d2fac6d8a251fa829ad31cfb", size = 4399103 }, - { url = "https://files.pythonhosted.org/packages/44/46/466269e833f1c4718d6cd496ffe20c56c9c8d013486ff66b4f69c302a68d/cryptography-46.0.6-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:6617f67b1606dfd9fe4dbfa354a9508d4a6d37afe30306fe6c101b7ce3274b72", size = 4659255 }, - { url = "https://files.pythonhosted.org/packages/0a/09/ddc5f630cc32287d2c953fc5d32705e63ec73e37308e5120955316f53827/cryptography-46.0.6-cp38-abi3-win32.whl", hash = "sha256:7f6690b6c55e9c5332c0b59b9c8a3fb232ebf059094c17f9019a51e9827df91c", size = 3010660 }, - { url = "https://files.pythonhosted.org/packages/1b/82/ca4893968aeb2709aacfb57a30dec6fa2ab25b10fa9f064b8882ce33f599/cryptography-46.0.6-cp38-abi3-win_amd64.whl", hash = "sha256:79e865c642cfc5c0b3eb12af83c35c5aeff4fa5c672dc28c43721c2c9fdd2f0f", size = 3471160 }, +sdist = { url = "https://files.pythonhosted.org/packages/47/93/ac8f3d5ff04d54bc814e961a43ae5b0b146154c89c61b47bb07557679b18/cryptography-46.0.7.tar.gz", hash = "sha256:e4cfd68c5f3e0bfdad0d38e023239b96a2fe84146481852dffbcca442c245aa5", size = 750652 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0b/5d/4a8f770695d73be252331e60e526291e3df0c9b27556a90a6b47bccca4c2/cryptography-46.0.7-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:ea42cbe97209df307fdc3b155f1b6fa2577c0defa8f1f7d3be7d31d189108ad4", size = 7179869 }, + { url = "https://files.pythonhosted.org/packages/5f/45/6d80dc379b0bbc1f9d1e429f42e4cb9e1d319c7a8201beffd967c516ea01/cryptography-46.0.7-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:b36a4695e29fe69215d75960b22577197aca3f7a25b9cf9d165dcfe9d80bc325", size = 4275492 }, + { url = "https://files.pythonhosted.org/packages/4a/9a/1765afe9f572e239c3469f2cb429f3ba7b31878c893b246b4b2994ffe2fe/cryptography-46.0.7-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = 
"sha256:5ad9ef796328c5e3c4ceed237a183f5d41d21150f972455a9d926593a1dcb308", size = 4426670 }, + { url = "https://files.pythonhosted.org/packages/8f/3e/af9246aaf23cd4ee060699adab1e47ced3f5f7e7a8ffdd339f817b446462/cryptography-46.0.7-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:73510b83623e080a2c35c62c15298096e2a5dc8d51c3b4e1740211839d0dea77", size = 4280275 }, + { url = "https://files.pythonhosted.org/packages/0f/54/6bbbfc5efe86f9d71041827b793c24811a017c6ac0fd12883e4caa86b8ed/cryptography-46.0.7-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:cbd5fb06b62bd0721e1170273d3f4d5a277044c47ca27ee257025146c34cbdd1", size = 4928402 }, + { url = "https://files.pythonhosted.org/packages/2d/cf/054b9d8220f81509939599c8bdbc0c408dbd2bdd41688616a20731371fe0/cryptography-46.0.7-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:420b1e4109cc95f0e5700eed79908cef9268265c773d3a66f7af1eef53d409ef", size = 4459985 }, + { url = "https://files.pythonhosted.org/packages/f9/46/4e4e9c6040fb01c7467d47217d2f882daddeb8828f7df800cb806d8a2288/cryptography-46.0.7-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:24402210aa54baae71d99441d15bb5a1919c195398a87b563df84468160a65de", size = 3990652 }, + { url = "https://files.pythonhosted.org/packages/36/5f/313586c3be5a2fbe87e4c9a254207b860155a8e1f3cca99f9910008e7d08/cryptography-46.0.7-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:8a469028a86f12eb7d2fe97162d0634026d92a21f3ae0ac87ed1c4a447886c83", size = 4279805 }, + { url = "https://files.pythonhosted.org/packages/69/33/60dfc4595f334a2082749673386a4d05e4f0cf4df8248e63b2c3437585f2/cryptography-46.0.7-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:9694078c5d44c157ef3162e3bf3946510b857df5a3955458381d1c7cfc143ddb", size = 4892883 }, + { url = "https://files.pythonhosted.org/packages/c7/0b/333ddab4270c4f5b972f980adef4faa66951a4aaf646ca067af597f15563/cryptography-46.0.7-cp311-abi3-manylinux_2_34_x86_64.whl", hash = 
"sha256:42a1e5f98abb6391717978baf9f90dc28a743b7d9be7f0751a6f56a75d14065b", size = 4459756 }, + { url = "https://files.pythonhosted.org/packages/d2/14/633913398b43b75f1234834170947957c6b623d1701ffc7a9600da907e89/cryptography-46.0.7-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:91bbcb08347344f810cbe49065914fe048949648f6bd5c2519f34619142bbe85", size = 4410244 }, + { url = "https://files.pythonhosted.org/packages/10/f2/19ceb3b3dc14009373432af0c13f46aa08e3ce334ec6eff13492e1812ccd/cryptography-46.0.7-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:5d1c02a14ceb9148cc7816249f64f623fbfee39e8c03b3650d842ad3f34d637e", size = 4674868 }, + { url = "https://files.pythonhosted.org/packages/1a/bb/a5c213c19ee94b15dfccc48f363738633a493812687f5567addbcbba9f6f/cryptography-46.0.7-cp311-abi3-win32.whl", hash = "sha256:d23c8ca48e44ee015cd0a54aeccdf9f09004eba9fc96f38c911011d9ff1bd457", size = 3026504 }, + { url = "https://files.pythonhosted.org/packages/2b/02/7788f9fefa1d060ca68717c3901ae7fffa21ee087a90b7f23c7a603c32ae/cryptography-46.0.7-cp311-abi3-win_amd64.whl", hash = "sha256:397655da831414d165029da9bc483bed2fe0e75dde6a1523ec2fe63f3c46046b", size = 3488363 }, + { url = "https://files.pythonhosted.org/packages/7b/56/15619b210e689c5403bb0540e4cb7dbf11a6bf42e483b7644e471a2812b3/cryptography-46.0.7-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:d151173275e1728cf7839aaa80c34fe550c04ddb27b34f48c232193df8db5842", size = 7119671 }, + { url = "https://files.pythonhosted.org/packages/74/66/e3ce040721b0b5599e175ba91ab08884c75928fbeb74597dd10ef13505d2/cryptography-46.0.7-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:db0f493b9181c7820c8134437eb8b0b4792085d37dbb24da050476ccb664e59c", size = 4268551 }, + { url = "https://files.pythonhosted.org/packages/03/11/5e395f961d6868269835dee1bafec6a1ac176505a167f68b7d8818431068/cryptography-46.0.7-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = 
"sha256:ebd6daf519b9f189f85c479427bbd6e9c9037862cf8fe89ee35503bd209ed902", size = 4408887 }, + { url = "https://files.pythonhosted.org/packages/40/53/8ed1cf4c3b9c8e611e7122fb56f1c32d09e1fff0f1d77e78d9ff7c82653e/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:b7b412817be92117ec5ed95f880defe9cf18a832e8cafacf0a22337dc1981b4d", size = 4271354 }, + { url = "https://files.pythonhosted.org/packages/50/46/cf71e26025c2e767c5609162c866a78e8a2915bbcfa408b7ca495c6140c4/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:fbfd0e5f273877695cb93baf14b185f4878128b250cc9f8e617ea0c025dfb022", size = 4905845 }, + { url = "https://files.pythonhosted.org/packages/c0/ea/01276740375bac6249d0a971ebdf6b4dc9ead0ee0a34ef3b5a88c1a9b0d4/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:ffca7aa1d00cf7d6469b988c581598f2259e46215e0140af408966a24cf086ce", size = 4444641 }, + { url = "https://files.pythonhosted.org/packages/3d/4c/7d258f169ae71230f25d9f3d06caabcff8c3baf0978e2b7d65e0acac3827/cryptography-46.0.7-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:60627cf07e0d9274338521205899337c5d18249db56865f943cbe753aa96f40f", size = 3967749 }, + { url = "https://files.pythonhosted.org/packages/b5/2a/2ea0767cad19e71b3530e4cad9605d0b5e338b6a1e72c37c9c1ceb86c333/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:80406c3065e2c55d7f49a9550fe0c49b3f12e5bfff5dedb727e319e1afb9bf99", size = 4270942 }, + { url = "https://files.pythonhosted.org/packages/41/3d/fe14df95a83319af25717677e956567a105bb6ab25641acaa093db79975d/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:c5b1ccd1239f48b7151a65bc6dd54bcfcc15e028c8ac126d3fada09db0e07ef1", size = 4871079 }, + { url = "https://files.pythonhosted.org/packages/9c/59/4a479e0f36f8f378d397f4eab4c850b4ffb79a2f0d58704b8fa0703ddc11/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = 
"sha256:d5f7520159cd9c2154eb61eb67548ca05c5774d39e9c2c4339fd793fe7d097b2", size = 4443999 }, + { url = "https://files.pythonhosted.org/packages/28/17/b59a741645822ec6d04732b43c5d35e4ef58be7bfa84a81e5ae6f05a1d33/cryptography-46.0.7-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fcd8eac50d9138c1d7fc53a653ba60a2bee81a505f9f8850b6b2888555a45d0e", size = 4399191 }, + { url = "https://files.pythonhosted.org/packages/59/6a/bb2e166d6d0e0955f1e9ff70f10ec4b2824c9cfcdb4da772c7dd69cc7d80/cryptography-46.0.7-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:65814c60f8cc400c63131584e3e1fad01235edba2614b61fbfbfa954082db0ee", size = 4655782 }, + { url = "https://files.pythonhosted.org/packages/95/b6/3da51d48415bcb63b00dc17c2eff3a651b7c4fed484308d0f19b30e8cb2c/cryptography-46.0.7-cp314-cp314t-win32.whl", hash = "sha256:fdd1736fed309b4300346f88f74cd120c27c56852c3838cab416e7a166f67298", size = 3002227 }, + { url = "https://files.pythonhosted.org/packages/32/a8/9f0e4ed57ec9cebe506e58db11ae472972ecb0c659e4d52bbaee80ca340a/cryptography-46.0.7-cp314-cp314t-win_amd64.whl", hash = "sha256:e06acf3c99be55aa3b516397fe42f5855597f430add9c17fa46bf2e0fb34c9bb", size = 3475332 }, + { url = "https://files.pythonhosted.org/packages/a7/7f/cd42fc3614386bc0c12f0cb3c4ae1fc2bbca5c9662dfed031514911d513d/cryptography-46.0.7-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:462ad5cb1c148a22b2e3bcc5ad52504dff325d17daf5df8d88c17dda1f75f2a4", size = 7165618 }, + { url = "https://files.pythonhosted.org/packages/a5/d0/36a49f0262d2319139d2829f773f1b97ef8aef7f97e6e5bd21455e5a8fb5/cryptography-46.0.7-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:84d4cced91f0f159a7ddacad249cc077e63195c36aac40b4150e7a57e84fffe7", size = 4270628 }, + { url = "https://files.pythonhosted.org/packages/8a/6c/1a42450f464dda6ffbe578a911f773e54dd48c10f9895a23a7e88b3e7db5/cryptography-46.0.7-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = 
"sha256:128c5edfe5e5938b86b03941e94fac9ee793a94452ad1365c9fc3f4f62216832", size = 4415405 }, + { url = "https://files.pythonhosted.org/packages/9a/92/4ed714dbe93a066dc1f4b4581a464d2d7dbec9046f7c8b7016f5286329e2/cryptography-46.0.7-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:5e51be372b26ef4ba3de3c167cd3d1022934bc838ae9eaad7e644986d2a3d163", size = 4272715 }, + { url = "https://files.pythonhosted.org/packages/b7/e6/a26b84096eddd51494bba19111f8fffe976f6a09f132706f8f1bf03f51f7/cryptography-46.0.7-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:cdf1a610ef82abb396451862739e3fc93b071c844399e15b90726ef7470eeaf2", size = 4918400 }, + { url = "https://files.pythonhosted.org/packages/c7/08/ffd537b605568a148543ac3c2b239708ae0bd635064bab41359252ef88ed/cryptography-46.0.7-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:1d25aee46d0c6f1a501adcddb2d2fee4b979381346a78558ed13e50aa8a59067", size = 4450634 }, + { url = "https://files.pythonhosted.org/packages/16/01/0cd51dd86ab5b9befe0d031e276510491976c3a80e9f6e31810cce46c4ad/cryptography-46.0.7-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:cdfbe22376065ffcf8be74dc9a909f032df19bc58a699456a21712d6e5eabfd0", size = 3985233 }, + { url = "https://files.pythonhosted.org/packages/92/49/819d6ed3a7d9349c2939f81b500a738cb733ab62fbecdbc1e38e83d45e12/cryptography-46.0.7-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:abad9dac36cbf55de6eb49badd4016806b3165d396f64925bf2999bcb67837ba", size = 4271955 }, + { url = "https://files.pythonhosted.org/packages/80/07/ad9b3c56ebb95ed2473d46df0847357e01583f4c52a85754d1a55e29e4d0/cryptography-46.0.7-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:935ce7e3cfdb53e3536119a542b839bb94ec1ad081013e9ab9b7cfd478b05006", size = 4879888 }, + { url = "https://files.pythonhosted.org/packages/b8/c7/201d3d58f30c4c2bdbe9b03844c291feb77c20511cc3586daf7edc12a47b/cryptography-46.0.7-cp38-abi3-manylinux_2_34_x86_64.whl", hash = 
"sha256:35719dc79d4730d30f1c2b6474bd6acda36ae2dfae1e3c16f2051f215df33ce0", size = 4449961 }, + { url = "https://files.pythonhosted.org/packages/a5/ef/649750cbf96f3033c3c976e112265c33906f8e462291a33d77f90356548c/cryptography-46.0.7-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:7bbc6ccf49d05ac8f7d7b5e2e2c33830d4fe2061def88210a126d130d7f71a85", size = 4401696 }, + { url = "https://files.pythonhosted.org/packages/41/52/a8908dcb1a389a459a29008c29966c1d552588d4ae6d43f3a1a4512e0ebe/cryptography-46.0.7-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a1529d614f44b863a7b480c6d000fe93b59acee9c82ffa027cfadc77521a9f5e", size = 4664256 }, + { url = "https://files.pythonhosted.org/packages/4b/fa/f0ab06238e899cc3fb332623f337a7364f36f4bb3f2534c2bb95a35b132c/cryptography-46.0.7-cp38-abi3-win32.whl", hash = "sha256:f247c8c1a1fb45e12586afbb436ef21ff1e80670b2861a90353d9b025583d246", size = 3013001 }, + { url = "https://files.pythonhosted.org/packages/d2/f1/00ce3bde3ca542d1acd8f8cfa38e446840945aa6363f9b74746394b14127/cryptography-46.0.7-cp38-abi3-win_amd64.whl", hash = "sha256:506c4ff91eff4f82bdac7633318a526b1d1309fc07ca76a3ad182cb5b686d6d3", size = 3472985 }, ] [[package]] diff --git a/cdk/src/handlers/orchestrate-task.ts b/cdk/src/handlers/orchestrate-task.ts index e1f066c..a93b8e8 100644 --- a/cdk/src/handlers/orchestrate-task.ts +++ b/cdk/src/handlers/orchestrate-task.ts @@ -84,7 +84,7 @@ const durableHandler: DurableExecutionHandler = asyn if (TERMINAL_STATUSES.includes(current.status)) { return false; } - const result = await runPreflightChecks(task.repo, blueprintConfig); + const result = await runPreflightChecks(task.repo, blueprintConfig, task.pr_number); if (!result.passed) { const errorMessage = `Pre-flight check failed: ${result.failureReason}${result.failureDetail ? 
' — ' + result.failureDetail : ''}`; await failTask(taskId, current.status, errorMessage, task.user_id, true); diff --git a/cdk/src/handlers/shared/context-hydration.ts b/cdk/src/handlers/shared/context-hydration.ts index 55e357d..950c391 100644 --- a/cdk/src/handlers/shared/context-hydration.ts +++ b/cdk/src/handlers/shared/context-hydration.ts @@ -20,7 +20,7 @@ import { GetSecretValueCommand, SecretsManagerClient } from '@aws-sdk/client-secrets-manager'; import { logger } from './logger'; import { loadMemoryContext, type MemoryContext } from './memory'; -import type { TaskRecord } from './types'; +import { isPrTaskType, type TaskRecord, type TaskType } from './types'; // --------------------------------------------------------------------------- // Types @@ -30,6 +30,7 @@ import type { TaskRecord } from './types'; * A single comment on a GitHub issue. */ export interface IssueComment { + readonly id: number; readonly author: string; readonly body: string; } @@ -44,6 +45,34 @@ export interface GitHubIssueContext { readonly comments: IssueComment[]; } +/** + * A review comment on a GitHub pull request. + */ +export interface PullRequestReviewComment { + readonly id: number; + readonly in_reply_to_id?: number; + readonly author: string; + readonly body: string; + readonly path?: string; + readonly line?: number; + readonly diff_hunk?: string; +} + +/** + * GitHub pull request context fetched from the GitHub API. + */ +export interface GitHubPullRequestContext { + readonly number: number; + readonly title: string; + readonly body: string; + readonly head_ref: string; + readonly base_ref: string; + readonly state: string; + readonly diff_summary: string; + readonly review_comments: PullRequestReviewComment[]; + readonly issue_comments: IssueComment[]; +} + /** * The result of the context hydration pipeline. 
*/ @@ -56,6 +85,8 @@ export interface HydratedContext { readonly token_estimate: number; readonly truncated: boolean; readonly fallback_error?: string; + readonly resolved_branch_name?: string; + readonly resolved_base_branch?: string; } // --------------------------------------------------------------------------- @@ -150,11 +181,23 @@ export async function fetchGitHubIssue( if (commentsResp.ok) { const raw = await commentsResp.json() as Array<Record<string, unknown>>; for (const c of raw) { + if (typeof c.id !== 'number') { + logger.warn('Skipping issue comment with missing or non-numeric id', { + repo, issue_number: issueNumber, raw_id: String(c.id), + }); + continue; + } comments.push({ - author: (c.user as Record<string, unknown>)?.login as string ?? 'unknown', + id: c.id, + author: (c.user as Record<string, unknown>)?.login as string || 'unknown', body: c.body as string ?? '', }); } + if (raw.length > 0 && comments.length === 0) { + logger.error('All issue comments skipped due to invalid IDs — possible API response format change', { + repo, issue_number: issueNumber, total_raw: raw.length, + }); + } } else { logger.warn('GitHub comments fetch failed', { repo, issue_number: issueNumber, status: commentsResp.status, @@ -176,6 +219,268 @@ } } +// --------------------------------------------------------------------------- +// GraphQL review threads (filters resolved threads at fetch time) +// --------------------------------------------------------------------------- + +const REVIEW_THREADS_QUERY = ` +query($owner: String!, $repo: String!, $prNumber: Int!, $threadCursor: String) { + repository(owner: $owner, name: $repo) { + pullRequest(number: $prNumber) { + reviewThreads(first: 100, after: $threadCursor) { + pageInfo { hasNextPage endCursor } + nodes { + isResolved + comments(first: 100) { + nodes { + databaseId + author { login } + body + path + line + diffHunk + } + } + } + } + } + } +}`; + +async function fetchReviewCommentsGraphQL( + repo: string, + prNumber: number, +
token: string, +): Promise<PullRequestReviewComment[]> { + const [owner, repoName] = repo.split('/'); + const comments: PullRequestReviewComment[] = []; + let cursor: string | null = null; + + try { + // eslint-disable-next-line no-constant-condition + while (true) { + const resp = await fetch('https://api.github.com/graphql', { + method: 'POST', + headers: { + 'Authorization': `bearer ${token}`, + 'Content-Type': 'application/json', + }, + body: JSON.stringify({ + query: REVIEW_THREADS_QUERY, + variables: { owner, repo: repoName, prNumber, threadCursor: cursor }, + }), + signal: AbortSignal.timeout(GITHUB_API_TIMEOUT_MS), + }); + + if (!resp.ok) { + logger.warn('GitHub GraphQL review threads fetch failed', { + repo, pr_number: prNumber, status: resp.status, + }); + return []; + } + + const json = await resp.json() as Record<string, unknown>; + + if (json.errors) { + logger.warn('GitHub GraphQL review threads returned errors', { + repo, pr_number: prNumber, errors: JSON.stringify(json.errors), + }); + return []; + } + + const data = json.data as Record<string, unknown> | undefined; + const repository = (data?.repository ?? null) as Record<string, unknown> | null; + const pullRequest = (repository?.pullRequest ?? null) as Record<string, unknown> | null; + const reviewThreads = (pullRequest?.reviewThreads ?? null) as Record<string, unknown> | null; + if (!reviewThreads) { + logger.warn('GitHub GraphQL response missing expected review threads structure', { + repo, pr_number: prNumber, + }); + return []; + } + const pageInfo = reviewThreads.pageInfo as { hasNextPage: boolean; endCursor: string }; + const nodes = reviewThreads.nodes as Array<Record<string, unknown>>; + + for (const thread of nodes) { + if (thread.isResolved === true) { + continue; + } + + const threadComments = thread.comments as Record<string, unknown>; + const commentNodes = threadComments.nodes as Array<Record<string, unknown>>; + + if (commentNodes.length >= 100) { + logger.warn('Review thread has 100+ comments — inner pagination not implemented', { + repo, pr_number: prNumber, + }); + } + + let rootId: number | undefined; + for (const c of commentNodes) { + const databaseId = c.databaseId; + if (typeof databaseId !== 'number') { + logger.warn('Skipping review comment with missing or non-numeric databaseId', { + repo, pr_number: prNumber, raw_id: String(databaseId), + }); + continue; + } + + if (rootId === undefined) { + rootId = databaseId; + } + + const author = c.author as Record<string, unknown> | null | undefined; + comments.push({ + id: databaseId, + in_reply_to_id: databaseId === rootId ? undefined : rootId, + author: (author?.login as string) || 'unknown', + body: (c.body as string) ?? '', + path: c.path as string | undefined, + line: c.line as number | undefined, + diff_hunk: c.diffHunk as string | undefined, + }); + } + } + + if (!pageInfo.hasNextPage) { + break; + } + cursor = pageInfo.endCursor; + } + } catch (err) { + logger.warn('GitHub GraphQL review threads fetch error', { + repo, pr_number: prNumber, error: err instanceof Error ?
err.message : String(err), + }); + return []; + } + + return comments; +} + +// --------------------------------------------------------------------------- +// GitHub pull request fetching +// --------------------------------------------------------------------------- + +/** + * Fetch a GitHub pull request's metadata (REST), review comments (GraphQL, filters + * resolved threads), issue comments (REST), and diff summary (REST). + * Returns null on any error (logged). + * @param repo - the "owner/repo" string. + * @param prNumber - the PR number. + * @param token - the GitHub PAT. + * @returns the PR context or null on failure. + */ +export async function fetchGitHubPullRequest( + repo: string, + prNumber: number, + token: string, +): Promise<GitHubPullRequestContext | null> { + const headers: Record<string, string> = { + Authorization: `token ${token}`, + Accept: 'application/vnd.github.v3+json', + }; + + try { + // Fetch PR metadata (REST), review comments (GraphQL), issue comments (REST), and files (REST) in parallel + // eslint-disable-next-line @cdklabs/promiseall-no-unbounded-parallelism + const [prResp, reviewComments, issueResp, filesResp] = await Promise.all([ + fetch(`https://api.github.com/repos/${repo}/pulls/${prNumber}`, { + headers, signal: AbortSignal.timeout(GITHUB_API_TIMEOUT_MS), + }), + fetchReviewCommentsGraphQL(repo, prNumber, token), + fetch(`https://api.github.com/repos/${repo}/issues/${prNumber}/comments?per_page=100`, { + headers, signal: AbortSignal.timeout(GITHUB_API_TIMEOUT_MS), + }), + fetch(`https://api.github.com/repos/${repo}/pulls/${prNumber}/files?per_page=100`, { + headers, signal: AbortSignal.timeout(GITHUB_API_TIMEOUT_MS), + }), + ]); + + if (!prResp.ok) { + logger.warn('GitHub PR fetch failed', { repo, pr_number: prNumber, status: prResp.status }); + return null; + } + + const pr = await prResp.json() as Record<string, unknown>; + + // Parse issue/conversation comments + const issueComments: IssueComment[] = []; + if (issueResp.ok) { + const raw = await issueResp.json() as Array<Record<string, unknown>>; + for (const
c of raw) { + if (typeof c.id !== 'number') { + logger.warn('Skipping conversation comment with missing or non-numeric id', { + repo, pr_number: prNumber, raw_id: String(c.id), + }); + continue; + } + issueComments.push({ + id: c.id, + author: (c.user as Record<string, unknown>)?.login as string || 'unknown', + body: c.body as string ?? '', + }); + } + if (raw.length > 0 && issueComments.length === 0) { + logger.error('All conversation comments skipped due to invalid IDs — possible API response format change', { + repo, pr_number: prNumber, total_raw: raw.length, + }); + } + } else { + logger.warn('GitHub PR conversation comments fetch failed', { + repo, pr_number: prNumber, status: issueResp.status, + }); + } + + // Build diff summary from files + let diffSummary = ''; + if (filesResp.ok) { + const files = await filesResp.json() as Array<Record<string, unknown>>; + const fileParts: string[] = []; + for (const f of files) { + const filename = f.filename as string; + const status = f.status as string; + const additions = f.additions as number; + const deletions = f.deletions as number; + const patch = (f.patch as string | undefined) ?? ''; + const truncatedPatch = patch.length > 500 ? patch.slice(0, 500) + '\n... [truncated]' : patch; + fileParts.push(`### ${filename} (${status}, +${additions}/-${deletions})\n\`\`\`diff\n${truncatedPatch}\n\`\`\``); + } + diffSummary = fileParts.join('\n\n'); + } else { + logger.warn('GitHub PR files fetch failed', { + repo, pr_number: prNumber, status: filesResp.status, + }); + } + + // Validate critical nested fields before accessing + const head = pr.head as Record<string, unknown> | null | undefined; + const base = pr.base as Record<string, unknown> | null | undefined; + if (!head?.ref || !base?.ref) { + logger.warn('PR missing head_ref or base_ref (possibly deleted fork)', { + repo, pr_number: prNumber, has_head: !!head?.ref, has_base: !!base?.ref, + }); + return null; + } + + return { + number: pr.number as number, + title: pr.title as string, + body: (pr.body as string) ??
'', + head_ref: head.ref as string, + base_ref: base.ref as string, + state: pr.state as string, + diff_summary: diffSummary, + review_comments: reviewComments, + issue_comments: issueComments, + }; + } catch (err) { + logger.warn('GitHub PR fetch error', { + repo, pr_number: prNumber, error: err instanceof Error ? err.message : String(err), + }); + return null; + } +} + // --------------------------------------------------------------------------- // Token estimation and budget enforcement // --------------------------------------------------------------------------- @@ -287,6 +592,114 @@ return parts.join('\n'); } +/** + * Assemble the user prompt for a PR iteration task. + * @param taskId - the task ID. + * @param repo - the "owner/repo" string. + * @param pr - the GitHub PR context. + * @param taskDescription - optional additional user instructions. + * @returns the assembled user prompt. + */ +export function assemblePrIterationPrompt( + taskId: string, + repo: string, + pr: GitHubPullRequestContext, + taskDescription?: string, +): string { + const parts: string[] = []; + + parts.push(`Task ID: ${taskId}`); + parts.push(`Repository: ${repo}`); + parts.push(`\n## Pull Request #${pr.number}: ${pr.title}\n`); + parts.push(pr.body || '(no description)'); + parts.push(`\nBase branch: ${pr.base_ref}`); + parts.push(`Head branch: ${pr.head_ref}`); + + if (pr.review_comments.length > 0) { + parts.push('\n### Review Comments\n'); + + // Group review comments into threads using in_reply_to_id + const rootComments = new Map<number, PullRequestReviewComment>(); + const replies = new Map<number, PullRequestReviewComment[]>(); + + for (const c of pr.review_comments) { + if (c.in_reply_to_id === undefined) { + // Top-level comment (thread root) + if (rootComments.has(c.id)) { + logger.warn('Duplicate root comment id detected — keeping first occurrence', { + comment_id: c.id, existing_author: rootComments.get(c.id)!.author, duplicate_author: c.author, + }); + continue; + } + rootComments.set(c.id, c); + if
(!replies.has(c.id)) { + replies.set(c.id, []); + } + } else { + // Reply to an existing thread + const rootId = c.in_reply_to_id; + if (!replies.has(rootId)) { + replies.set(rootId, []); + } + replies.get(rootId)!.push(c); + } + } + + // Render threads rooted by known top-level comments + for (const [rootId, root] of rootComments) { + const location = root.path ? `\`${root.path}${root.line ? `:${root.line}` : ''}\`` : 'general'; + parts.push(`**Thread on ${location}** (reply with comment_id: ${rootId})`); + parts.push(`> **@${root.author}**: ${root.body}`); + if (root.diff_hunk) { + parts.push(`> \`\`\`diff\n> ${root.diff_hunk}\n> \`\`\``); + } + const threadReplies = replies.get(rootId) ?? []; + for (const r of threadReplies) { + parts.push(`\n - **@${r.author}**: ${r.body}`); + } + parts.push(''); + } + + // Render orphan replies (in_reply_to_id points to a root not in our fetched set) + for (const [rootId, orphanReplies] of replies) { + if (rootComments.has(rootId)) continue; + for (const r of orphanReplies) { + const location = r.path ? `\`${r.path}${r.line ? `:${r.line}` : ''}\`` : 'general'; + const replyTarget = r.in_reply_to_id ?? r.id; + parts.push(`**Comment on ${location}** (reply with comment_id: ${replyTarget})`); + parts.push(`> **@${r.author}**: ${r.body}`); + if (r.diff_hunk) { + parts.push(`> \`\`\`diff\n> ${r.diff_hunk}\n> \`\`\``); + } + parts.push(''); + } + } + } + + if (pr.issue_comments.length > 0) { + parts.push('\n### Conversation Comments\n'); + for (const c of pr.issue_comments) { + parts.push(`**@${c.author}** (comment_id: ${c.id}): ${c.body}\n`); + } + } + + if (pr.diff_summary) { + parts.push('\n### Current Diff\n'); + parts.push(pr.diff_summary); + } + + if (taskDescription) { + parts.push(`\n## Additional Instructions\n\n${taskDescription}`); + } else { + parts.push( + '\n## Task\n\nAddress the review feedback on this pull request. 
' + + 'Follow the workflow in your system instructions.', + ); + } + + return parts.join('\n'); +} + // --------------------------------------------------------------------------- // Main hydration pipeline // --------------------------------------------------------------------------- @@ -314,14 +727,17 @@ export async function hydrateContext(task: TaskRecord, options?: HydrateContextO let memoryContext: MemoryContext | undefined; try { - // Fetch GitHub issue and memory context in parallel + // Fetch GitHub issue, memory context, and PR context in parallel const memoryId = options?.memoryId ?? process.env.MEMORY_ID; const tokenSecretArn = options?.githubTokenSecretArn ?? GITHUB_TOKEN_SECRET_ARN; + const isPrTask = isPrTaskType(task.task_type as TaskType); + // eslint-disable-next-line @cdklabs/promiseall-no-unbounded-parallelism - const [issueResult, memoryResult] = await Promise.all([ - // Issue fetch + const [issueResult, memoryResult, prResult] = await Promise.all([ + // Issue fetch (skip for PR task types) (async () => { + if (isPrTask) return undefined; if (task.issue_number !== undefined && tokenSecretArn) { try { const token = await resolveGitHubToken(tokenSecretArn); @@ -338,6 +754,22 @@ export async function hydrateContext(task: TaskRecord, options?: HydrateContextO memoryId ? loadMemoryContext(memoryId, task.repo, task.task_description) : Promise.resolve(undefined), + // PR fetch (only for PR task types) + (async () => { + if (isPrTask && task.pr_number !== undefined && tokenSecretArn) { + try { + const token = await resolveGitHubToken(tokenSecretArn); + return await fetchGitHubPullRequest(task.repo, task.pr_number, token) ?? undefined; + } catch (err) { + logger.warn('Failed to fetch PR context', { + task_id: task.task_id, + pr_number: task.pr_number, + error: err instanceof Error ? 
err.message : String(err), + }); + } + } + return undefined; + })(), ]); issue = issueResult; @@ -346,6 +778,9 @@ export async function hydrateContext(task: TaskRecord, options?: HydrateContextO if (issue) { sources.push('issue'); } + if (prResult) { + sources.push('pull_request'); + } if (memoryContext) { sources.push('memory'); } @@ -353,12 +788,124 @@ export async function hydrateContext(task: TaskRecord, options?: HydrateContextO sources.push('task_description'); } - // Enforce token budget + // Build user prompt based on task type + let userPrompt: string; + let resolvedBranchName: string | undefined; + let resolvedBaseBranch: string | undefined; + + if (isPrTask) { + if (!prResult) { + // PR fetch failed — log error and return minimal context + logger.error(`PR context fetch failed for ${task.task_type} task`, { + task_id: task.task_id, pr_number: task.pr_number, task_type: task.task_type, + }); + const fallbackPrompt = assembleUserPrompt(task.task_id, task.repo, undefined, task.task_description); + return { + version: 1, + user_prompt: fallbackPrompt, + sources: task.task_description ? ['task_description'] : [], + token_estimate: estimateTokens(fallbackPrompt), + truncated: false, + fallback_error: `Failed to fetch PR #${task.pr_number} context from GitHub`, + }; + } + + // Enforce token budget on the assembled PR prompt + const budgetResult = enforceTokenBudget(undefined, task.task_description, USER_PROMPT_TOKEN_BUDGET); + let effectiveTaskDescription = budgetResult.taskDescription; + if (!effectiveTaskDescription && task.task_type === 'pr_review') { + logger.info('Using default task description for pr_review task', { task_id: task.task_id }); + effectiveTaskDescription = 'Review this pull request. 
Follow the workflow in your system instructions.'; + } + userPrompt = assemblePrIterationPrompt(task.task_id, task.repo, prResult, effectiveTaskDescription); + + // Trim PR context if the assembled prompt exceeds the token budget + let truncated = budgetResult.truncated; + const promptEstimate = estimateTokens(userPrompt); + if (promptEstimate > USER_PROMPT_TOKEN_BUDGET) { + logger.warn('PR task prompt exceeds token budget — trimming review comments', { + task_id: task.task_id, estimate: promptEstimate, budget: USER_PROMPT_TOKEN_BUDGET, + }); + // Build thread-grouped list so we can trim whole threads at a time + const threads: PullRequestReviewComment[][] = []; + const threadMap = new Map<number, number>(); // root id -> index in threads[] + for (const c of prResult.review_comments) { + if (c.in_reply_to_id === undefined) { + threadMap.set(c.id, threads.length); + threads.push([c]); + } else { + const idx = threadMap.get(c.in_reply_to_id); + if (idx !== undefined) { + threads[idx].push(c); + } else { + // Orphan reply — treat as its own "thread" + threads.push([c]); + } + } + } + + const trimmedIssueComments = [...prResult.issue_comments]; + let trimmedReviewComments = prResult.review_comments; + let trimmedPr = { + ...prResult, + review_comments: trimmedReviewComments, + issue_comments: trimmedIssueComments, + }; + const estimateTrimmed = (): number => + estimateTokens(assemblePrIterationPrompt( + task.task_id, task.repo, trimmedPr, budgetResult.taskDescription, + )); + + // Trim oldest issue comments first + while (trimmedIssueComments.length > 0 && estimateTrimmed() > USER_PROMPT_TOKEN_BUDGET) { + trimmedIssueComments.shift(); + trimmedPr = { ...trimmedPr, issue_comments: trimmedIssueComments }; + } + + // Trim oldest review comment threads (root + all replies as a unit) + while (threads.length > 0 && estimateTrimmed() > USER_PROMPT_TOKEN_BUDGET) { + const removed = threads.shift()!; + logger.warn('Trimmed review comment thread to fit token budget', { + task_id: task.task_id,
+ removed_root_id: removed[0].id, + removed_count: removed.length, + }); + trimmedReviewComments = threads.flat(); + trimmedPr = { ...trimmedPr, review_comments: trimmedReviewComments }; + } + + userPrompt = assemblePrIterationPrompt(task.task_id, task.repo, trimmedPr, budgetResult.taskDescription); + const finalEstimate = estimateTokens(userPrompt); + if (finalEstimate > USER_PROMPT_TOKEN_BUDGET) { + logger.warn('Token budget still exceeded after trimming all comments — non-comment content too large', { + task_id: task.task_id, + final_estimate: finalEstimate, + budget: USER_PROMPT_TOKEN_BUDGET, + }); + } + truncated = true; + } + + resolvedBranchName = prResult.head_ref; + resolvedBaseBranch = prResult.base_ref; + + return { + version: 1, + user_prompt: userPrompt, + memory_context: memoryContext, + resolved_branch_name: resolvedBranchName, + resolved_base_branch: resolvedBaseBranch, + sources, + token_estimate: estimateTokens(userPrompt), + truncated, + }; + } + + // Standard task: existing behavior const budgetResult = enforceTokenBudget(issue, task.task_description, USER_PROMPT_TOKEN_BUDGET); issue = budgetResult.issue; - // Assemble user prompt - const userPrompt = assembleUserPrompt(task.task_id, task.repo, issue, budgetResult.taskDescription); + userPrompt = assembleUserPrompt(task.task_id, task.repo, issue, budgetResult.taskDescription); const tokenEstimate = estimateTokens(userPrompt); return { diff --git a/cdk/src/handlers/shared/create-task-core.ts b/cdk/src/handlers/shared/create-task-core.ts index dcd3487..2794afd 100644 --- a/cdk/src/handlers/shared/create-task-core.ts +++ b/cdk/src/handlers/shared/create-task-core.ts @@ -27,8 +27,8 @@ import { generateBranchName } from './gateway'; import { logger } from './logger'; import { checkRepoOnboarded } from './repo-config'; import { ErrorCode, errorResponse, successResponse } from './response'; -import { type CreateTaskRequest, type TaskRecord, toTaskDetail } from './types'; -import { computeTtlEpoch, 
DEFAULT_MAX_TURNS, hasTaskSpec, isValidIdempotencyKey, isValidRepo, isValidTaskDescriptionLength, MAX_TASK_DESCRIPTION_LENGTH, validateMaxBudgetUsd, validateMaxTurns } from './validation'; +import { type CreateTaskRequest, isPrTaskType, type TaskRecord, type TaskType, toTaskDetail } from './types'; +import { computeTtlEpoch, DEFAULT_MAX_TURNS, hasTaskSpec, isValidIdempotencyKey, isValidRepo, isValidTaskDescriptionLength, isValidTaskType, MAX_TASK_DESCRIPTION_LENGTH, validateMaxBudgetUsd, validateMaxTurns, validatePrNumber } from './validation'; import { TaskStatus } from '../../constructs/task-status'; /** @@ -76,6 +76,25 @@ export async function createTaskCore( return errorResponse(400, ErrorCode.VALIDATION_ERROR, 'At least one of issue_number or task_description is required.', requestId); } + // Validate task_type + if (!isValidTaskType(body.task_type)) { + return errorResponse(400, ErrorCode.VALIDATION_ERROR, 'Invalid task_type. Must be "new_task", "pr_iteration", or "pr_review".', requestId); + } + const taskType: TaskType = (body.task_type as TaskType) ?? 'new_task'; + const isPrTask = isPrTaskType(taskType); + + // Validate pr_number + const prNumberResult = validatePrNumber(body.pr_number); + if (prNumberResult === null) { + return errorResponse(400, ErrorCode.VALIDATION_ERROR, 'Invalid pr_number. 
Must be a positive integer.', requestId); + } + if (isPrTask && prNumberResult === undefined) { + return errorResponse(400, ErrorCode.VALIDATION_ERROR, `pr_number is required when task_type is "${taskType}".`, requestId); + } + if (!isPrTask && prNumberResult !== undefined) { + return errorResponse(400, ErrorCode.VALIDATION_ERROR, 'pr_number is only allowed when task_type is "pr_iteration" or "pr_review".', requestId); + } + if (body.task_description && !isValidTaskDescriptionLength(body.task_description)) { return errorResponse(400, ErrorCode.VALIDATION_ERROR, `task_description exceeds maximum length of ${MAX_TASK_DESCRIPTION_LENGTH} characters.`, requestId); } @@ -149,7 +168,9 @@ export async function createTaskCore( // 4. Generate identifiers and timestamps const taskId = ulid(); const now = new Date().toISOString(); - const branchName = generateBranchName(taskId, body.task_description ?? body.repo); + const branchName = isPrTask + ? 'pending:pr_resolution' + : generateBranchName(taskId, body.task_description ?? body.repo); // 5. 
Build task record const taskRecord: TaskRecord = { @@ -158,6 +179,8 @@ export async function createTaskCore( status: TaskStatus.SUBMITTED, repo: body.repo, ...(body.issue_number !== undefined && { issue_number: body.issue_number }), + task_type: taskType, + ...(prNumberResult !== undefined && { pr_number: prNumberResult }), ...(body.task_description !== undefined && { task_description: body.task_description }), branch_name: branchName, ...(userMaxTurns !== undefined && { max_turns: userMaxTurns }), diff --git a/cdk/src/handlers/shared/orchestrator.ts b/cdk/src/handlers/shared/orchestrator.ts index 2f4bcc6..4d59eeb 100644 --- a/cdk/src/handlers/shared/orchestrator.ts +++ b/cdk/src/handlers/shared/orchestrator.ts @@ -249,6 +249,25 @@ export async function hydrateAndTransition(task: TaskRecord, blueprintConfig?: B memoryId: MEMORY_ID, }); + // For PR iteration: resolve actual branch name from PR head_ref + if (hydratedContext.resolved_branch_name) { + try { + await ddb.send(new UpdateCommand({ + TableName: TABLE_NAME, + Key: { task_id: task.task_id }, + UpdateExpression: 'SET #bn = :bn, #ua = :now', + ExpressionAttributeNames: { '#bn': 'branch_name', '#ua': 'updated_at' }, + ExpressionAttributeValues: { ':bn': hydratedContext.resolved_branch_name, ':now': new Date().toISOString() }, + })); + } catch (err) { + logger.error('Failed to update branch_name from PR head_ref — task record will show stale placeholder', { + task_id: task.task_id, + resolved_branch: hydratedContext.resolved_branch_name, + error: err instanceof Error ? err.message : String(err), + }); + } + } + // Compute prompt version from the system prompt template + overrides // (deterministic parts only — excludes memory context which varies per invocation) const promptVersionInput = blueprintConfig?.system_prompt_overrides ?? 
''; @@ -275,8 +294,11 @@ export async function hydrateAndTransition(task: TaskRecord, blueprintConfig?: B const payload: Record<string, unknown> = { repo_url: task.repo, task_id: task.task_id, - branch_name: task.branch_name, + branch_name: hydratedContext.resolved_branch_name ?? task.branch_name, ...(task.issue_number !== undefined && { issue_number: String(task.issue_number) }), + task_type: task.task_type ?? 'new_task', + ...(task.pr_number !== undefined && { pr_number: task.pr_number }), + ...(hydratedContext.resolved_base_branch && { base_branch: hydratedContext.resolved_base_branch }), ...(task.task_description && { prompt: task.task_description }), max_turns: task.max_turns ?? blueprintConfig?.max_turns ?? DEFAULT_MAX_TURNS, ...(effectiveBudget !== undefined && { max_budget_usd: effectiveBudget }), diff --git a/cdk/src/handlers/shared/preflight.ts b/cdk/src/handlers/shared/preflight.ts index d4b7da9..48719c8 100644 --- a/cdk/src/handlers/shared/preflight.ts +++ b/cdk/src/handlers/shared/preflight.ts @@ -29,6 +29,7 @@ export const PreflightFailureReason = { GITHUB_UNREACHABLE: 'GITHUB_UNREACHABLE', REPO_NOT_FOUND_OR_NO_ACCESS: 'REPO_NOT_FOUND_OR_NO_ACCESS', RUNTIME_UNAVAILABLE: 'RUNTIME_UNAVAILABLE', + PR_NOT_FOUND_OR_CLOSED: 'PR_NOT_FOUND_OR_CLOSED', } as const; export type PreflightFailureReasonType = typeof PreflightFailureReason[keyof typeof PreflightFailureReason]; @@ -139,6 +140,51 @@ async function checkRepoAccess(repo: string, token: string): Promise<PreflightCheckResult> { +async function checkPrAccessible(repo: string, prNumber: number, token: string): Promise<PreflightCheckResult> { + const start = Date.now(); + try { + const resp = await fetch(`https://api.github.com/repos/${repo}/pulls/${prNumber}`, { + headers: { + Authorization: `token ${token}`, + Accept: 'application/vnd.github.v3+json', + }, + signal: AbortSignal.timeout(GITHUB_API_TIMEOUT_MS), + }); + const durationMs = Date.now() - start; + if (!resp.ok) { + return { + check: 'pr_accessible', + passed: false, + reason: PreflightFailureReason.PR_NOT_FOUND_OR_CLOSED, + detail: `GitHub API returned HTTP ${resp.status} for PR #${prNumber} in
${repo}`, + httpStatus: resp.status, + durationMs, + }; + } + const pr = await resp.json() as Record<string, unknown>; + if (pr.state !== 'open') { + return { + check: 'pr_accessible', + passed: false, + reason: PreflightFailureReason.PR_NOT_FOUND_OR_CLOSED, + detail: `PR #${prNumber} in ${repo} is ${pr.state}, not open`, + durationMs, + }; + } + return { check: 'pr_accessible', passed: true, durationMs }; + } catch (err) { + const detail = err instanceof Error ? err.message : String(err); + logger.warn('PR accessibility check failed', { repo, pr_number: prNumber, error: detail }); + return { + check: 'pr_accessible', + passed: false, + reason: PreflightFailureReason.GITHUB_UNREACHABLE, + detail, + durationMs: Date.now() - start, + }; + } +} + async function checkRuntimeAvailability(): Promise<PreflightCheckResult> { const start = Date.now(); return { check: 'runtime_availability', passed: true, durationMs: Date.now() - start }; @@ -148,7 +194,7 @@ // Main pre-flight check runner // --------------------------------------------------------------------------- -export async function runPreflightChecks(repo: string, blueprintConfig: BlueprintConfig): Promise { +export async function runPreflightChecks(repo: string, blueprintConfig: BlueprintConfig, prNumber?: number): Promise { const checks: PreflightCheckResult[] = []; if (blueprintConfig.github_token_secret_arn) { @@ -180,6 +226,7 @@ const results = await Promise.allSettled([ checkGitHubReachability(token), checkRepoAccess(repo, token), + ...(prNumber !== undefined ?
[checkPrAccessible(repo, prNumber, token)] : []), ]); for (const result of results) { diff --git a/cdk/src/handlers/shared/types.ts b/cdk/src/handlers/shared/types.ts index 2eab816..be5a7e0 100644 --- a/cdk/src/handlers/shared/types.ts +++ b/cdk/src/handlers/shared/types.ts @@ -19,6 +19,14 @@ import type { TaskStatusType } from '../../constructs/task-status'; +/** Valid task types for task creation. */ +export type TaskType = 'new_task' | 'pr_iteration' | 'pr_review'; + +/** Task types that operate on an existing pull request. */ +export function isPrTaskType(taskType: TaskType): boolean { + return taskType === 'pr_iteration' || taskType === 'pr_review'; +} + /** * Full task record as stored in DynamoDB. */ @@ -28,6 +36,8 @@ export interface TaskRecord { readonly status: TaskStatusType; readonly repo: string; readonly issue_number?: number; + readonly task_type: TaskType; + readonly pr_number?: number; readonly task_description?: string; readonly branch_name: string; readonly session_id?: string; @@ -61,6 +71,8 @@ export interface TaskDetail { readonly status: TaskStatusType; readonly repo: string; readonly issue_number: number | null; + readonly task_type: TaskType; + readonly pr_number: number | null; readonly task_description: string | null; readonly branch_name: string; readonly session_id: string | null; @@ -86,6 +98,8 @@ export interface TaskSummary { readonly status: TaskStatusType; readonly repo: string; readonly issue_number: number | null; + readonly task_type: TaskType; + readonly pr_number: number | null; readonly task_description: string | null; readonly branch_name: string; readonly pr_url: string | null; @@ -114,6 +128,8 @@ export interface CreateTaskRequest { readonly task_description?: string; readonly max_turns?: number; readonly max_budget_usd?: number; + readonly task_type?: TaskType; + readonly pr_number?: number; readonly attachments?: Attachment[]; } @@ -139,6 +155,8 @@ export function toTaskDetail(record: TaskRecord): TaskDetail { status: 
record.status, repo: record.repo, issue_number: record.issue_number ?? null, + task_type: record.task_type ?? 'new_task', + pr_number: record.pr_number ?? null, task_description: record.task_description ?? null, branch_name: record.branch_name, session_id: record.session_id ?? null, @@ -227,6 +245,8 @@ export function toTaskSummary(record: TaskRecord): TaskSummary { status: record.status, repo: record.repo, issue_number: record.issue_number ?? null, + task_type: record.task_type ?? 'new_task', + pr_number: record.pr_number ?? null, task_description: record.task_description ?? null, branch_name: record.branch_name, pr_url: record.pr_url ?? null, diff --git a/cdk/src/handlers/shared/validation.ts b/cdk/src/handlers/shared/validation.ts index a8ca7f4..356c399 100644 --- a/cdk/src/handlers/shared/validation.ts +++ b/cdk/src/handlers/shared/validation.ts @@ -17,7 +17,7 @@ * SOFTWARE. */ -import type { CreateTaskRequest } from './types'; +import { isPrTaskType, type CreateTaskRequest, type TaskType } from './types'; import { TaskStatus } from '../../constructs/task-status'; /** Default maximum agent turns per task. */ @@ -64,9 +64,13 @@ export function isValidRepo(repo: string): boolean { /** * Validate that a create task request has at least one task specification. * @param req - the parsed create task request. - * @returns true if issue_number or task_description is provided. + * @returns true if the request has a sufficient task specification: + * issue_number or task_description for new_task; pr_number for pr_iteration or pr_review. 
*/ export function hasTaskSpec(req: CreateTaskRequest): boolean { + if ((req.task_type === 'pr_iteration' || req.task_type === 'pr_review') && req.pr_number !== undefined && req.pr_number !== null) { + return true; + } return (req.issue_number !== undefined && req.issue_number !== null) || (req.task_description !== undefined && req.task_description !== null && req.task_description.trim().length > 0); } @@ -188,3 +192,33 @@ export function isValidTaskDescriptionLength(description: string): boolean { export function computeTtlEpoch(retentionDays: number): number { return Math.floor(Date.now() / 1000) + retentionDays * 86400; } + +/** Valid task type values. Compile-time check ensures this stays in sync with TaskType. */ +const TASK_TYPE_LIST = ['new_task', 'pr_iteration', 'pr_review'] as const satisfies readonly TaskType[]; +type _AssertExhaustive = Exclude<TaskType, (typeof TASK_TYPE_LIST)[number]> extends never ? true : never; +const _exhaustiveCheck: _AssertExhaustive = true; // eslint-disable-line @typescript-eslint/no-unused-vars +export const VALID_TASK_TYPES = new Set<string>(TASK_TYPE_LIST); + +/** + * Validate a task_type value from a request body. + * @param value - the raw value from the request. + * @returns true if the value is a valid task type or undefined/null (defaults to 'new_task'). + */ +export function isValidTaskType(value: unknown): boolean { + if (value === undefined || value === null) return true; + if (typeof value !== 'string') return false; + return VALID_TASK_TYPES.has(value); +} + +/** + * Validate a pr_number value from a request body. + * @param value - the raw value from the request. + * @returns the valid number, null if invalid (caller should return 400), or undefined if absent.
+ */ +export function validatePrNumber(value: unknown): number | null | undefined { + if (value === undefined || value === null) return undefined; + if (typeof value !== 'number') return null; + if (!Number.isInteger(value)) return null; + if (value < 1) return null; + return value; +} diff --git a/cdk/test/handlers/shared/context-hydration.test.ts b/cdk/test/handlers/shared/context-hydration.test.ts index 3ddec78..cc548fb 100644 --- a/cdk/test/handlers/shared/context-hydration.test.ts +++ b/cdk/test/handlers/shared/context-hydration.test.ts @@ -34,14 +34,17 @@ process.env.GITHUB_TOKEN_SECRET_ARN = 'arn:aws:secretsmanager:us-east-1:12345678 process.env.USER_PROMPT_TOKEN_BUDGET = '100000'; import { + assemblePrIterationPrompt, assembleUserPrompt, clearTokenCache, enforceTokenBudget, estimateTokens, fetchGitHubIssue, + fetchGitHubPullRequest, hydrateContext, resolveGitHubToken, type GitHubIssueContext, + type IssueComment, } from '../../../src/handlers/shared/context-hydration'; // Mock global fetch @@ -53,6 +56,37 @@ beforeEach(() => { clearTokenCache(); }); +// --------------------------------------------------------------------------- +// Test helpers +// --------------------------------------------------------------------------- + +function makeGraphQLThreadsResponse( + threads: Array<{ isResolved?: boolean; comments: Array<Record<string, unknown>> }>, + hasNextPage = false, + endCursor?: string, +): { ok: boolean; json: () => Promise<Record<string, unknown>> } { + return { + ok: true, + json: async () => ({ + data: { + repository: { + pullRequest: { + reviewThreads: { + pageInfo: { hasNextPage, endCursor: endCursor ?? null }, + nodes: threads.map(t => ({ + isResolved: t.isResolved ??
false, + comments: { + nodes: t.comments, + }, + })), + }, + }, + }, + }, + }), + }; +} + // --------------------------------------------------------------------------- // resolveGitHubToken // --------------------------------------------------------------------------- @@ -105,8 +139,8 @@ describe('fetchGitHubIssue', () => { comments: 2, }; const commentsResponse = [ - { user: { login: 'alice' }, body: 'I can reproduce this.' }, - { user: { login: 'bob' }, body: 'Me too.' }, + { id: 101, user: { login: 'alice' }, body: 'I can reproduce this.' }, + { id: 102, user: { login: 'bob' }, body: 'Me too.' }, ]; test('fetches issue with comments', async () => { @@ -120,8 +154,8 @@ describe('fetchGitHubIssue', () => { title: 'Fix the bug', body: 'The bug is in login.ts', comments: [ - { author: 'alice', body: 'I can reproduce this.' }, - { author: 'bob', body: 'Me too.' }, + { id: 101, author: 'alice', body: 'I can reproduce this.' }, + { id: 102, author: 'bob', body: 'Me too.' }, ], }); expect(mockFetch).toHaveBeenCalledTimes(2); @@ -161,6 +195,36 @@ describe('fetchGitHubIssue', () => { expect(result).toBeNull(); }); + test('skips issue comments with non-numeric id', async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ ...issueResponse, comments: 2 }) }) + .mockResolvedValueOnce({ + ok: true, + json: async () => ([ + { id: undefined, user: { login: 'alice' }, body: 'Bad' }, + { id: 101, user: { login: 'bob' }, body: 'Good' }, + ]), + }); + + const result = await fetchGitHubIssue('owner/repo', 42, 'ghp_token'); + expect(result!.comments).toHaveLength(1); + expect(result!.comments[0].id).toBe(101); + }); + + test('falls back to unknown for empty string author on issue comments', async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ ...issueResponse, comments: 1 }) }) + .mockResolvedValueOnce({ + ok: true, + json: async () => ([ + { id: 101, user: { login: '' }, body: 'Empty login' }, + ]), + }); + + const result = 
await fetchGitHubIssue('owner/repo', 42, 'ghp_token'); + expect(result!.comments[0].author).toBe('unknown'); + }); + test('handles null issue body gracefully', async () => { mockFetch.mockResolvedValueOnce({ ok: true, @@ -200,6 +264,7 @@ describe('enforceTokenBudget', () => { title: 'Test', body: 'Body text', comments: Array.from({ length: commentCount }, (_, i) => ({ + id: 1000 + i, author: `user${i}`, body: 'x'.repeat(400), // ~100 tokens per comment })), @@ -244,7 +309,7 @@ describe('assembleUserPrompt', () => { number: 42, title: 'Fix login bug', body: 'The login form crashes.', - comments: [{ author: 'alice', body: 'Confirmed.' }], + comments: [{ id: 201, author: 'alice', body: 'Confirmed.' }], }; const result = assembleUserPrompt('TASK001', 'org/repo', issue, 'Fix the login crash'); @@ -263,7 +328,7 @@ describe('assembleUserPrompt', () => { number: 10, title: 'Add feature', body: 'Please add dark mode.', - comments: [], + comments: [] as IssueComment[], }; const result = assembleUserPrompt('TASK002', 'org/repo', issue); @@ -285,7 +350,7 @@ describe('assembleUserPrompt', () => { number: 5, title: 'Empty issue', body: '', - comments: [], + comments: [] as IssueComment[], }; const result = assembleUserPrompt('TASK004', 'org/repo', issue); expect(result).toContain('(no description)'); @@ -297,7 +362,7 @@ describe('assembleUserPrompt', () => { number: 1, title: 'Test issue', body: 'Issue body here', - comments: [{ author: 'dev', body: 'A comment' }], + comments: [{ id: 301, author: 'dev', body: 'A comment' }], }; const result = assembleUserPrompt('T1', 'o/r', issue, 'Do the thing'); @@ -308,6 +373,145 @@ describe('assembleUserPrompt', () => { }); }); +// --------------------------------------------------------------------------- +// fetchGitHubPullRequest — id fields +// --------------------------------------------------------------------------- + +describe('fetchGitHubPullRequest — id fields', () => { + test('returns id and in_reply_to_id on review comments', 
async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 10, title: 'PR', body: '', head: { ref: 'feat' }, base: { ref: 'main' }, state: 'open' }) }) + .mockResolvedValueOnce(makeGraphQLThreadsResponse([{ + isResolved: false, + comments: [ + { databaseId: 100, author: { login: 'alice' }, body: 'Fix this', path: 'a.ts', line: 1, diffHunk: undefined }, + { databaseId: 200, author: { login: 'bob' }, body: 'Agreed', path: undefined, line: undefined, diffHunk: undefined }, + ], + }])) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }); + + const result = await fetchGitHubPullRequest('owner/repo', 10, 'ghp_test'); + expect(result!.review_comments[0].id).toBe(100); + expect(result!.review_comments[0].in_reply_to_id).toBeUndefined(); + expect(result!.review_comments[1].id).toBe(200); + expect(result!.review_comments[1].in_reply_to_id).toBe(100); + }); + + test('sets in_reply_to_id to undefined for thread root comments', async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 10, title: 'PR', body: '', head: { ref: 'feat' }, base: { ref: 'main' }, state: 'open' }) }) + .mockResolvedValueOnce(makeGraphQLThreadsResponse([{ + isResolved: false, + comments: [ + { databaseId: 100, author: { login: 'alice' }, body: 'Fix this', path: 'a.ts', line: 1, diffHunk: undefined }, + ], + }])) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }); + + const result = await fetchGitHubPullRequest('owner/repo', 10, 'ghp_test'); + expect(result!.review_comments[0].id).toBe(100); + expect(result!.review_comments[0].in_reply_to_id).toBeUndefined(); + }); + + test('skips review comments with non-numeric databaseId', async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 10, title: 'PR', body: '', head: { ref: 'feat' }, base: { ref: 'main' }, 
state: 'open' }) }) + .mockResolvedValueOnce(makeGraphQLThreadsResponse([{ + isResolved: false, + comments: [ + { databaseId: 'not-a-number', author: { login: 'alice' }, body: 'Bad id' }, + { databaseId: undefined, author: { login: 'bob' }, body: 'Missing id' }, + { databaseId: 100, author: { login: 'carol' }, body: 'Valid' }, + ], + }])) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }); + + const result = await fetchGitHubPullRequest('owner/repo', 10, 'ghp_test'); + // Only the valid comment should be included + expect(result!.review_comments).toHaveLength(1); + expect(result!.review_comments[0].id).toBe(100); + // First valid comment becomes root — no in_reply_to_id + expect(result!.review_comments[0].in_reply_to_id).toBeUndefined(); + }); + + test('promotes first valid comment to root when earlier comments have invalid databaseId', async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 10, title: 'PR', body: '', head: { ref: 'feat' }, base: { ref: 'main' }, state: 'open' }) }) + .mockResolvedValueOnce(makeGraphQLThreadsResponse([{ + isResolved: false, + comments: [ + { databaseId: 'bad', author: { login: 'alice' }, body: 'Invalid root' }, + { databaseId: 100, author: { login: 'bob' }, body: 'Becomes root' }, + { databaseId: 200, author: { login: 'carol' }, body: 'Reply' }, + ], + }])) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }); + + const result = await fetchGitHubPullRequest('owner/repo', 10, 'ghp_test'); + expect(result!.review_comments).toHaveLength(2); + // Comment 100 becomes root since the first comment was invalid + expect(result!.review_comments[0].id).toBe(100); + expect(result!.review_comments[0].in_reply_to_id).toBeUndefined(); + // Comment 200 is a reply to the promoted root + expect(result!.review_comments[1].id).toBe(200); + 
expect(result!.review_comments[1].in_reply_to_id).toBe(100); + }); + + test('skips issue comments with non-numeric id', async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 10, title: 'PR', body: '', head: { ref: 'feat' }, base: { ref: 'main' }, state: 'open' }) }) + .mockResolvedValueOnce(makeGraphQLThreadsResponse([])) + .mockResolvedValueOnce({ + ok: true, + json: async () => ([ + { id: null, user: { login: 'alice' }, body: 'Null id' }, + { id: 300, user: { login: 'bob' }, body: 'Valid' }, + ]), + }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }); + + const result = await fetchGitHubPullRequest('owner/repo', 10, 'ghp_test'); + expect(result!.issue_comments).toHaveLength(1); + expect(result!.issue_comments[0].id).toBe(300); + }); + + test('falls back to unknown for empty string author', async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 10, title: 'PR', body: '', head: { ref: 'feat' }, base: { ref: 'main' }, state: 'open' }) }) + .mockResolvedValueOnce(makeGraphQLThreadsResponse([{ + isResolved: false, + comments: [ + { databaseId: 100, author: { login: '' }, body: 'Empty login' }, + ], + }])) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }); + + const result = await fetchGitHubPullRequest('owner/repo', 10, 'ghp_test'); + expect(result!.review_comments[0].author).toBe('unknown'); + }); + + test('returns id on issue comments', async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 10, title: 'PR', body: '', head: { ref: 'feat' }, base: { ref: 'main' }, state: 'open' }) }) + .mockResolvedValueOnce(makeGraphQLThreadsResponse([])) + .mockResolvedValueOnce({ + ok: true, + json: async () => ([ + { id: 300, user: { login: 'carol' }, body: 'Looks good' }, + ]), + }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }); + + const result = await 
fetchGitHubPullRequest('owner/repo', 10, 'ghp_test'); + expect(result!.issue_comments[0].id).toBe(300); + }); +}); + // --------------------------------------------------------------------------- // hydrateContext // --------------------------------------------------------------------------- @@ -469,4 +673,316 @@ describe('hydrateContext', () => { expect(result.sources).toContain('task_description'); expect(result.version).toBe(1); }); + + test('pr_review task hydrates PR context', async () => { + mockSmSend.mockResolvedValueOnce({ SecretString: 'ghp_test' }); + mockFetch + .mockResolvedValueOnce({ + ok: true, + json: async () => ({ + number: 55, title: 'Review PR', body: 'Please review', head: { ref: 'feature/review' }, base: { ref: 'main' }, state: 'open', + }), + }) + .mockResolvedValueOnce(makeGraphQLThreadsResponse([])) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }); + + const task = { + ...baseTask, + task_type: 'pr_review', + pr_number: 55, + }; + const result = await hydrateContext(task as any); + + expect(result.sources).toContain('pull_request'); + expect(result.resolved_branch_name).toBe('feature/review'); + expect(result.resolved_base_branch).toBe('main'); + expect(result.user_prompt).toContain('Review this pull request'); + }); + + test('pr_iteration prompt: removing a thread root also removes its replies from prompt', () => { + // Thread 1: root (id 100) + reply (id 200) + // Thread 2: root (id 300) + // Simulate what the trimming code does: remove thread 1 entirely, keep thread 2. 
+ const fullPr = { + number: 5, + title: 'Test', + body: '', + head_ref: 'feat', + base_ref: 'main', + state: 'open', + diff_summary: '', + review_comments: [ + { id: 100, author: 'alice', body: 'Fix this', path: 'a.ts', line: 1 }, + { id: 200, in_reply_to_id: 100, author: 'bob', body: 'Agreed' }, + { id: 300, author: 'carol', body: 'Rename this', path: 'b.ts', line: 5 }, + ], + issue_comments: [], + }; + + // Full prompt has both threads + const fullResult = assemblePrIterationPrompt('task-1', 'owner/repo', fullPr); + expect(fullResult).toContain('comment_id: 100'); + expect(fullResult).toContain('@bob'); + expect(fullResult).toContain('comment_id: 300'); + + // After trimming thread 1 (root 100 + reply 200), only thread 2 remains + const trimmedPr = { + ...fullPr, + review_comments: [ + { id: 300, author: 'carol', body: 'Rename this', path: 'b.ts', line: 5 }, + ], + }; + const trimmedResult = assemblePrIterationPrompt('task-1', 'owner/repo', trimmedPr); + + // Thread 1 root and reply are both gone + expect(trimmedResult).not.toContain('comment_id: 100'); + expect(trimmedResult).not.toContain('@alice'); + expect(trimmedResult).not.toContain('@bob'); + + // Thread 2 is still present + expect(trimmedResult).toContain('comment_id: 300'); + expect(trimmedResult).toContain('@carol'); + + // Critically: reply 200 does NOT appear as an orphan + expect(trimmedResult).not.toContain('Comment on'); + }); +}); + +// --------------------------------------------------------------------------- +// fetchGitHubPullRequest +// --------------------------------------------------------------------------- + +describe('fetchGitHubPullRequest', () => { + test('returns PR context on success', async () => { + mockSmSend.mockResolvedValueOnce({ SecretString: 'ghp_test' }); + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 42, title: 'Fix bug', body: 'desc', head: { ref: 'feature' }, base: { ref: 'main' }, state: 'open' }) }) + 
.mockResolvedValueOnce(makeGraphQLThreadsResponse([{ + isResolved: false, + comments: [{ databaseId: 501, author: { login: 'alice' }, body: 'LGTM', path: 'src/a.ts', line: 10, diffHunk: '@@ -1,3 +1,4 @@' }], + }])) + .mockResolvedValueOnce({ ok: true, json: async () => ([{ id: 601, user: { login: 'bob' }, body: 'Nice work' }]) }) + .mockResolvedValueOnce({ ok: true, json: async () => ([{ filename: 'src/a.ts', status: 'modified', additions: 5, deletions: 2, patch: '+added\n-removed' }]) }); + + const result = await fetchGitHubPullRequest('owner/repo', 42, 'ghp_test'); + + expect(result).not.toBeNull(); + expect(result!.number).toBe(42); + expect(result!.head_ref).toBe('feature'); + expect(result!.base_ref).toBe('main'); + expect(result!.review_comments).toHaveLength(1); + expect(result!.issue_comments).toHaveLength(1); + expect(result!.diff_summary).toContain('src/a.ts'); + }); + + test('returns null when PR fetch fails with 404', async () => { + mockFetch.mockResolvedValueOnce({ ok: false, status: 404 }); + + const result = await fetchGitHubPullRequest('owner/repo', 999, 'ghp_test'); + expect(result).toBeNull(); + }); + + test('filters out resolved review threads', async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 10, title: 'PR', body: '', head: { ref: 'feat' }, base: { ref: 'main' }, state: 'open' }) }) + .mockResolvedValueOnce(makeGraphQLThreadsResponse([ + { + isResolved: true, + comments: [{ databaseId: 100, author: { login: 'alice' }, body: 'Resolved comment', path: 'a.ts', line: 1, diffHunk: '@@ -1 +1 @@' }], + }, + { + isResolved: false, + comments: [{ databaseId: 200, author: { login: 'bob' }, body: 'Open comment', path: 'b.ts', line: 5, diffHunk: '@@ -5 +5 @@' }], + }, + ])) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }); + + const result = await fetchGitHubPullRequest('owner/repo', 10, 'ghp_test'); + 
expect(result!.review_comments).toHaveLength(1); + expect(result!.review_comments[0].id).toBe(200); + expect(result!.review_comments[0].body).toBe('Open comment'); + }); + + test('returns empty review comments when GraphQL returns errors', async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 10, title: 'PR', body: '', head: { ref: 'feat' }, base: { ref: 'main' }, state: 'open' }) }) + .mockResolvedValueOnce({ + ok: true, + json: async () => ({ + errors: [{ message: 'Something went wrong' }], + }), + }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }); + + const result = await fetchGitHubPullRequest('owner/repo', 10, 'ghp_test'); + expect(result!.review_comments).toEqual([]); + }); + + test('paginates through review threads', async () => { + mockFetch + // #1: PR metadata + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 10, title: 'PR', body: '', head: { ref: 'feat' }, base: { ref: 'main' }, state: 'open' }) }) + // #2: GraphQL page 1 + .mockResolvedValueOnce(makeGraphQLThreadsResponse( + [{ isResolved: false, comments: [{ databaseId: 100, author: { login: 'alice' }, body: 'Page 1', path: 'a.ts', line: 1, diffHunk: undefined }] }], + true, + 'cursor-1', + )) + // #3: Issue comments + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }) + // #4: Files + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }) + // #5: GraphQL page 2 + .mockResolvedValueOnce(makeGraphQLThreadsResponse( + [{ isResolved: false, comments: [{ databaseId: 200, author: { login: 'bob' }, body: 'Page 2', path: 'b.ts', line: 5, diffHunk: undefined }] }], + )); + + const result = await fetchGitHubPullRequest('owner/repo', 10, 'ghp_test'); + expect(result!.review_comments).toHaveLength(2); + expect(result!.review_comments[0].id).toBe(100); + expect(result!.review_comments[1].id).toBe(200); + }); + + test('returns empty review comments when GraphQL 
response has unexpected structure', async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 10, title: 'PR', body: '', head: { ref: 'feat' }, base: { ref: 'main' }, state: 'open' }) }) + .mockResolvedValueOnce({ + ok: true, + json: async () => ({ + data: { repository: { pullRequest: null } }, + }), + }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }); + + const result = await fetchGitHubPullRequest('owner/repo', 10, 'ghp_test'); + expect(result!.review_comments).toEqual([]); + }); + + test('returns empty review comments when GraphQL fetch fails', async () => { + mockFetch + .mockResolvedValueOnce({ ok: true, json: async () => ({ number: 10, title: 'PR', body: '', head: { ref: 'feat' }, base: { ref: 'main' }, state: 'open' }) }) + .mockResolvedValueOnce({ ok: false, status: 502 }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }) + .mockResolvedValueOnce({ ok: true, json: async () => ([]) }); + + const result = await fetchGitHubPullRequest('owner/repo', 10, 'ghp_test'); + expect(result!.review_comments).toEqual([]); + }); +}); + +// --------------------------------------------------------------------------- +// assemblePrIterationPrompt +// --------------------------------------------------------------------------- + +describe('assemblePrIterationPrompt', () => { + test('formats PR context into a user prompt', () => { + const pr = { + number: 42, + title: 'Fix null check', + body: 'Fixes a null pointer', + head_ref: 'fix/null-check', + base_ref: 'main', + state: 'open', + diff_summary: '### src/a.ts (modified, +5/-2)', + review_comments: [{ id: 1001, author: 'alice', body: 'Please add a test', path: 'src/a.ts', line: 10 }], + issue_comments: [{ id: 2001, author: 'bob', body: 'Looks good overall' }], + }; + + const result = assemblePrIterationPrompt('task-1', 'owner/repo', pr, 'Fix the null check Alice flagged'); + + 
expect(result).toContain('Pull Request #42'); + expect(result).toContain('Fix null check'); + expect(result).toContain('alice'); + expect(result).toContain('Please add a test'); + expect(result).toContain('bob'); + expect(result).toContain('Fix the null check Alice flagged'); + expect(result).toContain('src/a.ts'); + expect(result).toContain('reply with comment_id: 1001'); + expect(result).toContain('comment_id: 2001'); + }); + + test('groups review comments by thread', () => { + const pr = { + number: 10, + title: 'Refactor auth', + body: 'Auth changes', + head_ref: 'refactor/auth', + base_ref: 'main', + state: 'open', + diff_summary: '', + review_comments: [ + { id: 100, author: 'alice', body: 'Please add a null check here', path: 'src/auth.ts', line: 42, diff_hunk: '@@ -40,3 +40,5 @@' }, + { id: 200, in_reply_to_id: 100, author: 'bob', body: 'I agree with Alice' }, + { id: 300, author: 'alice', body: 'This function name is misleading', path: 'src/api.ts', line: 10 }, + ], + issue_comments: [], + }; + + const result = assemblePrIterationPrompt('task-2', 'owner/repo', pr); + + // First thread — rooted at comment 100 + expect(result).toContain('reply with comment_id: 100'); + expect(result).toContain('**@alice**: Please add a null check here'); + expect(result).toContain('**@bob**: I agree with Alice'); + expect(result).toContain('`src/auth.ts:42`'); + + // Second thread — rooted at comment 300 + expect(result).toContain('reply with comment_id: 300'); + expect(result).toContain('**@alice**: This function name is misleading'); + expect(result).toContain('`src/api.ts:10`'); + }); + + test('groups threads correctly when in_reply_to_id is undefined for thread roots', () => { + const pr = { + number: 10, + title: 'Test', + body: '', + head_ref: 'feat', + base_ref: 'main', + state: 'open', + diff_summary: '', + review_comments: [ + { id: 100, in_reply_to_id: undefined, author: 'alice', body: 'Add null check', path: 'src/a.ts', line: 5 }, + { id: 200, in_reply_to_id: 100, 
author: 'bob', body: 'Agreed' }, + ], + issue_comments: [], + }; + + const result = assemblePrIterationPrompt('task-4', 'owner/repo', pr); + + // Comment 100 should be a thread root, not an orphan + expect(result).toContain('reply with comment_id: 100'); + expect(result).toContain('**@alice**: Add null check'); + expect(result).toContain('**@bob**: Agreed'); + // Should NOT contain orphan markers for these comments + expect(result).not.toContain('Comment on'); + }); + + test('renders orphan replies as standalone entries', () => { + const pr = { + number: 10, + title: 'Test', + body: '', + head_ref: 'feat', + base_ref: 'main', + state: 'open', + diff_summary: '', + review_comments: [ + // Reply whose root (id 999) is not in the fetched set + { id: 500, in_reply_to_id: 999, author: 'carol', body: 'What about edge cases?', path: 'src/util.ts', line: 5 }, + ], + issue_comments: [], + }; + + const result = assemblePrIterationPrompt('task-3', 'owner/repo', pr); + + // Orphan uses in_reply_to_id (999) as reply target, not its own id (500) + expect(result).toContain('reply with comment_id: 999'); + expect(result).toContain('**@carol**: What about edge cases?'); + expect(result).toContain('`src/util.ts:5`'); + }); }); diff --git a/cdk/test/handlers/shared/create-task-core.test.ts b/cdk/test/handlers/shared/create-task-core.test.ts index 5fc5f6c..9f81bdb 100644 --- a/cdk/test/handlers/shared/create-task-core.test.ts +++ b/cdk/test/handlers/shared/create-task-core.test.ts @@ -287,4 +287,80 @@ describe('createTaskCore', () => { ); expect(result.statusCode).toBe(201); }); + + test('creates pr_iteration task with pr_number', async () => { + const result = await createTaskCore( + { repo: 'org/repo', task_type: 'pr_iteration', pr_number: 42 }, + makeContext(), + 'req-pr-1', + ); + expect(result.statusCode).toBe(201); + const body = JSON.parse(result.body); + expect(body.data.task_type).toBe('pr_iteration'); + expect(body.data.pr_number).toBe(42); + 
expect(body.data.branch_name).toBe('pending:pr_resolution'); + }); + + test('returns 400 for pr_iteration without pr_number', async () => { + const result = await createTaskCore( + { repo: 'org/repo', task_type: 'pr_iteration', task_description: 'Fix it' }, + makeContext(), + 'req-pr-2', + ); + expect(result.statusCode).toBe(400); + expect(result.body).toContain('pr_number is required'); + }); + + test('returns 400 for pr_number without pr_iteration task_type', async () => { + const result = await createTaskCore( + { repo: 'org/repo', task_description: 'Fix it', pr_number: 42 } as any, + makeContext(), + 'req-pr-3', + ); + expect(result.statusCode).toBe(400); + expect(result.body).toContain('pr_number is only allowed'); + }); + + test('returns 400 for invalid task_type', async () => { + const result = await createTaskCore( + { repo: 'org/repo', task_description: 'Fix it', task_type: 'invalid' as any }, + makeContext(), + 'req-pr-4', + ); + expect(result.statusCode).toBe(400); + expect(result.body).toContain('Invalid task_type'); + }); + + test('creates pr_review task with pr_number', async () => { + const result = await createTaskCore( + { repo: 'org/repo', task_type: 'pr_review', pr_number: 99 }, + makeContext(), + 'req-review-1', + ); + expect(result.statusCode).toBe(201); + const body = JSON.parse(result.body); + expect(body.data.task_type).toBe('pr_review'); + expect(body.data.pr_number).toBe(99); + expect(body.data.branch_name).toBe('pending:pr_resolution'); + }); + + test('returns 400 for pr_review without pr_number', async () => { + const result = await createTaskCore( + { repo: 'org/repo', task_type: 'pr_review', task_description: 'Review it' }, + makeContext(), + 'req-review-2', + ); + expect(result.statusCode).toBe(400); + expect(result.body).toContain('pr_number is required'); + }); + + test('returns 400 for pr_number with new_task', async () => { + const result = await createTaskCore( + { repo: 'org/repo', task_description: 'Fix it', pr_number: 42 } as 
any, + makeContext(), + 'req-review-3', + ); + expect(result.statusCode).toBe(400); + expect(result.body).toContain('pr_number is only allowed'); + }); }); diff --git a/cdk/test/handlers/shared/preflight.test.ts b/cdk/test/handlers/shared/preflight.test.ts index 9658c4b..194909f 100644 --- a/cdk/test/handlers/shared/preflight.test.ts +++ b/cdk/test/handlers/shared/preflight.test.ts @@ -290,6 +290,47 @@ describe('runPreflightChecks', () => { expect(failedChecks.length).toBeGreaterThanOrEqual(1); }); + test('passes when prNumber is provided and PR is open', async () => { + mockSmSend.mockResolvedValueOnce({ SecretString: 'ghp_test123' }); + mockFetch + .mockResolvedValueOnce({ ok: true, status: 200 }) // reachability + .mockResolvedValueOnce({ ok: true, status: 200 }) // repo access + .mockResolvedValueOnce({ ok: true, status: 200, json: async () => ({ state: 'open' }) }); // PR + + const result = await runPreflightChecks('owner/repo', baseBlueprintConfig, 42); + + expect(result.passed).toBe(true); + expect(result.checks).toHaveLength(3); + expect(result.checks[2].check).toBe('pr_accessible'); + expect(result.checks[2].passed).toBe(true); + }); + + test('fails when PR returns 404', async () => { + mockSmSend.mockResolvedValueOnce({ SecretString: 'ghp_test123' }); + mockFetch + .mockResolvedValueOnce({ ok: true, status: 200 }) + .mockResolvedValueOnce({ ok: true, status: 200 }) + .mockResolvedValueOnce({ ok: false, status: 404 }); + + const result = await runPreflightChecks('owner/repo', baseBlueprintConfig, 42); + + expect(result.passed).toBe(false); + expect(result.failureReason).toBe(PreflightFailureReason.PR_NOT_FOUND_OR_CLOSED); + }); + + test('fails when PR is closed', async () => { + mockSmSend.mockResolvedValueOnce({ SecretString: 'ghp_test123' }); + mockFetch + .mockResolvedValueOnce({ ok: true, status: 200 }) + .mockResolvedValueOnce({ ok: true, status: 200 }) + .mockResolvedValueOnce({ ok: true, status: 200, json: async () => ({ state: 'closed' }) }); + + 
const result = await runPreflightChecks('owner/repo', baseBlueprintConfig, 42); + + expect(result.passed).toBe(false); + expect(result.failureReason).toBe(PreflightFailureReason.PR_NOT_FOUND_OR_CLOSED); + }); + test('uses per-repo github_token_secret_arn from blueprint config', async () => { const perRepoArn = 'arn:aws:secretsmanager:us-east-1:123:secret:per-repo-token'; const config: BlueprintConfig = { diff --git a/cdk/test/handlers/shared/validation.test.ts b/cdk/test/handlers/shared/validation.test.ts index ff1a95b..21a4389 100644 --- a/cdk/test/handlers/shared/validation.test.ts +++ b/cdk/test/handlers/shared/validation.test.ts @@ -25,12 +25,15 @@ import { isValidIdempotencyKey, isValidRepo, isValidTaskDescriptionLength, + isValidTaskType, isValidWebhookName, MAX_TASK_DESCRIPTION_LENGTH, parseBody, parseLimit, parseStatusFilter, + VALID_TASK_TYPES, validateMaxTurns, + validatePrNumber, } from '../../../src/handlers/shared/validation'; describe('parseBody', () => { @@ -89,6 +92,22 @@ describe('hasTaskSpec', () => { test('returns false when task_description is empty/whitespace', () => { expect(hasTaskSpec({ repo: 'org/repo', task_description: ' ' })).toBe(false); }); + + test('returns true when task_type is pr_iteration and pr_number is provided', () => { + expect(hasTaskSpec({ repo: 'org/repo', task_type: 'pr_iteration', pr_number: 42 })).toBe(true); + }); + + test('returns false for pr_iteration without pr_number', () => { + expect(hasTaskSpec({ repo: 'org/repo', task_type: 'pr_iteration' })).toBe(false); + }); + + test('returns true when task_type is pr_review and pr_number is provided', () => { + expect(hasTaskSpec({ repo: 'org/repo', task_type: 'pr_review', pr_number: 42 })).toBe(true); + }); + + test('returns false for pr_review without pr_number', () => { + expect(hasTaskSpec({ repo: 'org/repo', task_type: 'pr_review' })).toBe(false); + }); }); describe('isValidIdempotencyKey', () => { @@ -300,3 +319,47 @@ describe('pagination token encode/decode', () => 
{ expect(decodePaginationToken('not-valid-base64!!!')).toBeUndefined(); }); }); + +describe('isValidTaskType', () => { + test('returns true for valid task types', () => { + expect(isValidTaskType('new_task')).toBe(true); + expect(isValidTaskType('pr_iteration')).toBe(true); + }); + + test('returns true for undefined/null (defaults to new_task)', () => { + expect(isValidTaskType(undefined)).toBe(true); + expect(isValidTaskType(null)).toBe(true); + }); + + test('returns true for pr_review', () => { + expect(isValidTaskType('pr_review')).toBe(true); + }); + + test('returns false for invalid values', () => { + expect(isValidTaskType('invalid')).toBe(false); + expect(isValidTaskType('')).toBe(false); + expect(isValidTaskType(42)).toBe(false); + expect(isValidTaskType(true)).toBe(false); + }); +}); + +describe('validatePrNumber', () => { + test('returns the number for valid positive integers', () => { + expect(validatePrNumber(1)).toBe(1); + expect(validatePrNumber(42)).toBe(42); + expect(validatePrNumber(999)).toBe(999); + }); + + test('returns undefined for absent values', () => { + expect(validatePrNumber(undefined)).toBeUndefined(); + expect(validatePrNumber(null)).toBeUndefined(); + }); + + test('returns null for invalid values', () => { + expect(validatePrNumber(0)).toBeNull(); + expect(validatePrNumber(-1)).toBeNull(); + expect(validatePrNumber(1.5)).toBeNull(); + expect(validatePrNumber('42')).toBeNull(); + expect(validatePrNumber(true)).toBeNull(); + }); +}); diff --git a/cli/src/commands/submit.ts b/cli/src/commands/submit.ts index 807b9a4..a09c356 100644 --- a/cli/src/commands/submit.ts +++ b/cli/src/commands/submit.ts @@ -32,12 +32,23 @@ export function makeSubmitCommand(): Command { .option('--task <description>', 'Task description') .option('--max-turns <n>', 'Maximum agent turns (1-500)', parseInt) .option('--max-budget <usd>', 'Maximum budget in USD (0.01-100)', parseFloat) + .option('--pr <number>', 'PR number to iterate on (sets task_type to pr_iteration)', parseInt) +
.option('--review-pr <number>', 'PR number to review (sets task_type to pr_review)', parseInt) .option('--idempotency-key <key>', 'Idempotency key for deduplication') .option('--wait', 'Wait for task to complete') .option('--output <format>', 'Output format (text or json)', 'text') .action(async (opts) => { - if (opts.issue === undefined && !opts.task) { - throw new CliError('At least one of --issue or --task is required.'); + if (opts.pr !== undefined && isNaN(opts.pr)) { + throw new CliError('--pr must be a valid number.'); + } + if (opts.reviewPr !== undefined && isNaN(opts.reviewPr)) { + throw new CliError('--review-pr must be a valid number.'); + } + if (opts.pr !== undefined && opts.reviewPr !== undefined) { + throw new CliError('--pr and --review-pr cannot be used together.'); + } + if (opts.pr === undefined && opts.reviewPr === undefined && opts.issue === undefined && !opts.task) { + throw new CliError('At least one of --issue, --task, --pr, or --review-pr is required.'); } if (opts.issue !== undefined && isNaN(opts.issue)) { throw new CliError('--issue must be a valid number.'); @@ -60,6 +71,9 @@ export function makeSubmitCommand(): Command { ...(opts.task && { task_description: opts.task }), ...(opts.maxTurns !== undefined && { max_turns: opts.maxTurns }), ...(opts.maxBudget !== undefined && { max_budget_usd: opts.maxBudget }), + // Note: --pr and --review-pr are mutually exclusive (validated above).
+ ...(opts.pr !== undefined && { task_type: 'pr_iteration' as const, pr_number: opts.pr }), + ...(opts.reviewPr !== undefined && { task_type: 'pr_review' as const, pr_number: opts.reviewPr }), }; const task = await client.createTask(body, opts.idempotencyKey); diff --git a/cli/src/format.ts b/cli/src/format.ts index 87adf43..6138b25 100644 --- a/cli/src/format.ts +++ b/cli/src/format.ts @@ -26,6 +26,12 @@ export function formatTaskDetail(task: TaskDetail): string { `Status: ${task.status}`, `Repo: ${task.repo}`, ]; + if (task.task_type && task.task_type !== 'new_task') { + lines.push(`Type: ${task.task_type}`); + } + if (task.pr_number !== null) { + lines.push(`PR #: ${task.pr_number}`); + } if (task.issue_number !== null) { lines.push(`Issue: #${task.issue_number}`); } @@ -74,13 +80,19 @@ export function formatTaskList(tasks: TaskSummary[]): string { } const headers = ['TASK ID', 'STATUS', 'REPO', 'CREATED', 'DESCRIPTION']; - const rows = tasks.map(t => [ - t.task_id, - t.status, - t.repo, - t.created_at, - truncate(t.task_description || (t.issue_number !== null ? `#${t.issue_number}` : '-'), 40), - ]); + const rows = tasks.map(t => { + let desc = t.task_description || (t.issue_number !== null ? `#${t.issue_number}` : '-'); + if (t.task_type === 'pr_iteration' && t.pr_number !== null) { + desc = `PR #${t.pr_number}` + (t.task_description ? `: ${t.task_description}` : ''); + } + return [ + t.task_id, + t.status, + t.repo, + t.created_at, + truncate(desc, 40), + ]; + }); return formatTable(headers, rows); } diff --git a/cli/src/types.ts b/cli/src/types.ts index 0ca9fc1..59b7c2a 100644 --- a/cli/src/types.ts +++ b/cli/src/types.ts @@ -17,12 +17,17 @@ * SOFTWARE. */ +/** Valid task types for task creation. */ +export type TaskType = 'new_task' | 'pr_iteration' | 'pr_review'; + /** Task detail returned by GET /v1/tasks/{task_id}. 
*/ export interface TaskDetail { readonly task_id: string; readonly status: string; readonly repo: string; readonly issue_number: number | null; + readonly task_type: TaskType; + readonly pr_number: number | null; readonly task_description: string | null; readonly branch_name: string; readonly session_id: string | null; @@ -45,6 +50,8 @@ export interface TaskSummary { readonly status: string; readonly repo: string; readonly issue_number: number | null; + readonly task_type: TaskType; + readonly pr_number: number | null; readonly task_description: string | null; readonly branch_name: string; readonly pr_url: string | null; @@ -67,6 +74,8 @@ export interface CreateTaskRequest { readonly task_description?: string; readonly max_turns?: number; readonly max_budget_usd?: number; + readonly task_type?: TaskType; + readonly pr_number?: number; } /** Cancel task response from DELETE /v1/tasks/{task_id}. */ diff --git a/cli/test/commands/submit.test.ts b/cli/test/commands/submit.test.ts index 2ac7bd0..25768ff 100644 --- a/cli/test/commands/submit.test.ts +++ b/cli/test/commands/submit.test.ts @@ -51,6 +51,8 @@ describe('submit command', () => { status: 'SUBMITTED', repo: 'owner/repo', issue_number: 42, + task_type: 'new_task', + pr_number: null, task_description: null, branch_name: 'bgagent/abc/fix', session_id: null, @@ -64,6 +66,7 @@ describe('submit command', () => { cost_usd: null, build_passed: null, max_turns: null, + max_budget_usd: null, }); const cmd = makeSubmitCommand(); @@ -86,6 +89,8 @@ describe('submit command', () => { status: 'SUBMITTED', repo: 'owner/repo', issue_number: null, + task_type: 'new_task', + pr_number: null, task_description: 'Fix the bug', branch_name: 'bgagent/abc/fix', session_id: null, @@ -99,6 +104,7 @@ describe('submit command', () => { cost_usd: null, build_passed: null, max_turns: null, + max_budget_usd: null, }); const cmd = makeSubmitCommand(); @@ -135,6 +141,8 @@ describe('submit command', () => { status: 'SUBMITTED', repo: 
'owner/repo', issue_number: null, + task_type: 'new_task', + pr_number: null, task_description: 'Fix the bug', branch_name: 'bgagent/abc/fix', session_id: null, @@ -148,6 +156,7 @@ describe('submit command', () => { cost_usd: null, build_passed: null, max_turns: 50, + max_budget_usd: null, }); const cmd = makeSubmitCommand(); @@ -176,13 +185,144 @@ describe('submit command', () => { ).rejects.toThrow('--max-turns must be an integer between 1 and 500'); }); - test('errors when neither --issue nor --task provided', async () => { + test('errors when neither --issue nor --task nor --pr provided', async () => { const cmd = makeSubmitCommand(); await expect( cmd.parseAsync([ 'node', 'test', '--repo', 'owner/repo', ]), - ).rejects.toThrow('At least one of --issue or --task is required'); + ).rejects.toThrow('At least one of --issue, --task, --pr, or --review-pr is required'); + }); + + test('submits a pr_iteration task with --pr', async () => { + mockCreateTask.mockResolvedValue({ + task_id: 'pr-abc', + status: 'SUBMITTED', + repo: 'owner/repo', + issue_number: null, + task_type: 'pr_iteration', + pr_number: 42, + task_description: null, + branch_name: 'pending:pr_resolution', + session_id: null, + pr_url: null, + error_message: null, + created_at: '2026-01-01T00:00:00Z', + updated_at: '2026-01-01T00:00:00Z', + started_at: null, + completed_at: null, + duration_s: null, + cost_usd: null, + build_passed: null, + max_turns: null, + max_budget_usd: null, + }); + + const cmd = makeSubmitCommand(); + await cmd.parseAsync([ + 'node', 'test', + '--repo', 'owner/repo', + '--pr', '42', + ]); + + expect(mockCreateTask).toHaveBeenCalledWith( + { repo: 'owner/repo', task_type: 'pr_iteration', pr_number: 42 }, + undefined, + ); + expect(consoleSpy).toHaveBeenCalled(); + }); + + test('submits a pr_review task with --review-pr', async () => { + mockCreateTask.mockResolvedValue({ + task_id: 'review-abc', + status: 'SUBMITTED', + repo: 'owner/repo', + issue_number: null, + task_type: 
'pr_review', + pr_number: 55, + task_description: null, + branch_name: 'pending:pr_resolution', + session_id: null, + pr_url: null, + error_message: null, + created_at: '2026-01-01T00:00:00Z', + updated_at: '2026-01-01T00:00:00Z', + started_at: null, + completed_at: null, + duration_s: null, + cost_usd: null, + build_passed: null, + max_turns: null, + max_budget_usd: null, + }); + + const cmd = makeSubmitCommand(); + await cmd.parseAsync([ + 'node', 'test', + '--repo', 'owner/repo', + '--review-pr', '55', + ]); + + expect(mockCreateTask).toHaveBeenCalledWith( + { repo: 'owner/repo', task_type: 'pr_review', pr_number: 55 }, + undefined, + ); + expect(consoleSpy).toHaveBeenCalled(); + }); + + test('rejects --pr and --review-pr together', async () => { + const cmd = makeSubmitCommand(); + await expect( + cmd.parseAsync([ + 'node', 'test', + '--repo', 'owner/repo', + '--pr', '42', + '--review-pr', '55', + ]), + ).rejects.toThrow('--pr and --review-pr cannot be used together'); + }); + + test('submits a pr_iteration task with --pr and --task', async () => { + mockCreateTask.mockResolvedValue({ + task_id: 'pr-abc', + status: 'SUBMITTED', + repo: 'owner/repo', + issue_number: null, + task_type: 'pr_iteration', + pr_number: 42, + task_description: 'Fix the null check', + branch_name: 'pending:pr_resolution', + session_id: null, + pr_url: null, + error_message: null, + created_at: '2026-01-01T00:00:00Z', + updated_at: '2026-01-01T00:00:00Z', + started_at: null, + completed_at: null, + duration_s: null, + cost_usd: null, + build_passed: null, + max_turns: null, + max_budget_usd: null, + }); + + const cmd = makeSubmitCommand(); + await cmd.parseAsync([ + 'node', 'test', + '--repo', 'owner/repo', + '--pr', '42', + '--task', 'Fix the null check', + ]); + + expect(mockCreateTask).toHaveBeenCalledWith( + { + repo: 'owner/repo', + task_description: 'Fix the null check', + task_type: 'pr_iteration', + pr_number: 42, + }, + undefined, + ); }); }); diff --git 
a/cli/test/format.test.ts b/cli/test/format.test.ts index d20cb60..009ede4 100644 --- a/cli/test/format.test.ts +++ b/cli/test/format.test.ts @@ -26,6 +26,8 @@ describe('format', () => { status: 'COMPLETED', repo: 'owner/repo', issue_number: 42, + task_type: 'new_task', + pr_number: null, task_description: 'Fix the bug', branch_name: 'bgagent/abc123/fix-the-bug', session_id: 'sess-1', @@ -81,6 +83,24 @@ describe('format', () => { expect(output).not.toContain('Build:'); expect(output).not.toContain('Max Turns:'); }); + + test('shows task_type and pr_number for pr_iteration', () => { + const prTask: TaskDetail = { + ...task, + task_type: 'pr_iteration', + pr_number: 42, + issue_number: null, + }; + const output = formatTaskDetail(prTask); + expect(output).toContain('Type: pr_iteration'); + expect(output).toContain('PR #: 42'); + }); + + test('omits task_type line for new_task', () => { + const output = formatTaskDetail(task); + expect(output).not.toContain('Type:'); + expect(output).not.toContain('PR #:'); + }); }); describe('formatTaskList', () => { @@ -90,6 +110,8 @@ describe('format', () => { status: 'RUNNING', repo: 'owner/repo', issue_number: 1, + task_type: 'new_task', + pr_number: null, task_description: null, branch_name: 'bgagent/abc/fix', pr_url: null, @@ -109,6 +131,8 @@ describe('format', () => { status: 'RUNNING', repo: 'owner/repo', issue_number: null, + task_type: 'new_task', + pr_number: null, task_description: 'Fix the login bug', branch_name: 'bgagent/abc/fix', pr_url: null, @@ -126,6 +150,8 @@ describe('format', () => { status: 'RUNNING', repo: 'owner/repo', issue_number: 42, + task_type: 'new_task', + pr_number: null, task_description: null, branch_name: 'bgagent/abc/fix', pr_url: null, diff --git a/docs/design/AGENT_HARNESS.md b/docs/design/AGENT_HARNESS.md index 449cc79..6fb7425 100644 --- a/docs/design/AGENT_HARNESS.md +++ b/docs/design/AGENT_HARNESS.md @@ -14,7 +14,7 @@ Many AI assistants include an embedded agent harness. 
Those products provide bui The agent harness runs **inside the compute environment** (e.g. AgentCore Runtime MicroVM). The platform orchestrates the task and **hydrates context** (user message, GitHub issue, system instructions); the harness receives the assembled prompt and runs the **agent loop** (reason, plan, call tools, repeat) until the task is done or the session ends. -- **Behavioral contract** — The platform defines **what** the agent should do (commit often, create PR, branch naming, error handling, etc.) via the **system prompt**, which is assembled by the context hydration step and passed into the session. The harness is the **execution framework**; it does not define policy. See the architecture and planning docs for the full agent behavioral contract. Deterministic hooks run to execute steps. +- **Behavioral contract** — The platform defines **what** the agent should do via the **system prompt**, which is selected by task type and assembled in the agent container. The system prompt is structured as a shared base template (`agent/prompts/base.py`) with per-task-type workflow sections: `new_task` (create branch, implement, create PR), `pr_iteration` (read review feedback, address, push to existing branch, comment on PR), and `pr_review` (read-only analysis of PR changes, post structured review comments via the GitHub Reviews API). The harness is the **execution framework**; it does not define policy. See the architecture and planning docs for the full agent behavioral contract. Deterministic hooks run to execute steps. - **Execution model** — Tasks are **fully unattended** and **one-shot**: the user submits a task, the harness runs to completion or failure with no mid-task human interaction. The harness must support long-running execution (hours) and a single continuous loop. 
On AgentCore Runtime, the harness entrypoint must not block (the agent loop runs in a separate thread so the health ping can respond); the platform or harness adapter is responsible for that pattern. **Important:** The agent thread uses `asyncio.run()` with the stdlib asyncio event loop. The uvicorn server is configured with `--loop asyncio` to avoid uvloop, which conflicts with subprocess SIGCHLD handling when multiple event loops run in different threads. - **Result** — The agent does not call back to the platform; it follows the contract (push work, create PR) and exits. The platform infers success or failure from the PR and branch state via the GitHub API. diff --git a/docs/design/API_CONTRACT.md b/docs/design/API_CONTRACT.md index 2f034fa..41bb83a 100644 --- a/docs/design/API_CONTRACT.md +++ b/docs/design/API_CONTRACT.md @@ -122,7 +122,9 @@ POST /v1/tasks |---|---|---|---| | `repo` | String | Yes | GitHub repository in `owner/repo` format. | | `issue_number` | Number | No | GitHub issue number. If provided, the issue title, body, and comments are fetched during context hydration. | -| `task_description` | String | No | Free-text task description. At least one of `issue_number` or `task_description` must be provided. | +| `task_description` | String | No | Free-text task description. At least one of `issue_number`, `task_description`, or `pr_number` must be provided. | +| `task_type` | String | No | Task type: `new_task` (default), `pr_iteration`, or `pr_review`. When `pr_iteration`, the agent iterates on an existing PR. When `pr_review`, the agent performs a read-only review and posts structured comments. | +| `pr_number` | Number | No | Pull request number to iterate on or review. Required when `task_type` is `pr_iteration` or `pr_review`; rejected otherwise. For `pr_iteration`, the agent checks out the PR's branch, reads review feedback, addresses it, and pushes back. 
For `pr_review`, the agent checks out the PR's branch, analyzes changes read-only, and posts a structured review. | | `max_turns` | Number | No | Maximum agent turns (1–500). Controls how many reasoning/tool-call iterations the agent can perform. Defaults to 100 if omitted. | | `max_budget_usd` | Number | No | Maximum cost budget in USD (0.01–100). When reached, the agent stops regardless of remaining turns. If omitted, no budget limit is applied (turn limit and session timeout still apply). | | `attachments` | Array | No | Multi-modal attachments (images, files). See Attachments schema below. | @@ -169,18 +171,22 @@ POST /v1/tasks "task_id": "01HYX...", "status": "SUBMITTED", "repo": "org/myapp", + "task_type": "new_task", "issue_number": 42, + "pr_number": null, "branch_name": "bgagent/01HYX.../fix-auth-bug", "created_at": "2025-03-15T10:30:00Z" } } ``` +For `pr_iteration` and `pr_review` tasks, `branch_name` is initially set to `pending:pr_resolution` and resolved to the PR's `head_ref` during context hydration. + **Error responses:** | Status | Code | Condition | |---|---|---| -| `400` | `VALIDATION_ERROR` | Missing required fields, invalid repo format, no task description or issue, invalid `max_turns` (not an integer or outside 1–500 range), invalid `max_budget_usd` (not a number or outside 0.01–100 range). | +| `400` | `VALIDATION_ERROR` | Missing required fields, invalid repo format, no task description, issue, or PR number, invalid `task_type`, `pr_number` provided when `task_type` is neither `pr_iteration` nor `pr_review`, `pr_number` missing when `task_type` is `pr_iteration` or `pr_review`, invalid `max_turns` (not an integer or outside 1–500 range), invalid `max_budget_usd` (not a number or outside 0.01–100 range). | | `401` | `UNAUTHORIZED` | Missing or invalid auth token. | | `409` | `DUPLICATE_TASK` | Idempotency key matches an existing task (returns the existing task in `data`).
| | `422` | `REPO_NOT_ONBOARDED` | Repository is not registered with the platform. Repos are onboarded via CDK deployment (`Blueprint` construct), not via a runtime API. See [REPO_ONBOARDING.md](./REPO_ONBOARDING.md). | @@ -210,7 +216,9 @@ GET /v1/tasks/{task_id} "task_id": "01HYX...", "status": "RUNNING", "repo": "org/myapp", + "task_type": "new_task", "issue_number": 42, + "pr_number": null, "task_description": "Fix the authentication bug in the login flow", "branch_name": "bgagent/01HYX.../fix-auth-bug", "session_id": "sess-uuid", @@ -231,6 +239,8 @@ GET /v1/tasks/{task_id} | Field | Type | Description | |---|---|---| +| `task_type` | String | Task type: `new_task`, `pr_iteration`, or `pr_review`. | +| `pr_number` | Number or null | Pull request number being iterated on or reviewed. Only set for `pr_iteration` and `pr_review` tasks. | | `max_turns` | Number or null | Maximum agent turns for this task. Always present in the response — reflects the effective value (user-specified or platform default of 100). | | `max_budget_usd` | Number or null | Maximum cost budget in USD for this task. Null if no budget limit was specified. 
| @@ -270,7 +280,9 @@ GET /v1/tasks "task_id": "01HYX...", "status": "RUNNING", "repo": "org/myapp", + "task_type": "new_task", "issue_number": 42, + "pr_number": null, "task_description": "Fix the authentication bug...", "branch_name": "bgagent/01HYX.../fix-auth-bug", "pr_url": null, @@ -387,7 +399,7 @@ GET /v1/tasks/{task_id}/events **Event types** (see [OBSERVABILITY.md](./OBSERVABILITY.md) for the full list): -**Fixed event types:** `task_created`, `admission_passed`, `admission_rejected`, `preflight_failed`, `hydration_started`, `hydration_complete`, `session_started`, `session_ended`, `pr_created`, `task_completed`, `task_failed`, `task_cancelled`, `task_timed_out` +**Fixed event types:** `task_created`, `admission_passed`, `admission_rejected`, `preflight_failed`, `hydration_started`, `hydration_complete`, `session_started`, `session_ended`, `pr_created`, `pr_updated`, `task_completed`, `task_failed`, `task_cancelled`, `task_timed_out` **Step-level event types** (from the blueprint framework): The orchestrator emits events for each pipeline step following the pattern `{step_name}_{started|completed|failed}`. For built-in steps these overlap with the fixed types above (e.g. `hydration_started`). For custom Lambda steps (see [REPO_ONBOARDING.md](./REPO_ONBOARDING.md)), the step name is user-defined (e.g. `sast-scan_started`, `sast-scan_completed`, `prepare-environment_failed`). Step event `metadata` includes `StepOutput.metadata` from the step execution. @@ -539,7 +551,7 @@ Creates a task via webhook. Uses HMAC-SHA256 authentication instead of Cognito J POST /v1/webhooks/tasks ``` -**Request body:** Same as `POST /v1/tasks` (see [Create task](#create-task)). +**Request body:** Same as `POST /v1/tasks` (see [Create task](#create-task)), including `task_type` and `pr_number` fields. 
**Required headers:** @@ -566,7 +578,7 @@ HMAC verification is performed by the handler (not the authorizer) because API G | Status | Code | Condition | |---|---|---| -| `400` | `VALIDATION_ERROR` | Missing required fields, invalid repo format, no task description or issue, invalid `max_turns`, invalid `max_budget_usd`. | +| `400` | `VALIDATION_ERROR` | Missing required fields, invalid repo format, no task description, issue, or PR number, invalid `task_type`, invalid `pr_number`, invalid `max_turns`, invalid `max_budget_usd`. | | `401` | `UNAUTHORIZED` | Missing webhook headers, webhook not found, revoked, or invalid signature. | | `409` | `DUPLICATE_TASK` | Idempotency key matches an existing task. | @@ -601,6 +613,7 @@ Rate limit status is communicated via response headers (see Standard response he | `WEBHOOK_NOT_FOUND` | 404 | Webhook does not exist or belongs to a different user. | | `WEBHOOK_ALREADY_REVOKED` | 409 | Webhook is already revoked. | | `REPO_NOT_ONBOARDED` | 422 | Repository is not registered with the platform. Repos are onboarded via CDK deployment, not via a runtime API. There are no `/v1/repos` endpoints. | +| `PR_NOT_FOUND_OR_CLOSED` | 422 | For `pr_iteration` and `pr_review` tasks: the specified PR does not exist, is not open, or is not accessible with the configured GitHub token. Checked during the orchestrator's pre-flight step. | | `INVALID_STEP_SEQUENCE` | 500 | The blueprint's step sequence is invalid (missing required steps or incorrect ordering). This indicates a CDK configuration error that slipped past synth-time validation. Visible via `GET /v1/tasks/{id}` as `error_code`. See [REPO_ONBOARDING.md](./REPO_ONBOARDING.md#step-sequence-validation). | | `RATE_LIMIT_EXCEEDED` | 429 | User exceeded rate limit. | | `INTERNAL_ERROR` | 500 | Unexpected server error. Includes `request_id` for support.
| diff --git a/docs/design/ARCHITECTURE.md b/docs/design/ARCHITECTURE.md index 94b50a4..7db666c 100644 --- a/docs/design/ARCHITECTURE.md +++ b/docs/design/ARCHITECTURE.md @@ -202,10 +202,13 @@ Each concept has a **source-of-truth document** and one or more documents that r | Adaptive model router | ROADMAP.md (Iter 5) | COST_MODEL.md | | Capability-based security | ROADMAP.md (Iter 5) | SECURITY.md | | Live session replay | ROADMAP.md (Iter 4) | API_CONTRACT.md | +| PR iteration task type | API_CONTRACT.md, ORCHESTRATOR.md | USER_GUIDE.md, PROMPT_GUIDE.md, SECURITY.md, AGENT_HARNESS.md | +| PR review task type | API_CONTRACT.md, ORCHESTRATOR.md | USER_GUIDE.md, PROMPT_GUIDE.md, SECURITY.md, AGENT_HARNESS.md | ### Per-repo model selection Different tasks and repos may benefit from different models. The `model_id` field in the blueprint config (see [ORCHESTRATOR.md](./ORCHESTRATOR.md)) allows per-repo overrides. Suggested defaults: -- **Implementation tasks:** Claude Sonnet 4 (good balance of quality and cost) -- **PR review tasks:** Claude Haiku (fast, cheap — review is read-only analysis) +- **Implementation tasks (`new_task`):** Claude Sonnet 4 (good balance of quality and cost) +- **PR iteration tasks (`pr_iteration`):** Claude Sonnet 4 (needs to understand review feedback and make code changes — similar complexity to implementation) +- **PR review tasks (`pr_review`):** Claude Haiku (fast, cheap — review is read-only analysis) - **Complex/critical repos:** Claude Opus 4 (highest quality, highest cost — opt-in per repo) \ No newline at end of file diff --git a/docs/design/ORCHESTRATOR.md b/docs/design/ORCHESTRATOR.md index c6ff944..150f1a0 100644 --- a/docs/design/ORCHESTRATOR.md +++ b/docs/design/ORCHESTRATOR.md @@ -166,7 +166,7 @@ See the Admission control section for details. Validates that the task is allowe #### Step 2: Context hydration (deterministic) -See the Context hydration section for details. 
Assembles the agent's prompt from multiple sources: user message, GitHub issue (title, body, comments), memory (relevant past context), repo configuration (system prompt template, rules), and platform defaults. The output is a fully assembled prompt and system prompt, ready to pass to the compute session. +See the Context hydration section for details. Assembles the agent's prompt from multiple sources depending on task type. For `new_task`: user message, GitHub issue (title, body, comments), memory, repo configuration, and platform defaults. For `pr_iteration`: PR metadata, review comments, diff summary, and optional user instructions. An additional **pre-flight** sub-step verifies PR accessibility when `pr_number` is set (see [preflight.ts](../../cdk/src/handlers/shared/preflight.ts)). The output is a fully assembled prompt, ready to pass to the compute session. #### Step 3: Session start and agent execution (deterministic start + agentic execution) @@ -246,20 +246,26 @@ Admission control runs immediately after the input gateway dispatches a "create Context hydration assembles the agent's user prompt from multiple sources. It runs as a deterministic step in the orchestrator Lambda after admission control and before session start. The goal is to perform I/O-bound work (GitHub API calls, Secrets Manager lookups) *before* expensive agent compute is consumed, enabling fast failure when external APIs are unavailable. -### Current implementation (Iteration 3a) +### Current implementation (Iteration 3a+) The orchestrator's `hydrateAndTransition()` function calls `hydrateContext()` (`src/handlers/shared/context-hydration.ts`) which: 1. **Resolves the GitHub token** from Secrets Manager (if `GITHUB_TOKEN_SECRET_ARN` is configured). The token is cached in a module-level variable with a 5-minute TTL for Lambda execution context reuse. -2. 
**Fetches the GitHub issue** (title, body, comments) via the GitHub REST API using native `fetch()` — if `issue_number` is present on the task and a token is available. +2. **Fetches external context** based on task type: + - **`new_task`**: Fetches the GitHub issue (title, body, comments) via the GitHub REST API if `issue_number` is present. + - **`pr_iteration`** / **`pr_review`**: Fetches the pull request context via `fetchGitHubPullRequest()` — four parallel calls: three REST API calls (PR metadata, conversation comments, changed files) plus one GraphQL query for inline review comments. The GraphQL query filters out resolved review threads at fetch time so the agent only sees unresolved feedback. PR metadata includes title, body, head/base refs, and state; the diff summary covers changed files. The PR's `head_ref` is stored as `resolved_branch_name` and `base_ref` as `resolved_base_branch` on the hydrated context. These are used by the orchestrator to update the task record's `branch_name` from the placeholder `pending:pr_resolution` to the actual PR branch. For `pr_review`, if no `task_description` is provided, a default review instruction is used. 3. **Enforces a token budget** on the combined context. Uses a character-based heuristic (~4 chars per token). Default budget: 100K tokens (configurable via `USER_PROMPT_TOKEN_BUDGET` environment variable). When the budget is exceeded, oldest comments are removed first. The `truncated` flag is set in the result. -4. **Assembles the user prompt** — a structured markdown document containing Task ID, Repository, GitHub Issue section (title, body, comments), and Task section (user's task description or default instruction). The format mirrors the Python `assemble_prompt()` in `agent/entrypoint.py` for cross-language consistency. -5. **Returns a `HydratedContext` object** containing `version`, `user_prompt`, `issue`, `sources`, `token_estimate`, and `truncated`. +4. 
**Assembles the user prompt** based on task type: + - **`new_task`**: A structured markdown document with Task ID, Repository, GitHub Issue section, and Task section. The format mirrors the Python `assemble_prompt()` in `agent/entrypoint.py`. + - **`pr_iteration`**: Assembled by `assemblePrIterationPrompt()` — includes PR metadata (number, title, body), the diff summary (changed files and patches), review comments (inline and conversation), and optional user instructions from `task_description`. + - **`pr_review`**: Uses the same `assemblePrIterationPrompt()` output; if no `task_description` is provided, a default review instruction is substituted. +5. **Returns a `HydratedContext` object** containing `version`, `user_prompt`, `issue`, `sources`, `token_estimate`, `truncated`, and for `pr_iteration`/`pr_review` tasks: `resolved_branch_name` and `resolved_base_branch`. The hydrated context is passed to the agent as a new `hydrated_context` field in the invocation payload, alongside the existing legacy fields (`repo_url`, `task_id`, `branch_name`, `issue_number`, `prompt`). The agent checks for `hydrated_context` with `version == 1`; if present, it uses the pre-assembled `user_prompt` directly and skips in-container GitHub fetching and prompt assembly. If absent (e.g. during a deployment rollout or when the secret ARN isn't configured), the agent falls back to its existing behavior. **Graceful degradation:** If any step fails (Secrets Manager unavailable, GitHub API error, network timeout), the orchestrator proceeds with whatever context is available. The worst case is a minimal prompt with just the task ID and repository — the agent can still attempt its own GitHub fetch as a fallback via the legacy `issue_number` field. +**PR iteration branch resolution:** After hydration, if `resolved_branch_name` is present on the hydrated context, the orchestrator updates the task record's `branch_name` in DynamoDB from the placeholder (`pending:pr_resolution`) to the PR's actual `head_ref`. This ensures the task record always reflects the real branch name that the agent will push to.
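The branch-resolution rule amounts to a guarded rename, which can be sketched as a pure helper. The helper name and the `HydratedBranchInfo` shape below are illustrative assumptions for this sketch, not the orchestrator's actual code (which performs the equivalent update against the DynamoDB task record):

```typescript
// Placeholder value as documented; only this value is ever replaced.
const PENDING_PLACEHOLDER = 'pending:pr_resolution';

// Assumed shape of the relevant hydrated-context fields.
interface HydratedBranchInfo {
  resolved_branch_name?: string;
  resolved_base_branch?: string;
}

function resolveBranchName(currentBranch: string, ctx: HydratedBranchInfo): string {
  // A generated bgagent/... branch (new_task) is never overwritten,
  // even if hydration happened to return a branch name.
  if (currentBranch === PENDING_PLACEHOLDER && ctx.resolved_branch_name) {
    return ctx.resolved_branch_name;
  }
  return currentBranch;
}
```

The guard makes the update idempotent: re-running hydration on a task whose branch was already resolved leaves the record unchanged.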
+ ### Hydration events The orchestrator emits two task events during hydration: @@ -277,7 +283,9 @@ We evaluated routing GitHub API calls through AgentCore Gateway (with the GitHub 2. **Repo configuration (Iteration 3+).** Per-repo rules, instructions, or context loaded from the onboarding store. This can include static artifacts discovered during onboarding (e.g. content from `.cursor/rules`, `CLAUDE.md`, `CONTRIBUTING.md`) and dynamic artifacts generated by the onboarding pipeline (e.g. codebase summaries, dependency graphs). See [REPO_ONBOARDING.md](./REPO_ONBOARDING.md). -3. **GitHub issue context.** If the task references a GitHub issue: fetch the issue title, body, and comments via the GitHub REST API. **Now done in the orchestrator** (`fetchGitHubIssue` in `src/handlers/shared/context-hydration.ts`), not in the agent container. +3. **GitHub issue context** (`new_task`). If the task references a GitHub issue: fetch the issue title, body, and comments via the GitHub REST API. **Now done in the orchestrator** (`fetchGitHubIssue` in `src/handlers/shared/context-hydration.ts`), not in the agent container. + +3b. **Pull request context** (`pr_iteration`, `pr_review`). If the task references a PR (`pr_number` set): fetch the PR metadata, conversation comments, and changed files via REST API, and inline review comments via GraphQL (which filters out resolved threads at fetch time) — four parallel calls total via `fetchGitHubPullRequest()`. The PR's `head_ref` and `base_ref` are extracted for branch resolution. Review comments and diff are formatted into the user prompt so the agent understands the feedback to address. 4. **User message.** The free-text task description provided by the user (via CLI `--task` flag or equivalent). May supplement or replace the issue context. 
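The four-call fan-out in 3b can be sketched as follows. The endpoint paths follow the public GitHub REST and GraphQL APIs, but the function name, the injected `fetchJson` callback, the elided GraphQL query, and the return shape are assumptions made for illustration, not the repo's actual `fetchGitHubPullRequest()` signature:

```typescript
// Injected HTTP helper so the fan-out stays testable without network access.
type FetchJson = (path: string, body?: unknown) => Promise<unknown>;

async function fetchPullRequestContext(repo: string, prNumber: number, fetchJson: FetchJson) {
  // All four calls run in parallel; order in the tuple is fixed by Promise.all.
  const [meta, comments, files, reviewThreads] = await Promise.all([
    fetchJson(`/repos/${repo}/pulls/${prNumber}`),           // PR metadata: title, body, head/base refs, state
    fetchJson(`/repos/${repo}/issues/${prNumber}/comments`), // conversation comments
    fetchJson(`/repos/${repo}/pulls/${prNumber}/files`),     // changed files (diff summary)
    fetchJson('/graphql', { query: '...' }),                 // inline review threads, filtered to unresolved
  ]);
  return { meta, comments, files, reviewThreads };
}
```

Injecting the fetcher also makes the "graceful degradation" path easy to exercise: a stub can fail one call while the others succeed.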
@@ -289,36 +297,71 @@ We evaluated routing GitHub API calls through AgentCore Gateway (with the GitHub The orchestrator assembles one artifact during hydration: -- **User prompt.** Assembled by `assembleUserPrompt()` in the orchestrator from the issue context and user message. Format: `Task ID: {id}\nRepository: {repo}\n\n## GitHub Issue #{n}: {title}\n...\n\n## Task\n\n{description}`. This mirrors the Python `assemble_prompt()` function. +- **User prompt.** Assembled differently based on task type: + - **`new_task`**: `assembleUserPrompt()` — Format: `Task ID: {id}\nRepository: {repo}\n\n## GitHub Issue #{n}: {title}\n...\n\n## Task\n\n{description}`. This mirrors the Python `assemble_prompt()` function. + - **`pr_iteration`**: `assemblePrIterationPrompt()` — Format: `Task ID: {id}\nRepository: {repo}\n\n## Pull Request #{n}: {title}\n\n{body}\n\n### Changed Files\n...\n\n### Review Comments\n...\n\n## Additional Instructions\n\n{description}`. This provides the agent with the full PR context, diff summary, and reviewer feedback. + - **`pr_review`**: Uses `assemblePrIterationPrompt()` (same format as `pr_iteration`). If no task description is provided, defaults to "Review this pull request. Follow the workflow in your system instructions." -The system prompt is **not** assembled in the orchestrator — it remains in the agent container because it depends on `setup_repo()` output (`{setup_notes}` placeholder). In the target state, additional sections may be injected: repo-specific rules, memory-derived insights. +The system prompt is **not** assembled in the orchestrator — it remains in the agent container because it depends on `setup_repo()` output (`{setup_notes}` placeholder). 
The agent selects the appropriate system prompt template based on `task_type`: the `new_task` workflow (understand → implement → test → commit → create PR), the `pr_iteration` workflow (understand feedback → address → test → push → comment on PR), or the `pr_review` workflow (analyze changes → compose findings → post review comments → post summary). In the target state, additional sections may be injected: repo-specific rules, memory-derived insights. ### Payload contract ``` Legacy: { repo_url, task_id, branch_name, issue_number?, prompt? } -Current: { repo_url, task_id, branch_name, issue_number?, prompt?, hydrated_context } +Current: { repo_url, task_id, branch_name, issue_number?, prompt?, task_type, pr_number?, base_branch?, hydrated_context } +``` + +For `new_task` (default): +```json +{ + "repo_url": "owner/repo", + "task_id": "01HYX...", + "branch_name": "bgagent/01HYX.../fix-auth-bug", + "task_type": "new_task", + "hydrated_context": { + "version": 1, + "user_prompt": "Task ID: ...\nRepository: ...\n\n## GitHub Issue #42: ...", + "issue": { "number": 42, "title": "...", "body": "...", "comments": [...] }, + "sources": ["issue", "task_description"], + "token_estimate": 1250, + "truncated": false + } +} ``` -Where `hydrated_context`: +For `pr_iteration`: ```json { - "version": 1, - "user_prompt": "Task ID: ...\nRepository: ...\n\n## GitHub Issue #42: ...", - "issue": { "number": 42, "title": "...", "body": "...", "comments": [...] 
}, - "sources": ["issue", "task_description"], - "token_estimate": 1250, - "truncated": false + "repo_url": "owner/repo", + "task_id": "01HYX...", + "branch_name": "feature/my-branch", + "task_type": "pr_iteration", + "pr_number": 42, + "base_branch": "main", + "hydrated_context": { + "version": 1, + "user_prompt": "Task ID: ...\nRepository: ...\n\n## Pull Request #42: ...\n\n### Review Comments\n...", + "sources": ["pr_context", "task_description"], + "token_estimate": 3400, + "truncated": false, + "resolved_branch_name": "feature/my-branch", + "resolved_base_branch": "main" + } } ``` +The `branch_name` for `pr_iteration` and `pr_review` tasks is the PR's `head_ref` (resolved during hydration), not a generated `bgagent/...` branch. The `base_branch` field is populated from the PR's `base_ref` so the agent knows the merge target. + ### Token budget The orchestrator enforces a token budget on the user prompt before assembly: - **Estimation heuristic:** `Math.ceil(text.length / 4)` (~4 characters per token). - **Default budget:** 100,000 tokens (configurable via `USER_PROMPT_TOKEN_BUDGET` CDK prop / environment variable). -- **Truncation strategy:** When the combined estimated token count (issue body + comments + task description) exceeds the budget, oldest comments are removed first. If still over budget after removing all comments, the issue body and task description are kept as-is (they are assumed to be essential). The `truncated` flag is set in the hydrated context metadata. +- **Truncation strategy:** Differs by task type: + - **`new_task`:** When the combined estimated token count (issue body + comments + task description) exceeds the budget, oldest comments are removed first. If still over budget after removing all comments, the issue body and task description are kept as-is (they are assumed to be essential). 
+ - **`pr_iteration`/`pr_review`:** When the assembled PR prompt exceeds the budget, oldest issue comments are trimmed first (conversation comments on the PR), then oldest review comments (inline code review comments). The PR metadata, diff summary, and user instructions are preserved. + - The `truncated` flag is set in the hydrated context metadata when truncation occurs. - The agent harness handles its own context compaction during the run for multi-turn conversations. --- @@ -791,9 +834,11 @@ The primary table for task state. DynamoDB. | `user_id` | String | Cognito sub or mapped platform user ID. | | `status` | String | Current state (see state machine). | | `repo` | String | GitHub owner/repo (e.g. `org/myapp`). | +| `task_type` | String | Task type: `new_task` (default), `pr_iteration`, or `pr_review`. Determines the agent workflow (create new PR, iterate on existing PR, or review a PR). | | `issue_number` | Number (optional) | GitHub issue number, if task is issue-based. | -| `task_description` | String (optional) | Free-text task description. | -| `branch_name` | String | Agent branch: `bgagent/{task_id}/{slug}`. | +| `pr_number` | Number (optional) | Pull request number, required when task type is `pr_iteration` or `pr_review`. | +| `task_description` | String (optional) | Free-text task description. For `pr_iteration`/`pr_review`, used as additional instructions alongside PR context. | +| `branch_name` | String | Agent branch. For `new_task`: `bgagent/{task_id}/{slug}`. For `pr_iteration`/`pr_review`: initially `pending:pr_resolution`, resolved to the PR's `head_ref` during context hydration. | | `session_id` | String (optional) | AgentCore runtime session ID, set when session is started. | | `execution_id` | String (optional) | Lambda durable execution ID, set when the orchestrator starts. | | `pr_url` | String (optional) | Pull request URL, set during finalization. 
| diff --git a/docs/design/SECURITY.md b/docs/design/SECURITY.md index 4320f04..adef99e 100644 --- a/docs/design/SECURITY.md +++ b/docs/design/SECURITY.md @@ -195,10 +195,11 @@ AgentCore Memory has **no native backup mechanism**. This is a significant gap f ## Known limitations - **Single GitHub OAuth token** — one token may be shared for all users and repos the platform can access. Any authenticated user can trigger agent work against any repo that token can access. There is no per-user repo scoping. -- **Guardrails are input-only** — the `PROMPT_ATTACK` filter screens task descriptions at submission. No guardrails are applied to model output during agent execution or to review feedback entering the memory system. +- **Guardrails are input-only** — the `PROMPT_ATTACK` filter screens task descriptions at submission. No guardrails are applied to model output during agent execution or to review feedback entering the memory system. For `pr_iteration` and `pr_review` tasks, the PR context (review comments, diff, PR body, issue comments) fetched during context hydration is **not** screened by Bedrock Guardrails — it bypasses the submission-time filter entirely. A `pr_iteration` or `pr_review` task submitted without a `task_description` receives no guardrail screening at all. - **No memory content validation** — retrieved memory records are injected into the agent's context without sanitization, injection pattern scanning, or trust scoring. This is the most critical memory security gap (OWASP ASI06). See [MEMORY.md](./MEMORY.md#memory-security-analysis) for the full gap analysis and [ROADMAP.md Iteration 3e](../guides/ROADMAP.md) for the remediation plan. - **No memory provenance or integrity checking** — memory entries carry no source attribution, content hashing, or trust metadata. The system cannot distinguish agent-generated memory from externally-influenced content. 
- **GitHub issue content as untrusted input** — issue bodies and comments (attacker-controlled) are injected into the agent's context during hydration without trust differentiation. +- **PR review comments as untrusted input** — for `pr_iteration` and `pr_review` tasks, review comments, PR body, and conversation comments are fetched and injected into the agent's context. These are attacker-controlled inputs subject to the same prompt injection risks as issue comments. Unlike task descriptions, PR context is not screened by the Bedrock Guardrails `PROMPT_ATTACK` filter. For `pr_review` tasks, defense-in-depth mitigates the risk: the agent runs without `Write` or `Edit` tools, so even if prompt injection succeeds, the agent cannot modify files or push code. - **No memory rollback or quarantine** — the 365-day AgentCore Memory expiration is the only cleanup mechanism. There is no snapshot, rollback, or quarantine capability for suspected poisoned entries. - **No MFA** — Cognito MFA is disabled (CLI-based auth flow). Should be enabled for production deployments. - **No customer-managed KMS** — all encryption at rest uses AWS-managed keys. Customer-managed KMS can be added if required by compliance policy. diff --git a/docs/guides/PROMPT_GUIDE.md b/docs/guides/PROMPT_GUIDE.md index 62da237..cd77d9c 100644 --- a/docs/guides/PROMPT_GUIDE.md +++ b/docs/guides/PROMPT_GUIDE.md @@ -15,10 +15,11 @@ This guide covers how to write effective task descriptions, common anti-patterns When you submit a task, the platform does not pass your input directly to the agent. Instead, it goes through a **context hydration** step — a distinct phase in the task lifecycle (you'll see the task status change to `HYDRATING`) where the platform fetches external data and assembles the full prompt on your behalf. During hydration: - If you provided `--issue`, the platform calls the GitHub API to fetch the issue title, body, and comments. 
-- Your task description, the issue content, and task metadata are combined into a single **user prompt**. -- If the assembled prompt exceeds the token budget, older issue comments are trimmed to fit. +- If you provided `--pr`, the platform fetches the PR metadata, conversation comments, and changed files via the REST API, and inline review comments via the GraphQL API — four parallel calls. Resolved review threads are filtered out at fetch time so the agent only sees unresolved feedback. +- Your task description, the issue/PR content, and task metadata are combined into a single **user prompt**. +- If the assembled prompt exceeds the token budget, older comments are trimmed to fit. -The hydrated prompt is then passed to the agent alongside a fixed **system prompt**. Understanding this assembly helps you write better descriptions — you control what goes in, but the platform decides the final shape. +The hydrated prompt is then passed to the agent alongside a **system prompt** selected by task type. For `new_task`, the system prompt instructs the agent to create a branch, implement changes, and open a new PR. For `pr_iteration`, it instructs the agent to read review feedback, address it, push to the existing branch, and comment on the PR. For `pr_review`, it instructs the agent to analyze the PR's changes and post structured review comments without modifying code. Understanding this assembly helps you write better descriptions — you control what goes in, but the platform decides the final shape. ### What the agent receives @@ -43,9 +44,31 @@ Repository: owner/repo [your task description, if provided] ``` +For `pr_iteration` tasks (when using `--pr`) and `pr_review` tasks (when using `--review-pr`), the user prompt has a different structure: + +``` +Task ID: bgagent-01HYX... 
+Repository: owner/repo + +## Pull Request #42: Fix login timeout on slow connections +[PR body] + +### Changed Files +- src/auth.ts (+12, -3) +- src/middleware.ts (+5, -1) + +### Review Comments +**@alice** (src/auth.ts:88): The timeout should be configurable... +**@bob** (general): Consider adding a retry mechanism... + +## Additional Instructions +[your task description, if provided] +``` + The user prompt includes: - **Task ID** and **Repository** — always present. -- **GitHub Issue** (title, body, and comments) — included when you use `--issue`. +- **GitHub Issue** (title, body, and comments) — included when you use `--issue` (`new_task`). +- **Pull Request context** (title, body, diff, review comments) — included when you use `--pr` (`pr_iteration`) or `--review-pr` (`pr_review`). - **Task description** — included when you use `--task`. ### Token budget @@ -124,16 +147,24 @@ Both are active simultaneously. Blueprint overrides are part of the system promp ## Choosing the right input mode -You must provide at least one of `--issue` or `--task` (or both). +You must provide at least one of `--issue`, `--task`, `--pr`, or `--review-pr`. | Mode | When to use | Example | |---|---|---| | `--issue` only | The GitHub issue is well-written with clear requirements, reproduction steps, and acceptance criteria. | `bgagent submit --repo owner/repo --issue 42` | | `--task` only | Ad-hoc task not tied to an issue, or the issue doesn't exist yet. | `bgagent submit --repo owner/repo --task "Add rate limiting to the /search endpoint"` | | `--issue` + `--task` | The issue exists but needs clarification, scope narrowing, or additional instructions. | `bgagent submit --repo owner/repo --issue 42 --task "Focus only on the timeout in the OAuth flow. Don't change the retry logic."` | +| `--pr` only | A PR has review feedback that needs addressing. The agent reads the diff, review comments, and pushes fixes. 
| `bgagent submit --repo owner/repo --pr 42` | +| `--pr` + `--task` | A PR has review feedback, and you want to provide additional instructions or scope the work. | `bgagent submit --repo owner/repo --pr 42 --task "Focus on the null check Alice flagged in the auth module"` | +| `--review-pr` only | You want a structured code review of an existing PR. The agent reads the changes and posts review comments without modifying code. | `bgagent submit --repo owner/repo --review-pr 42` | +| `--review-pr` + `--task` | You want a focused review of specific aspects of a PR. | `bgagent submit --repo owner/repo --review-pr 42 --task "Focus on security issues and error handling"` | **When to combine both:** Use `--issue` + `--task` when you want the agent to see the full issue context (including comments from other contributors) but need to override or narrow the scope. Your `--task` text appears after the issue content, so it acts as the final instruction. +**PR iteration:** Use `--pr` when a reviewer has left feedback on an existing PR. The agent checks out the PR's branch, reads all review comments and the current diff, makes targeted changes to address the feedback, and pushes back to the same branch. The `--task` flag is optional but useful for narrowing scope (e.g., "Only address the security concern, not the style nits"). + +**PR review:** Use `--review-pr` when you want the agent to analyze a PR and post structured review comments without modifying any code. The agent reads the full source files, runs the build for analysis, and posts findings using a structured format (type, severity, description, proposed fix, AI prompt). The `--task` flag is optional but useful for focusing the review (e.g., "Focus on security issues"). 
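The flag rules above (at least one input mode; `--pr` and `--review-pr` mutually exclusive) can be sketched as a small validation helper. This is a hypothetical sketch — the actual validation lives in the CLI package, and the helper name is illustrative:

```typescript
// Hypothetical CLI-side mapping from submit flags to the API's task_type.
interface SubmitFlags {
  issue?: number;
  task?: string;
  pr?: number;      // --pr
  reviewPr?: number; // --review-pr
}

function resolveTaskType(flags: SubmitFlags): "new_task" | "pr_iteration" | "pr_review" {
  if (flags.pr !== undefined && flags.reviewPr !== undefined) {
    throw new Error("--pr and --review-pr are mutually exclusive");
  }
  if (
    flags.issue === undefined && flags.task === undefined &&
    flags.pr === undefined && flags.reviewPr === undefined
  ) {
    throw new Error("at least one of --issue, --task, --pr, --review-pr is required");
  }
  if (flags.reviewPr !== undefined) return "pr_review";
  if (flags.pr !== undefined) return "pr_iteration";
  return "new_task"; // --issue and/or --task
}
```

Note that `--task` alone never selects a PR task type — it only supplements `--pr` or `--review-pr` when combined with them.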
+ ## Writing effective task descriptions ### Describe the end state, not the steps @@ -245,6 +276,8 @@ The `--max-turns` flag (API field: `max_turns`) controls how many agent turns (m | Bug fix with clear reproduction | 50–100 | The agent needs to understand the issue, find the root cause, implement the fix, add tests, and verify. | | New feature (single module) | 100–200 | More exploration, implementation, and testing. Default of 100 is usually sufficient. | | Large refactoring or multi-file feature | 200–500 | Extensive codebase exploration and many file changes. Consider whether the task should be split instead. | +| PR iteration (address review feedback) | 30–100 | The agent reads the existing diff and review comments, makes targeted changes, and pushes. Typically fewer turns than a new task since the scope is narrower. | +| PR review (code review) | 30–80 | The agent reads the diff and source files, runs the build for analysis, and posts review comments. No code changes, so fewer turns needed. | If a task consistently times out or uses all turns without finishing, consider whether the task description is too broad. Splitting into smaller, focused tasks is usually more effective than increasing the turn limit. @@ -340,3 +373,22 @@ The fix should target screen widths below 768px. Use the existing breakpoint variables in src/styles/variables.css. " ``` + +### PR iteration (address review feedback) + +```bash +bgagent submit --repo acme/api-server --pr 95 +``` + +When the review feedback is broad and you want the agent to focus: + +```bash +bgagent submit --repo acme/api-server --pr 95 --task " +Address only the security concerns flagged by @alice: +- The SQL injection risk in the search query +- The missing CSRF token on the form submission + +Ignore the style suggestions for now — those will be addressed +in a follow-up. 
+" +``` diff --git a/docs/guides/ROADMAP.md b/docs/guides/ROADMAP.md index 0718ab7..5a00c00 100644 --- a/docs/guides/ROADMAP.md +++ b/docs/guides/ROADMAP.md @@ -160,7 +160,7 @@ These practices apply continuously across iterations and are not treated as one- - **Tier 2 — Code quality analysis** — Static analysis of the agent's diff against code quality principles: DRY (duplicated code detection), SOLID violations, design pattern adherence, complexity metrics (cyclomatic, cognitive), naming conventions, and repo-specific style rules (from onboarding config). Implemented as an LLM-based review step or a combination of static analysis tools (e.g. SonarQube rules, custom linters) and LLM judgment. Produces structured findings (severity, location, rule, suggestion) that the agent can act on in a fix cycle. Findings below a configurable severity threshold are advisory (included in the PR as comments) rather than blocking. - **Tier 3 — Risk and blast radius analysis** — Analyze the scope and impact of the agent's changes to detect unintended side effects in other parts of the codebase. Includes: dependency graph analysis (what modules/functions consume the changed code), change surface area (number of files, lines, and modules touched), semantic impact assessment (does the change alter public APIs, shared types, configuration, or database schemas), and regression risk scoring. Produces a **risk level** (low / medium / high / critical) attached to the PR as a label and included in the validation report. High-risk changes may require explicit human approval before merge (foundation for the HITL approval mode in Iteration 6). The risk level considers: number of downstream dependents affected, whether the change touches shared infrastructure or core abstractions, test coverage of the affected area, and whether the change introduces new external dependencies. 
- **PR risk level and validation report** — Every agent-created PR includes a structured **validation report** (as a PR comment or check run) summarizing: Tier 1 results (pass/fail per tool), Tier 2 findings (code quality issues by severity), Tier 3 risk assessment (risk level, blast radius summary, affected modules). The PR is labeled with the computed risk level (`risk:low`, `risk:medium`, `risk:high`, `risk:critical`). Risk level is persisted in the task record for evaluation and trending. See [EVALUATION.md](../design/EVALUATION.md#pr-risk-level). -- **Other task types: PR review** — Support at least one additional task type beyond "implement from issue": **review pull request** (read-only or comment-only). The agent reads the PR diff, runs analysis (tests, lint, code review heuristics), and posts review comments. This uses a different blueprint (no branch creation, no PR creation — just analysis and comments) and a different system prompt. It validates that the platform is not hardwired to a single task type. +- [x] **Other task types: PR review and PR iteration** — Support additional task types beyond "implement from issue": **iterate on pull request** (`pr_iteration`) reads review comments and addresses them (implement changes, push updates, post summary). **Review pull request** (`pr_review`) is a read-only task type where the agent analyzes a PR's changes and posts structured review comments via the GitHub Reviews API. The `pr_review` agent runs without `Write` or `Edit` tools (defense-in-depth), skips `ensure_committed` and push, and treats build status as informational only. Each review comment uses a structured format: type (comment/question/issue/good_point), severity for issues (minor/medium/major/critical), title, description with memory attribution, proposed fix, and a ready-to-use AI prompt. The CLI exposes `--review-pr <number>` (mutually exclusive with `--pr`).
- **Multi-modal input** — Accept text and images (or other modalities) in the task payload; pass through to the agent. Gateway and schema support it; agent harness supports it where available. Primary use case: screenshots of bugs, UI mockups, or design specs attached to issues. **Builds on Iteration 3b:** Memory is operational; this iteration changes the orchestrator blueprint (tiered validation pipeline, new task type) and broadens the input schema. These are independently testable from memory. @@ -288,7 +288,7 @@ Deep research identified **9 memory-layer security gaps** in the current archite - **Iteration 2** — Production orchestrator, API contract, task management (list/status/cancel), durable execution, observability, threat model, network isolation, basic cost guardrails, CI/CD. - **Iteration 3a** — Repo onboarding, DNS Firewall (domain-level egress filtering), webhook trigger, GitHub Actions, per-repo customization (prompt from repo), data retention, turn/iteration caps, cost budget caps, user prompt guide, agent harness improvements (turn budget, default branch, safety net, lint, softened conventions), operator dashboard, WAF, model invocation logging, input length limits. - **Iteration 3b** ✅ — Memory Tier 1 (repo knowledge, task episodes), insights, agent self-feedback, prompt versioning, per-prompt commit attribution. CDK L2 construct with named semantic + episodic strategies using namespace templates (`/{actorId}/knowledge/`, `/{actorId}/episodes/{sessionId}/`), fail-open memory load/write, orchestrator fallback episode, SHA-256 prompt hashing, git trailer attribution. 
-- **Iteration 3c** — Per-repo GitHub App credentials, orchestrator pre-flight checks (fail-closed before session start), persistent session storage for select caches (AgentCore Runtime `/mnt/workspace` mount for npm/Claude config; mise/uv/repo on local disk due to FUSE `flock()` limitation), pre-execution task risk classification (model/limits/approval policy selection), tiered validation pipeline (tool validation, code quality analysis, post-execution risk/blast radius analysis), PR risk level, PR review task type, multi-modal input. +- **Iteration 3c** — Per-repo GitHub App credentials, orchestrator pre-flight checks (fail-closed before session start), persistent session storage for select caches (AgentCore Runtime `/mnt/workspace` mount for npm/Claude config; mise/uv/repo on local disk due to FUSE `flock()` limitation), pre-execution task risk classification (model/limits/approval policy selection), tiered validation pipeline (tool validation, code quality analysis, post-execution risk/blast radius analysis), PR risk level, PR review task type (`pr_review` — read-only structured review with tool restriction, defense-in-depth enforcement, CLI `--review-pr` flag), multi-modal input. - **Iteration 3d** — Review feedback memory loop (Tier 2), PR outcome tracking, evaluation pipeline (basic). - **Iteration 3e** — Memory security and integrity: input hardening (content sanitization, provenance tagging, integrity hashing), trust-aware retrieval (trust scoring, temporal decay, guardian validation), detection and response (anomaly detection, circuit breaker, quarantine, rollback), advanced protections (write-ahead validation, behavioral drift detection, cryptographic provenance, red teaming). Addresses OWASP ASI06 (Memory & Context Poisoning). 
- **Iteration 3bis** (hardening) — Orchestrator IAM grant for Memory (was silently AccessDenied), memory schema versioning (`schema_version: "2"`), Python repo format validation, severity-aware error logging in Python memory, narrowed entrypoint try-catch, orchestrator fallback episode observability, conditional writes in agent task_state.py (ConditionExpression guards), orchestrator Lambda error alarm (CloudWatch, retryAttempts: 0), concurrency counter reconciliation (scheduled Lambda, drift correction), multi-AZ NAT documentation (already configurable), Python unit tests (pytest), entrypoint decomposition (4 extracted subfunctions), dual prompt assembly deprecation docstring, graceful thread drain in server.py (shutdown hook + atexit), dead QUEUED state removal (8 states, 4 active). diff --git a/docs/guides/USER_GUIDE.md b/docs/guides/USER_GUIDE.md index bf52e28..bb96cb7 100644 --- a/docs/guides/USER_GUIDE.md +++ b/docs/guides/USER_GUIDE.md @@ -111,6 +111,16 @@ When you specify `--max-turns` (CLI) or `max_turns` (API) on a task, your value The Task API exposes 5 endpoints under the base URL from the `ApiUrl` stack output. +### Task types + +The platform supports three task types: + +| Type | Description | Outcome | +|---|---|---| +| `new_task` (default) | Create a new branch, implement changes, and open a new PR. | New pull request | +| `pr_iteration` | Check out an existing PR's branch, read review feedback, address it, and push updates. | Updated pull request | +| `pr_review` | Check out an existing PR's branch, analyze the changes read-only, and post a structured review. 
| Review comments on the PR | + ### Create a task ```bash curl -X POST "$API_URL/tasks" \ -H "Authorization: $TOKEN" \ -H "Content-Type: application/json" \ -d '{"repo": "owner/repo", "issue_number": 42}' ``` +To iterate on an existing pull request (address review feedback): + +```bash +curl -X POST "$API_URL/tasks" \ + -H "Authorization: $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"repo": "owner/repo", "task_type": "pr_iteration", "pr_number": 42}' +``` + +You can optionally include `task_description` with `pr_iteration` to provide additional instructions alongside the review feedback: + +```bash +curl -X POST "$API_URL/tasks" \ + -H "Authorization: $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"repo": "owner/repo", "task_type": "pr_iteration", "pr_number": 42, "task_description": "Focus on the null check Alice flagged in the auth module"}' +``` + +To request a read-only review of an existing pull request: + +```bash +curl -X POST "$API_URL/tasks" \ + -H "Authorization: $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"repo": "owner/repo", "task_type": "pr_review", "pr_number": 55}' +``` + +You can optionally include `task_description` with `pr_review` to focus the review on specific areas: + +```bash +curl -X POST "$API_URL/tasks" \ + -H "Authorization: $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"repo": "owner/repo", "task_type": "pr_review", "pr_number": 55, "task_description": "Focus on security implications and error handling"}' +``` + **Request body fields:** | Field | Type | Required | Description | @@ -152,6 +198,8 @@ curl -X POST "$API_URL/tasks" \ | `repo` | string | Yes | GitHub repository in `owner/repo` format | | `issue_number` | number | One of these | GitHub issue number | | `task_description` | string | is required | Free-text task description | +| `pr_number` | number | Yes for `pr_iteration`/`pr_review` | PR number to iterate on or review | +| `task_type` | string | No | `new_task` (default), `pr_iteration`, or `pr_review`.
| | `max_turns` | number | No | Maximum agent turns (1–500). Overrides the per-repo Blueprint default. Platform default: 100. | | `max_budget_usd` | number | No | Maximum cost budget in USD (0.01–100). When reached, the agent stops regardless of remaining turns. Overrides the per-repo Blueprint default. If omitted, no budget limit is applied. | @@ -253,6 +301,18 @@ node lib/bin/bgagent.js submit --repo owner/repo --issue 42 # From a text description node lib/bin/bgagent.js submit --repo owner/repo --task "Add input validation to the /users POST endpoint" +# Iterate on an existing pull request (address review feedback) +node lib/bin/bgagent.js submit --repo owner/repo --pr 42 + +# Iterate on a PR with additional instructions +node lib/bin/bgagent.js submit --repo owner/repo --pr 42 --task "Focus on the null check Alice flagged" + +# Review an existing pull request (read-only — posts structured review comments) +node lib/bin/bgagent.js submit --repo owner/repo --review-pr 55 + +# Review a PR with a specific focus area +node lib/bin/bgagent.js submit --repo owner/repo --review-pr 55 --task "Focus on security and error handling" + # Submit and wait for completion node lib/bin/bgagent.js submit --repo owner/repo --issue 42 --wait ``` @@ -279,13 +339,15 @@ Created: 2026-04-01T00:39:51.271Z | `--repo` | GitHub repository (`owner/repo`). Required. | | `--issue` | GitHub issue number. | | `--task` | Task description text. | +| `--pr` | PR number to iterate on. Sets task type to `pr_iteration`. The agent checks out the PR's branch, reads review feedback, and pushes updates. | +| `--review-pr` | PR number to review. Sets task type to `pr_review`. The agent checks out the PR's branch, analyzes changes read-only, and posts structured review comments. | | `--max-turns` | Maximum agent turns (1–500). Overrides per-repo Blueprint default. Platform default: 100. | | `--max-budget` | Maximum cost budget in USD (0.01–100). Overrides per-repo Blueprint default. No default limit. 
| | `--idempotency-key` | Idempotency key for deduplication. | | `--wait` | Poll until the task reaches a terminal status. | | `--output` | Output format: `text` (default) or `json`. | -At least one of `--issue` or `--task` is required. +At least one of `--issue`, `--task`, `--pr`, or `--review-pr` is required. The `--pr` and `--review-pr` flags are mutually exclusive. ### Checking task status @@ -415,7 +477,7 @@ curl -X POST "$API_URL/webhooks/tasks" \ -d "$BODY" ``` -The request body is identical to `POST /v1/tasks` (same `repo`, `issue_number`, `task_description`, `max_turns`, `max_budget_usd` fields). The `Idempotency-Key` header is also supported. +The request body is identical to `POST /v1/tasks` (same `repo`, `issue_number`, `task_description`, `task_type`, `pr_number`, `max_turns`, `max_budget_usd` fields). The `Idempotency-Key` header is also supported. You can submit `pr_iteration` tasks via webhook to automate PR feedback loops, or `pr_review` tasks to trigger automated code reviews. **Example response** (same shape as a successful `POST /tasks` — `status` is `SUBMITTED`; session, PR, and cost fields are `null` until the run progresses): @@ -484,11 +546,13 @@ Each lifecycle transition is recorded as an audit event. Use the events endpoint curl "$API_URL/tasks/<task_id>/events" -H "Authorization: $TOKEN" ``` -Events include: `task_created`, `admission_rejected`, `preflight_failed`, `hydration_started`, `hydration_complete`, `session_started`, `task_completed`, `task_failed`, `task_cancelled`, `task_timed_out`. Event records are subject to the same 90-day retention as task records and are automatically deleted after that period. +Events include: `task_created`, `admission_rejected`, `preflight_failed`, `hydration_started`, `hydration_complete`, `session_started`, `pr_created`, `pr_updated`, `task_completed`, `task_failed`, `task_cancelled`, `task_timed_out`.
Event records are subject to the same 90-day retention as task records and are automatically deleted after that period. ## What the agent does -When a task is submitted, the agent: +### New task (`new_task`) + +When a `new_task` is submitted, the agent: 1. Clones the repository into an isolated workspace 2. Creates a branch named `bgagent/<task_id>/<slug>` @@ -502,6 +566,38 @@ When a task is submitted, the agent: The PR title follows conventional commit format (e.g., `feat(auth): add OAuth2 login flow`). +### PR iteration (`pr_iteration`) + +When a `pr_iteration` task is submitted, the agent: + +1. Clones the repository into an isolated workspace +2. Checks out the existing PR branch (fetched from the remote) +3. Installs dependencies via `mise install` and runs an initial build +4. Loads repo-level project configuration if present +5. Reads the review feedback (inline comments, conversation comments, and the PR diff) +6. Addresses the feedback with focused changes +7. Runs the build and tests (`mise run build`) +8. Commits and pushes to the existing PR branch +9. Posts a summary comment on the PR describing what was addressed + +The agent does **not** create a new PR — it updates the existing one in place. The PR's branch, title, and description remain unchanged; the agent adds commits and a comment summarizing its work. + +### PR review (`pr_review`) + +When a `pr_review` task is submitted, the agent: + +1. Clones the repository into an isolated workspace +2. Checks out the existing PR branch (fetched from the remote) +3. Installs dependencies via `mise install` and runs an initial build (informational only — build failures do not block the review) +4. Loads repo-level project configuration if present +5. Reads the PR context (diff, description, existing comments) and analyzes the changes +6. Leverages repository memory context (codebase patterns, past episodes) when available +7.
Composes structured findings using a defined comment format: type (comment / question / issue / good_point), severity for issues (minor / medium / major / critical), title, description, proposed fix, and a ready-to-use AI prompt for addressing each finding +8. Posts the review via the GitHub Reviews API (`gh api repos/{repo}/pulls/{pr_number}/reviews`) as a single batch review +9. Posts a summary conversation comment on the PR + +The agent operates in **read-only mode** — it does not modify any files, create commits, or push changes. The `Write` and `Edit` tools are not available during `pr_review` tasks. + ## Viewing logs Each task record includes a `logs_url` field with a direct link to filtered CloudWatch logs. You can get this URL from the task status output or from the `GET /tasks/{task_id}` API response. diff --git a/docs/src/content/docs/design/Agent-harness.md b/docs/src/content/docs/design/Agent-harness.md index e043b3a..effda15 100644 --- a/docs/src/content/docs/design/Agent-harness.md +++ b/docs/src/content/docs/design/Agent-harness.md @@ -18,7 +18,7 @@ Many AI assistants include an embedded agent harness. Those products provide bui The agent harness runs **inside the compute environment** (e.g. AgentCore Runtime MicroVM). The platform orchestrates the task and **hydrates context** (user message, GitHub issue, system instructions); the harness receives the assembled prompt and runs the **agent loop** (reason, plan, call tools, repeat) until the task is done or the session ends. -- **Behavioral contract** — The platform defines **what** the agent should do (commit often, create PR, branch naming, error handling, etc.) via the **system prompt**, which is assembled by the context hydration step and passed into the session. The harness is the **execution framework**; it does not define policy. See the architecture and planning docs for the full agent behavioral contract. Deterministic hooks run to execute steps. 
+- **Behavioral contract** — The platform defines **what** the agent should do via the **system prompt**, which is selected by task type and assembled in the agent container. The system prompt is structured as a shared base template (`agent/prompts/base.py`) with per-task-type workflow sections: `new_task` (create branch, implement, create PR), `pr_iteration` (read review feedback, address, push to existing branch, comment on PR), and `pr_review` (read-only analysis of PR changes, post structured review comments via the GitHub Reviews API). The harness is the **execution framework**; it does not define policy. See the architecture and planning docs for the full agent behavioral contract. Deterministic hooks run to execute steps. - **Execution model** — Tasks are **fully unattended** and **one-shot**: the user submits a task, the harness runs to completion or failure with no mid-task human interaction. The harness must support long-running execution (hours) and a single continuous loop. On AgentCore Runtime, the harness entrypoint must not block (the agent loop runs in a separate thread so the health ping can respond); the platform or harness adapter is responsible for that pattern. **Important:** The agent thread uses `asyncio.run()` with the stdlib asyncio event loop. The uvicorn server is configured with `--loop asyncio` to avoid uvloop, which conflicts with subprocess SIGCHLD handling when multiple event loops run in different threads. - **Result** — The agent does not call back to the platform; it follows the contract (push work, create PR) and exits. The platform infers success or failure from the PR and branch state via the GitHub API. 
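The base-plus-variant prompt structure described above can be sketched as follows — a minimal Python illustration with hypothetical names (`BASE_TEMPLATE`, `WORKFLOWS`, `build_system_prompt`); the actual templates live in `agent/prompts/` and carry the full behavioral contract, with `{setup_notes}` filled in after repo setup.

```python
# Sketch only: hypothetical names; the real templates live in agent/prompts/
# (a shared base in base.py plus per-task-type workflow variants).
BASE_TEMPLATE = """You are a background coding agent.

Repository setup notes:
{setup_notes}

Workflow:
{workflow}
"""

# Abbreviated per-task-type workflow sections.
WORKFLOWS = {
    "new_task": "Create a branch, implement, test, commit, and open a PR.",
    "pr_iteration": "Read review feedback, address it, push to the existing "
                    "PR branch, and comment on the PR.",
    "pr_review": "Analyze the PR changes read-only and post structured "
                 "review comments.",
}


def build_system_prompt(task_type: str, setup_notes: str) -> str:
    """Pick the workflow variant for the task type and fill the shared base."""
    if task_type not in WORKFLOWS:
        raise ValueError(f"unknown task_type: {task_type}")
    return BASE_TEMPLATE.format(setup_notes=setup_notes,
                                workflow=WORKFLOWS[task_type])
```

Keeping the selection in the agent container (rather than the orchestrator) matches the note above: the harness stays a pure execution framework, while the prompt content — including which workflow section is active — is policy.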
diff --git a/docs/src/content/docs/design/Api-contract.md b/docs/src/content/docs/design/Api-contract.md index 9a3c145..1586441 100644 --- a/docs/src/content/docs/design/Api-contract.md +++ b/docs/src/content/docs/design/Api-contract.md @@ -126,7 +126,9 @@ POST /v1/tasks |---|---|---|---| | `repo` | String | Yes | GitHub repository in `owner/repo` format. | | `issue_number` | Number | No | GitHub issue number. If provided, the issue title, body, and comments are fetched during context hydration. | -| `task_description` | String | No | Free-text task description. At least one of `issue_number` or `task_description` must be provided. | +| `task_description` | String | No | Free-text task description. At least one of `issue_number`, `task_description`, or `pr_number` must be provided. | +| `task_type` | String | No | Task type: `new_task` (default), `pr_iteration`, or `pr_review`. When `pr_iteration`, the agent iterates on an existing PR. When `pr_review`, the agent performs a read-only review and posts structured comments. | +| `pr_number` | Number | No | Pull request number to iterate on or review. Required when `task_type` is `pr_iteration` or `pr_review`; rejected otherwise. For `pr_iteration`, the agent checks out the PR's branch, reads review feedback, addresses it, and pushes back. For `pr_review`, the agent checks out the PR's branch, analyzes changes read-only, and posts a structured review. | | `max_turns` | Number | No | Maximum agent turns (1–500). Controls how many reasoning/tool-call iterations the agent can perform. Defaults to 100 if omitted. | | `max_budget_usd` | Number | No | Maximum cost budget in USD (0.01–100). When reached, the agent stops regardless of remaining turns. If omitted, no budget limit is applied (turn limit and session timeout still apply). | | `attachments` | Array | No | Multi-modal attachments (images, files). See Attachments schema below. 
| @@ -173,18 +175,22 @@ POST /v1/tasks "task_id": "01HYX...", "status": "SUBMITTED", "repo": "org/myapp", + "task_type": "new_task", "issue_number": 42, + "pr_number": null, "branch_name": "bgagent/01HYX.../fix-auth-bug", "created_at": "2025-03-15T10:30:00Z" } } ``` +For `pr_iteration` and `pr_review` tasks, `branch_name` is initially set to `pending:pr_resolution` and resolved to the PR's `head_ref` during context hydration. + **Error responses:** | Status | Code | Condition | |---|---|---| -| `400` | `VALIDATION_ERROR` | Missing required fields, invalid repo format, no task description or issue, invalid `max_turns` (not an integer or outside 1–500 range), invalid `max_budget_usd` (not a number or outside 0.01–100 range). | +| `400` | `VALIDATION_ERROR` | Missing required fields, invalid repo format, no task description or issue or PR number, invalid `task_type`, `pr_number` provided without `task_type: 'pr_iteration'` or `'pr_review'`, `pr_number` missing when `task_type` is `pr_iteration` or `pr_review`, invalid `max_turns` (not an integer or outside 1–500 range), invalid `max_budget_usd` (not a number or outside 0.01–100 range). | | `401` | `UNAUTHORIZED` | Missing or invalid auth token. | | `409` | `DUPLICATE_TASK` | Idempotency key matches an existing task (returns the existing task in `data`). | | `422` | `REPO_NOT_ONBOARDED` | Repository is not registered with the platform. Repos are onboarded via CDK deployment (`Blueprint` construct), not via a runtime API. See [REPO_ONBOARDING.md](/design/repo-onboarding). 
| @@ -214,7 +220,9 @@ GET /v1/tasks/{task_id} "task_id": "01HYX...", "status": "RUNNING", "repo": "org/myapp", + "task_type": "new_task", "issue_number": 42, + "pr_number": null, "task_description": "Fix the authentication bug in the login flow", "branch_name": "bgagent/01HYX.../fix-auth-bug", "session_id": "sess-uuid", @@ -235,6 +243,8 @@ GET /v1/tasks/{task_id} | Field | Type | Description | |---|---|---| +| `task_type` | String | Task type: `new_task`, `pr_iteration`, or `pr_review`. | +| `pr_number` | Number or null | Pull request number being iterated on or reviewed. Only set for `pr_iteration` and `pr_review` tasks. | | `max_turns` | Number or null | Maximum agent turns for this task. Always present in the response — reflects the effective value (user-specified or platform default of 100). | | `max_budget_usd` | Number or null | Maximum cost budget in USD for this task. Null if no budget limit was specified. | @@ -274,7 +284,9 @@ GET /v1/tasks "task_id": "01HYX...", "status": "RUNNING", "repo": "org/myapp", + "task_type": "new_task", "issue_number": 42, + "pr_number": null, "task_description": "Fix the authentication bug...", "branch_name": "bgagent/01HYX.../fix-auth-bug", "pr_url": null, @@ -391,7 +403,7 @@ GET /v1/tasks/{task_id}/events **Event types** (see [OBSERVABILITY.md](/design/observability) for the full list): -**Fixed event types:** `task_created`, `admission_passed`, `admission_rejected`, `preflight_failed`, `hydration_started`, `hydration_complete`, `session_started`, `session_ended`, `pr_created`, `task_completed`, `task_failed`, `task_cancelled`, `task_timed_out` +**Fixed event types:** `task_created`, `admission_passed`, `admission_rejected`, `preflight_failed`, `hydration_started`, `hydration_complete`, `session_started`, `session_ended`, `pr_created`, `pr_updated`, `task_completed`, `task_failed`, `task_cancelled`, `task_timed_out` **Step-level event types** (from the blueprint framework): The orchestrator emits events for each pipeline step 
following the pattern `{step_name}_{started|completed|failed}`. For built-in steps these overlap with the fixed types above (e.g. `hydration_started`). For custom Lambda steps (see [REPO_ONBOARDING.md](/design/repo-onboarding)), the step name is user-defined (e.g. `sast-scan_started`, `sast-scan_completed`, `prepare-environment_failed`). Step event `metadata` includes `StepOutput.metadata` from the step execution. @@ -543,7 +555,7 @@ Creates a task via webhook. Uses HMAC-SHA256 authentication instead of Cognito J POST /v1/webhooks/tasks ``` -**Request body:** Same as `POST /v1/tasks` (see [Create task](#create-task)). +**Request body:** Same as `POST /v1/tasks` (see [Create task](#create-task)), including `task_type` and `pr_number` fields. **Required headers:** @@ -570,7 +582,7 @@ HMAC verification is performed by the handler (not the authorizer) because API G | Status | Code | Condition | |---|---|---| -| `400` | `VALIDATION_ERROR` | Missing required fields, invalid repo format, no task description or issue, invalid `max_turns`, invalid `max_budget_usd`. | +| `400` | `VALIDATION_ERROR` | Missing required fields, invalid repo format, no task description or issue or PR number, invalid `task_type`, invalid `pr_number`, invalid `max_turns`, invalid `max_budget_usd`. | | `401` | `UNAUTHORIZED` | Missing webhook headers, webhook not found, revoked, or invalid signature. | | `409` | `DUPLICATE_TASK` | Idempotency key matches an existing task. | @@ -605,6 +617,7 @@ Rate limit status is communicated via response headers (see Standard response he | `WEBHOOK_NOT_FOUND` | 404 | Webhook does not exist or belongs to a different user. | | `WEBHOOK_ALREADY_REVOKED` | 409 | Webhook is already revoked. | | `REPO_NOT_ONBOARDED` | 422 | Repository is not registered with the platform. Repos are onboarded via CDK deployment, not via a runtime API. There are no `/v1/repos` endpoints. 
| +| `PR_NOT_FOUND_OR_CLOSED` | 422 | For `pr_iteration` and `pr_review` tasks: the specified PR does not exist, is not open, or is not accessible with the configured GitHub token. Checked during the orchestrator's pre-flight step. | | `INVALID_STEP_SEQUENCE` | 500 | The blueprint's step sequence is invalid (missing required steps or incorrect ordering). This indicates a CDK configuration error that slipped past synth-time validation. Visible via `GET /v1/tasks/{id}` as `error_code`. See [REPO_ONBOARDING.md](/design/repo-onboarding#step-sequence-validation). | | `RATE_LIMIT_EXCEEDED` | 429 | User exceeded rate limit. | | `INTERNAL_ERROR` | 500 | Unexpected server error. Includes `request_id` for support. | diff --git a/docs/src/content/docs/design/Architecture.md b/docs/src/content/docs/design/Architecture.md index d7d9df3..9504b89 100644 --- a/docs/src/content/docs/design/Architecture.md +++ b/docs/src/content/docs/design/Architecture.md @@ -206,10 +206,13 @@ Each concept has a **source-of-truth document** and one or more documents that r | Adaptive model router | ROADMAP.md (Iter 5) | COST_MODEL.md | | Capability-based security | ROADMAP.md (Iter 5) | SECURITY.md | | Live session replay | ROADMAP.md (Iter 4) | API_CONTRACT.md | +| PR iteration task type | API_CONTRACT.md, ORCHESTRATOR.md | USER_GUIDE.md, PROMPT_GUIDE.md, SECURITY.md, AGENT_HARNESS.md | +| PR review task type | API_CONTRACT.md, ORCHESTRATOR.md | USER_GUIDE.md, PROMPT_GUIDE.md, SECURITY.md, AGENT_HARNESS.md | ### Per-repo model selection Different tasks and repos may benefit from different models. The `model_id` field in the blueprint config (see [ORCHESTRATOR.md](/design/orchestrator)) allows per-repo overrides. 
Suggested defaults: -- **Implementation tasks:** Claude Sonnet 4 (good balance of quality and cost) -- **PR review tasks:** Claude Haiku (fast, cheap — review is read-only analysis) +- **Implementation tasks (`new_task`):** Claude Sonnet 4 (good balance of quality and cost) +- **PR iteration tasks (`pr_iteration`):** Claude Sonnet 4 (needs to understand review feedback and make code changes — similar complexity to implementation) +- **PR review tasks (`pr_review`):** Claude Haiku (fast, cheap — review is read-only analysis) - **Complex/critical repos:** Claude Opus 4 (highest quality, highest cost — opt-in per repo) \ No newline at end of file diff --git a/docs/src/content/docs/design/Orchestrator.md b/docs/src/content/docs/design/Orchestrator.md index e4e01d3..4443aa4 100644 --- a/docs/src/content/docs/design/Orchestrator.md +++ b/docs/src/content/docs/design/Orchestrator.md @@ -170,7 +170,7 @@ See the Admission control section for details. Validates that the task is allowe #### Step 2: Context hydration (deterministic) -See the Context hydration section for details. Assembles the agent's prompt from multiple sources: user message, GitHub issue (title, body, comments), memory (relevant past context), repo configuration (system prompt template, rules), and platform defaults. The output is a fully assembled prompt and system prompt, ready to pass to the compute session. +See the Context hydration section for details. Assembles the agent's prompt from multiple sources depending on task type. For `new_task`: user message, GitHub issue (title, body, comments), memory, repo configuration, and platform defaults. For `pr_iteration`: PR metadata, review comments, diff summary, and optional user instructions. An additional **pre-flight** sub-step verifies PR accessibility when `pr_number` is set (see [preflight.ts](../../cdk/src/handlers/shared/preflight.ts)). The output is a fully assembled prompt, ready to pass to the compute session. 
#### Step 3: Session start and agent execution (deterministic start + agentic execution) @@ -250,20 +250,26 @@ Admission control runs immediately after the input gateway dispatches a "create Context hydration assembles the agent's user prompt from multiple sources. It runs as a deterministic step in the orchestrator Lambda after admission control and before session start. The goal is to perform I/O-bound work (GitHub API calls, Secrets Manager lookups) *before* expensive agent compute is consumed, enabling fast failure when external APIs are unavailable. -### Current implementation (Iteration 3a) +### Current implementation (Iteration 3a+) The orchestrator's `hydrateAndTransition()` function calls `hydrateContext()` (`src/handlers/shared/context-hydration.ts`) which: 1. **Resolves the GitHub token** from Secrets Manager (if `GITHUB_TOKEN_SECRET_ARN` is configured). The token is cached in a module-level variable with a 5-minute TTL for Lambda execution context reuse. -2. **Fetches the GitHub issue** (title, body, comments) via the GitHub REST API using native `fetch()` — if `issue_number` is present on the task and a token is available. +2. **Fetches external context** based on task type: + - **`new_task`**: Fetches the GitHub issue (title, body, comments) via the GitHub REST API if `issue_number` is present. + - **`pr_iteration`** / **`pr_review`**: Fetches the pull request context via `fetchGitHubPullRequest()` — four parallel calls: three REST API calls (PR metadata, conversation comments, changed files) plus one GraphQL query for inline review comments. The GraphQL query filters out resolved review threads at fetch time so the agent only sees unresolved feedback. PR metadata includes title, body, head/base refs, and state; the diff summary covers changed files. The PR's `head_ref` is stored as `resolved_branch_name` and `base_ref` as `resolved_base_branch` on the hydrated context. 
These are used by the orchestrator to update the task record's `branch_name` from the placeholder `pending:pr_resolution` to the actual PR branch. For `pr_review`, if no `task_description` is provided, a default review instruction is used. 3. **Enforces a token budget** on the combined context. Uses a character-based heuristic (~4 chars per token). Default budget: 100K tokens (configurable via `USER_PROMPT_TOKEN_BUDGET` environment variable). When the budget is exceeded, oldest comments are removed first. The `truncated` flag is set in the result. -4. **Assembles the user prompt** — a structured markdown document containing Task ID, Repository, GitHub Issue section (title, body, comments), and Task section (user's task description or default instruction). The format mirrors the Python `assemble_prompt()` in `agent/entrypoint.py` for cross-language consistency. -5. **Returns a `HydratedContext` object** containing `version`, `user_prompt`, `issue`, `sources`, `token_estimate`, and `truncated`. +4. **Assembles the user prompt** based on task type: + - **`new_task`**: A structured markdown document with Task ID, Repository, GitHub Issue section, and Task section. The format mirrors the Python `assemble_prompt()` in `agent/entrypoint.py`. + - **`pr_iteration`**: Assembled by `assemblePrIterationPrompt()` — includes PR metadata (number, title, body), the diff summary (changed files and patches), review comments (inline and conversation), and optional user instructions from `task_description`. +5. **Returns a `HydratedContext` object** containing `version`, `user_prompt`, `issue`, `sources`, `token_estimate`, `truncated`, and for `pr_iteration`/`pr_review` tasks: `resolved_branch_name` and `resolved_base_branch`. The hydrated context is passed to the agent as a new `hydrated_context` field in the invocation payload, alongside the existing legacy fields (`repo_url`, `task_id`, `branch_name`, `issue_number`, `prompt`). 
The agent checks for `hydrated_context` with `version == 1`; if present, it uses the pre-assembled `user_prompt` directly and skips in-container GitHub fetching and prompt assembly. If absent (e.g. during a deployment rollout or when the secret ARN isn't configured), the agent falls back to its existing behavior. **Graceful degradation:** If any step fails (Secrets Manager unavailable, GitHub API error, network timeout), the orchestrator proceeds with whatever context is available. The worst case is a minimal prompt with just the task ID and repository — the agent can still attempt its own GitHub fetch as a fallback via the legacy `issue_number` field. +**PR iteration branch resolution:** After hydration, if `resolved_branch_name` is present on the hydrated context, the orchestrator updates the task record's `branch_name` in DynamoDB from the placeholder (`pending:pr_resolution`) to the PR's actual `head_ref`. This ensures the task record always reflects the real branch name that the agent will push to. + ### Hydration events The orchestrator emits two task events during hydration: @@ -281,7 +287,9 @@ We evaluated routing GitHub API calls through AgentCore Gateway (with the GitHub 2. **Repo configuration (Iteration 3+).** Per-repo rules, instructions, or context loaded from the onboarding store. This can include static artifacts discovered during onboarding (e.g. content from `.cursor/rules`, `CLAUDE.md`, `CONTRIBUTING.md`) and dynamic artifacts generated by the onboarding pipeline (e.g. codebase summaries, dependency graphs). See [REPO_ONBOARDING.md](/design/repo-onboarding). -3. **GitHub issue context.** If the task references a GitHub issue: fetch the issue title, body, and comments via the GitHub REST API. **Now done in the orchestrator** (`fetchGitHubIssue` in `src/handlers/shared/context-hydration.ts`), not in the agent container. +3. **GitHub issue context** (`new_task`). 
If the task references a GitHub issue: fetch the issue title, body, and comments via the GitHub REST API. **Now done in the orchestrator** (`fetchGitHubIssue` in `src/handlers/shared/context-hydration.ts`), not in the agent container. + +3b. **Pull request context** (`pr_iteration`, `pr_review`). If the task references a PR (`pr_number` set): fetch the PR metadata, conversation comments, and changed files via REST API, and inline review comments via GraphQL (which filters out resolved threads at fetch time) — four parallel calls total via `fetchGitHubPullRequest()`. The PR's `head_ref` and `base_ref` are extracted for branch resolution. Review comments and diff are formatted into the user prompt so the agent understands the feedback to address. 4. **User message.** The free-text task description provided by the user (via CLI `--task` flag or equivalent). May supplement or replace the issue context. @@ -293,36 +301,71 @@ We evaluated routing GitHub API calls through AgentCore Gateway (with the GitHub The orchestrator assembles one artifact during hydration: -- **User prompt.** Assembled by `assembleUserPrompt()` in the orchestrator from the issue context and user message. Format: `Task ID: {id}\nRepository: {repo}\n\n## GitHub Issue #{n}: {title}\n...\n\n## Task\n\n{description}`. This mirrors the Python `assemble_prompt()` function. +- **User prompt.** Assembled differently based on task type: + - **`new_task`**: `assembleUserPrompt()` — Format: `Task ID: {id}\nRepository: {repo}\n\n## GitHub Issue #{n}: {title}\n...\n\n## Task\n\n{description}`. This mirrors the Python `assemble_prompt()` function. + - **`pr_iteration`**: `assemblePrIterationPrompt()` — Format: `Task ID: {id}\nRepository: {repo}\n\n## Pull Request #{n}: {title}\n\n{body}\n\n### Changed Files\n...\n\n### Review Comments\n...\n\n## Additional Instructions\n\n{description}`. This provides the agent with the full PR context, diff summary, and reviewer feedback. 
+ - **`pr_review`**: Uses `assemblePrIterationPrompt()` (same format as `pr_iteration`). If no task description is provided, defaults to "Review this pull request. Follow the workflow in your system instructions." -The system prompt is **not** assembled in the orchestrator — it remains in the agent container because it depends on `setup_repo()` output (`{setup_notes}` placeholder). In the target state, additional sections may be injected: repo-specific rules, memory-derived insights. +The system prompt is **not** assembled in the orchestrator — it remains in the agent container because it depends on `setup_repo()` output (`{setup_notes}` placeholder). The agent selects the appropriate system prompt template based on `task_type`: the `new_task` workflow (understand → implement → test → commit → create PR), the `pr_iteration` workflow (understand feedback → address → test → push → comment on PR), or the `pr_review` workflow (analyze changes → compose findings → post review comments → post summary). In the target state, additional sections may be injected: repo-specific rules, memory-derived insights. ### Payload contract ``` Legacy: { repo_url, task_id, branch_name, issue_number?, prompt? } -Current: { repo_url, task_id, branch_name, issue_number?, prompt?, hydrated_context } +Current: { repo_url, task_id, branch_name, issue_number?, prompt?, task_type, pr_number?, base_branch?, hydrated_context } +``` + +For `new_task` (default): +```json +{ + "repo_url": "owner/repo", + "task_id": "01HYX...", + "branch_name": "bgagent/01HYX.../fix-auth-bug", + "task_type": "new_task", + "hydrated_context": { + "version": 1, + "user_prompt": "Task ID: ...\nRepository: ...\n\n## GitHub Issue #42: ...", + "issue": { "number": 42, "title": "...", "body": "...", "comments": [...] 
}, + "sources": ["issue", "task_description"], + "token_estimate": 1250, + "truncated": false + } +} ``` -Where `hydrated_context`: +For `pr_iteration`: ```json { - "version": 1, - "user_prompt": "Task ID: ...\nRepository: ...\n\n## GitHub Issue #42: ...", - "issue": { "number": 42, "title": "...", "body": "...", "comments": [...] }, - "sources": ["issue", "task_description"], - "token_estimate": 1250, - "truncated": false + "repo_url": "owner/repo", + "task_id": "01HYX...", + "branch_name": "feature/my-branch", + "task_type": "pr_iteration", + "pr_number": 42, + "base_branch": "main", + "hydrated_context": { + "version": 1, + "user_prompt": "Task ID: ...\nRepository: ...\n\n## Pull Request #42: ...\n\n### Review Comments\n...", + "sources": ["pr_context", "task_description"], + "token_estimate": 3400, + "truncated": false, + "resolved_branch_name": "feature/my-branch", + "resolved_base_branch": "main" + } } ``` +The `branch_name` for `pr_iteration` and `pr_review` tasks is the PR's `head_ref` (resolved during hydration), not a generated `bgagent/...` branch. The `base_branch` field is populated from the PR's `base_ref` so the agent knows the merge target. + ### Token budget The orchestrator enforces a token budget on the user prompt before assembly: - **Estimation heuristic:** `Math.ceil(text.length / 4)` (~4 characters per token). - **Default budget:** 100,000 tokens (configurable via `USER_PROMPT_TOKEN_BUDGET` CDK prop / environment variable). -- **Truncation strategy:** When the combined estimated token count (issue body + comments + task description) exceeds the budget, oldest comments are removed first. If still over budget after removing all comments, the issue body and task description are kept as-is (they are assumed to be essential). The `truncated` flag is set in the hydrated context metadata. 
+- **Truncation strategy:** Differs by task type: + - **`new_task`:** When the combined estimated token count (issue body + comments + task description) exceeds the budget, oldest comments are removed first. If still over budget after removing all comments, the issue body and task description are kept as-is (they are assumed to be essential). + - **`pr_iteration`/`pr_review`:** When the assembled PR prompt exceeds the budget, oldest issue comments are trimmed first (conversation comments on the PR), then oldest review comments (inline code review comments). The PR metadata, diff summary, and user instructions are preserved. + - The `truncated` flag is set in the hydrated context metadata when truncation occurs. - The agent harness handles its own context compaction during the run for multi-turn conversations. --- @@ -795,9 +838,11 @@ The primary table for task state. DynamoDB. | `user_id` | String | Cognito sub or mapped platform user ID. | | `status` | String | Current state (see state machine). | | `repo` | String | GitHub owner/repo (e.g. `org/myapp`). | +| `task_type` | String | Task type: `new_task` (default), `pr_iteration`, or `pr_review`. Determines the agent workflow (create new PR, iterate on existing PR, or review a PR). | | `issue_number` | Number (optional) | GitHub issue number, if task is issue-based. | -| `task_description` | String (optional) | Free-text task description. | -| `branch_name` | String | Agent branch: `bgagent/{task_id}/{slug}`. | +| `pr_number` | Number (optional) | Pull request number, required when task type is `pr_iteration` or `pr_review`. | +| `task_description` | String (optional) | Free-text task description. For `pr_iteration`/`pr_review`, used as additional instructions alongside PR context. | +| `branch_name` | String | Agent branch. For `new_task`: `bgagent/{task_id}/{slug}`. For `pr_iteration`/`pr_review`: initially `pending:pr_resolution`, resolved to the PR's `head_ref` during context hydration. 
| | `session_id` | String (optional) | AgentCore runtime session ID, set when session is started. | | `execution_id` | String (optional) | Lambda durable execution ID, set when the orchestrator starts. | | `pr_url` | String (optional) | Pull request URL, set during finalization. | diff --git a/docs/src/content/docs/design/Security.md b/docs/src/content/docs/design/Security.md index a9d9aa5..ef0222a 100644 --- a/docs/src/content/docs/design/Security.md +++ b/docs/src/content/docs/design/Security.md @@ -199,10 +199,11 @@ AgentCore Memory has **no native backup mechanism**. This is a significant gap f ## Known limitations - **Single GitHub OAuth token** — one token may be shared for all users and repos the platform can access. Any authenticated user can trigger agent work against any repo that token can access. There is no per-user repo scoping. -- **Guardrails are input-only** — the `PROMPT_ATTACK` filter screens task descriptions at submission. No guardrails are applied to model output during agent execution or to review feedback entering the memory system. +- **Guardrails are input-only** — the `PROMPT_ATTACK` filter screens task descriptions at submission. No guardrails are applied to model output during agent execution or to review feedback entering the memory system. For `pr_iteration` and `pr_review` tasks, the PR context (review comments, diff, PR body, issue comments) fetched during context hydration is **not** screened by Bedrock Guardrails — it bypasses the submission-time filter entirely. A `pr_iteration` or `pr_review` task submitted without a `task_description` receives no guardrail screening at all. - **No memory content validation** — retrieved memory records are injected into the agent's context without sanitization, injection pattern scanning, or trust scoring. This is the most critical memory security gap (OWASP ASI06). 
See [MEMORY.md](/design/memory#memory-security-analysis) for the full gap analysis and [ROADMAP.md Iteration 3e](/roadmap/roadmap) for the remediation plan. - **No memory provenance or integrity checking** — memory entries carry no source attribution, content hashing, or trust metadata. The system cannot distinguish agent-generated memory from externally-influenced content. - **GitHub issue content as untrusted input** — issue bodies and comments (attacker-controlled) are injected into the agent's context during hydration without trust differentiation. +- **PR review comments as untrusted input** — for `pr_iteration` and `pr_review` tasks, review comments, PR body, and conversation comments are fetched and injected into the agent's context. These are attacker-controlled inputs subject to the same prompt injection risks as issue comments. Unlike task descriptions, PR context is not screened by the Bedrock Guardrails `PROMPT_ATTACK` filter. For `pr_review` tasks, defense-in-depth mitigates the risk: the agent runs without `Write` or `Edit` tools, so even if prompt injection succeeds, the agent cannot modify files or push code. - **No memory rollback or quarantine** — the 365-day AgentCore Memory expiration is the only cleanup mechanism. There is no snapshot, rollback, or quarantine capability for suspected poisoned entries. - **No MFA** — Cognito MFA is disabled (CLI-based auth flow). Should be enabled for production deployments. - **No customer-managed KMS** — all encryption at rest uses AWS-managed keys. Customer-managed KMS can be added if required by compliance policy. diff --git a/docs/src/content/docs/index.md b/docs/src/content/docs/index.md index 4da3de0..cf481bc 100644 --- a/docs/src/content/docs/index.md +++ b/docs/src/content/docs/index.md @@ -17,7 +17,7 @@ The platform is built on AWS CDK with a modular architecture: an input gateway n ## The use case -Users submit tasks through webhooks, CLI, or Slack. 
For each task, the orchestrator executes the blueprint: an isolated environment is provisioned, an agent clones the target GitHub repository, creates a branch, works on the task, and opens a pull request. +Users submit tasks through webhooks, CLI, or Slack. For each task, the orchestrator executes the blueprint: an isolated environment is provisioned, an agent clones the target GitHub repository and works on it. Depending on the task type, the agent creates a new branch and opens a pull request (`new_task`), iterates on an existing PR to address review feedback (`pr_iteration`), or performs a read-only review and posts structured comments on an existing PR (`pr_review`). Key characteristics: @@ -34,6 +34,7 @@ Each task follows a **blueprint** — a hybrid workflow that mixes deterministic 1. **Admission** — the orchestrator validates the request, checks concurrency limits, and queues the task if needed. 2. **Context hydration** — the platform gathers context: task description, GitHub issue body, repo-intrinsic knowledge (CLAUDE.md, README), and memory from past tasks on the same repo. -3. **Agent execution** — the agent runs in an isolated MicroVM: clones the repo, creates a branch, edits code, commits, runs tests and lint. The orchestrator polls for completion without blocking compute. -4. **Finalization** — the orchestrator infers the result (PR created or not), runs optional validation (lint, tests), extracts learnings into memory, and updates task status. +3. **Pre-flight** — fail-closed readiness checks verify GitHub API reachability and repository access before consuming compute. Doomed tasks fail fast with a clear reason (`GITHUB_UNREACHABLE`, `REPO_NOT_FOUND_OR_NO_ACCESS`) instead of burning runtime. +4. **Agent execution** — the agent runs in an isolated MicroVM with persistent session storage for select caches: clones the repo, creates a branch, edits code, commits, runs tests and lint. The orchestrator polls for completion without blocking compute. +5. 
**Finalization** — the orchestrator infers the result (PR created or not), runs optional validation (lint, tests), extracts learnings into memory, and updates task status. diff --git a/docs/src/content/docs/roadmap/Roadmap.md b/docs/src/content/docs/roadmap/Roadmap.md index affbbe0..3e8b1d6 100644 --- a/docs/src/content/docs/roadmap/Roadmap.md +++ b/docs/src/content/docs/roadmap/Roadmap.md @@ -164,7 +164,7 @@ These practices apply continuously across iterations and are not treated as one- - **Tier 2 — Code quality analysis** — Static analysis of the agent's diff against code quality principles: DRY (duplicated code detection), SOLID violations, design pattern adherence, complexity metrics (cyclomatic, cognitive), naming conventions, and repo-specific style rules (from onboarding config). Implemented as an LLM-based review step or a combination of static analysis tools (e.g. SonarQube rules, custom linters) and LLM judgment. Produces structured findings (severity, location, rule, suggestion) that the agent can act on in a fix cycle. Findings below a configurable severity threshold are advisory (included in the PR as comments) rather than blocking. - **Tier 3 — Risk and blast radius analysis** — Analyze the scope and impact of the agent's changes to detect unintended side effects in other parts of the codebase. Includes: dependency graph analysis (what modules/functions consume the changed code), change surface area (number of files, lines, and modules touched), semantic impact assessment (does the change alter public APIs, shared types, configuration, or database schemas), and regression risk scoring. Produces a **risk level** (low / medium / high / critical) attached to the PR as a label and included in the validation report. High-risk changes may require explicit human approval before merge (foundation for the HITL approval mode in Iteration 6). 
The risk level considers: number of downstream dependents affected, whether the change touches shared infrastructure or core abstractions, test coverage of the affected area, and whether the change introduces new external dependencies. - **PR risk level and validation report** — Every agent-created PR includes a structured **validation report** (as a PR comment or check run) summarizing: Tier 1 results (pass/fail per tool), Tier 2 findings (code quality issues by severity), Tier 3 risk assessment (risk level, blast radius summary, affected modules). The PR is labeled with the computed risk level (`risk:low`, `risk:medium`, `risk:high`, `risk:critical`). Risk level is persisted in the task record for evaluation and trending. See [EVALUATION.md](/design/evaluation#pr-risk-level). -- **Other task types: PR review** — Support at least one additional task type beyond "implement from issue": **review pull request** (read-only or comment-only). The agent reads the PR diff, runs analysis (tests, lint, code review heuristics), and posts review comments. This uses a different blueprint (no branch creation, no PR creation — just analysis and comments) and a different system prompt. It validates that the platform is not hardwired to a single task type. +- [x] **Other task types: PR review and PR iteration** — Support additional task types beyond "implement from issue": **iterate on pull request** (`pr_iteration`) reads review comments and addresses them (implement changes, push updates, post summary). **Review pull request** (`pr_review`) is a read-only task type where the agent analyzes a PR's changes and posts structured review comments via the GitHub Reviews API. The `pr_review` agent runs without `Write` or `Edit` tools (defense-in-depth), skips `ensure_committed` and push, and treats build status as informational only.
Each review comment uses a structured format: type (comment/question/issue/good_point), severity for issues (minor/medium/major/critical), title, description with memory attribution, proposed fix, and a ready-to-use AI prompt. The CLI exposes `--review-pr ` (mutually exclusive with `--pr`). - **Multi-modal input** — Accept text and images (or other modalities) in the task payload; pass through to the agent. Gateway and schema support it; agent harness supports it where available. Primary use case: screenshots of bugs, UI mockups, or design specs attached to issues. **Builds on Iteration 3b:** Memory is operational; this iteration changes the orchestrator blueprint (tiered validation pipeline, new task type) and broadens the input schema. These are independently testable from memory. @@ -292,7 +292,7 @@ Deep research identified **9 memory-layer security gaps** in the current archite - **Iteration 2** — Production orchestrator, API contract, task management (list/status/cancel), durable execution, observability, threat model, network isolation, basic cost guardrails, CI/CD. - **Iteration 3a** — Repo onboarding, DNS Firewall (domain-level egress filtering), webhook trigger, GitHub Actions, per-repo customization (prompt from repo), data retention, turn/iteration caps, cost budget caps, user prompt guide, agent harness improvements (turn budget, default branch, safety net, lint, softened conventions), operator dashboard, WAF, model invocation logging, input length limits. - **Iteration 3b** ✅ — Memory Tier 1 (repo knowledge, task episodes), insights, agent self-feedback, prompt versioning, per-prompt commit attribution. CDK L2 construct with named semantic + episodic strategies using namespace templates (`/{actorId}/knowledge/`, `/{actorId}/episodes/{sessionId}/`), fail-open memory load/write, orchestrator fallback episode, SHA-256 prompt hashing, git trailer attribution. 
-- **Iteration 3c** — Per-repo GitHub App credentials, orchestrator pre-flight checks (fail-closed before session start), persistent session storage for select caches (AgentCore Runtime `/mnt/workspace` mount for npm/Claude config; mise/uv/repo on local disk due to FUSE `flock()` limitation), pre-execution task risk classification (model/limits/approval policy selection), tiered validation pipeline (tool validation, code quality analysis, post-execution risk/blast radius analysis), PR risk level, PR review task type, multi-modal input. +- **Iteration 3c** — Per-repo GitHub App credentials, orchestrator pre-flight checks (fail-closed before session start), persistent session storage for select caches (AgentCore Runtime `/mnt/workspace` mount for npm/Claude config; mise/uv/repo on local disk due to FUSE `flock()` limitation), pre-execution task risk classification (model/limits/approval policy selection), tiered validation pipeline (tool validation, code quality analysis, post-execution risk/blast radius analysis), PR risk level, PR iteration and PR review task types (`pr_iteration`, which addresses review feedback on an existing PR, and `pr_review`, a read-only structured review with tool restriction, defense-in-depth enforcement, CLI `--review-pr` flag), multi-modal input. - **Iteration 3d** — Review feedback memory loop (Tier 2), PR outcome tracking, evaluation pipeline (basic). - **Iteration 3e** — Memory security and integrity: input hardening (content sanitization, provenance tagging, integrity hashing), trust-aware retrieval (trust scoring, temporal decay, guardian validation), detection and response (anomaly detection, circuit breaker, quarantine, rollback), advanced protections (write-ahead validation, behavioral drift detection, cryptographic provenance, red teaming). Addresses OWASP ASI06 (Memory & Context Poisoning).
- **Iteration 3bis** (hardening) — Orchestrator IAM grant for Memory (was silently AccessDenied), memory schema versioning (`schema_version: "2"`), Python repo format validation, severity-aware error logging in Python memory, narrowed entrypoint try-catch, orchestrator fallback episode observability, conditional writes in agent task_state.py (ConditionExpression guards), orchestrator Lambda error alarm (CloudWatch, retryAttempts: 0), concurrency counter reconciliation (scheduled Lambda, drift correction), multi-AZ NAT documentation (already configurable), Python unit tests (pytest), entrypoint decomposition (4 extracted subfunctions), dual prompt assembly deprecation docstring, graceful thread drain in server.py (shutdown hook + atexit), dead QUEUED state removal (8 states, 4 active). diff --git a/docs/src/content/docs/user-guide/Prompt-guide.md b/docs/src/content/docs/user-guide/Prompt-guide.md index 7a37ec1..6ee217d 100644 --- a/docs/src/content/docs/user-guide/Prompt-guide.md +++ b/docs/src/content/docs/user-guide/Prompt-guide.md @@ -19,10 +19,11 @@ This guide covers how to write effective task descriptions, common anti-patterns When you submit a task, the platform does not pass your input directly to the agent. Instead, it goes through a **context hydration** step — a distinct phase in the task lifecycle (you'll see the task status change to `HYDRATING`) where the platform fetches external data and assembles the full prompt on your behalf. During hydration: - If you provided `--issue`, the platform calls the GitHub API to fetch the issue title, body, and comments. -- Your task description, the issue content, and task metadata are combined into a single **user prompt**. -- If the assembled prompt exceeds the token budget, older issue comments are trimmed to fit. +- If you provided `--pr` or `--review-pr`, the platform fetches the PR metadata, conversation comments, and changed files via the REST API, and inline review comments via the GraphQL API — four parallel calls.
Resolved review threads are filtered out at fetch time so the agent only sees unresolved feedback. +- Your task description, the issue/PR content, and task metadata are combined into a single **user prompt**. +- If the assembled prompt exceeds the token budget, older comments are trimmed to fit. -The hydrated prompt is then passed to the agent alongside a fixed **system prompt**. Understanding this assembly helps you write better descriptions — you control what goes in, but the platform decides the final shape. +The hydrated prompt is then passed to the agent alongside a **system prompt** selected by task type. For `new_task`, the system prompt instructs the agent to create a branch, implement changes, and open a new PR. For `pr_iteration`, it instructs the agent to read review feedback, address it, push to the existing branch, and comment on the PR. For `pr_review`, it instructs the agent to analyze the PR's changes and post structured review comments without modifying code. Understanding this assembly helps you write better descriptions — you control what goes in, but the platform decides the final shape. ### What the agent receives @@ -47,9 +48,31 @@ Repository: owner/repo [your task description, if provided] ``` +For `pr_iteration` tasks (when using `--pr`) and `pr_review` tasks (when using `--review-pr`), the user prompt has a different structure: + +``` +Task ID: bgagent-01HYX... +Repository: owner/repo + +## Pull Request #42: Fix login timeout on slow connections +[PR body] + +### Changed Files +- src/auth.ts (+12, -3) +- src/middleware.ts (+5, -1) + +### Review Comments +**@alice** (src/auth.ts:88): The timeout should be configurable... +**@bob** (general): Consider adding a retry mechanism... + +## Additional Instructions +[your task description, if provided] +``` + The user prompt includes: - **Task ID** and **Repository** — always present. -- **GitHub Issue** (title, body, and comments) — included when you use `--issue`. 
+- **GitHub Issue** (title, body, and comments) — included when you use `--issue` (`new_task`). +- **Pull Request context** (title, body, diff, review comments) — included when you use `--pr` (`pr_iteration`) or `--review-pr` (`pr_review`). - **Task description** — included when you use `--task`. ### Token budget @@ -128,16 +151,24 @@ Both are active simultaneously. Blueprint overrides are part of the system promp ## Choosing the right input mode -You must provide at least one of `--issue` or `--task` (or both). +You must provide at least one of `--issue`, `--task`, `--pr`, or `--review-pr`. | Mode | When to use | Example | |---|---|---| | `--issue` only | The GitHub issue is well-written with clear requirements, reproduction steps, and acceptance criteria. | `bgagent submit --repo owner/repo --issue 42` | | `--task` only | Ad-hoc task not tied to an issue, or the issue doesn't exist yet. | `bgagent submit --repo owner/repo --task "Add rate limiting to the /search endpoint"` | | `--issue` + `--task` | The issue exists but needs clarification, scope narrowing, or additional instructions. | `bgagent submit --repo owner/repo --issue 42 --task "Focus only on the timeout in the OAuth flow. Don't change the retry logic."` | +| `--pr` only | A PR has review feedback that needs addressing. The agent reads the diff, review comments, and pushes fixes. | `bgagent submit --repo owner/repo --pr 42` | +| `--pr` + `--task` | A PR has review feedback, and you want to provide additional instructions or scope the work. | `bgagent submit --repo owner/repo --pr 42 --task "Focus on the null check Alice flagged in the auth module"` | +| `--review-pr` only | You want a structured code review of an existing PR. The agent reads the changes and posts review comments without modifying code. | `bgagent submit --repo owner/repo --review-pr 42` | +| `--review-pr` + `--task` | You want a focused review of specific aspects of a PR. 
| `bgagent submit --repo owner/repo --review-pr 42 --task "Focus on security issues and error handling"` | **When to combine both:** Use `--issue` + `--task` when you want the agent to see the full issue context (including comments from other contributors) but need to override or narrow the scope. Your `--task` text appears after the issue content, so it acts as the final instruction. +**PR iteration:** Use `--pr` when a reviewer has left feedback on an existing PR. The agent checks out the PR's branch, reads all review comments and the current diff, makes targeted changes to address the feedback, and pushes back to the same branch. The `--task` flag is optional but useful for narrowing scope (e.g., "Only address the security concern, not the style nits"). + +**PR review:** Use `--review-pr` when you want the agent to analyze a PR and post structured review comments without modifying any code. The agent reads the full source files, runs the build for analysis, and posts findings using a structured format (type, severity, description, proposed fix, AI prompt). The `--task` flag is optional but useful for focusing the review (e.g., "Focus on security issues"). + ## Writing effective task descriptions ### Describe the end state, not the steps @@ -249,6 +280,8 @@ The `--max-turns` flag (API field: `max_turns`) controls how many agent turns (m | Bug fix with clear reproduction | 50–100 | The agent needs to understand the issue, find the root cause, implement the fix, add tests, and verify. | | New feature (single module) | 100–200 | More exploration, implementation, and testing. Default of 100 is usually sufficient. | | Large refactoring or multi-file feature | 200–500 | Extensive codebase exploration and many file changes. Consider whether the task should be split instead. | +| PR iteration (address review feedback) | 30–100 | The agent reads the existing diff and review comments, makes targeted changes, and pushes. 
Typically fewer turns than a new task since the scope is narrower. | +| PR review (code review) | 30–80 | The agent reads the diff and source files, runs the build for analysis, and posts review comments. No code changes, so fewer turns needed. | If a task consistently times out or uses all turns without finishing, consider whether the task description is too broad. Splitting into smaller, focused tasks is usually more effective than increasing the turn limit. @@ -344,3 +377,22 @@ The fix should target screen widths below 768px. Use the existing breakpoint variables in src/styles/variables.css. " ``` + +### PR iteration (address review feedback) + +```bash +bgagent submit --repo acme/api-server --pr 95 +``` + +When the review feedback is broad and you want the agent to focus: + +```bash +bgagent submit --repo acme/api-server --pr 95 --task " +Address only the security concerns flagged by @alice: +- The SQL injection risk in the search query +- The missing CSRF token on the form submission + +Ignore the style suggestions for now — those will be addressed +in a follow-up. +" +``` diff --git a/docs/src/content/docs/user-guide/Task-lifecycle.md b/docs/src/content/docs/user-guide/Task-lifecycle.md index ba9d867..ddfc787 100644 --- a/docs/src/content/docs/user-guide/Task-lifecycle.md +++ b/docs/src/content/docs/user-guide/Task-lifecycle.md @@ -42,4 +42,4 @@ Each lifecycle transition is recorded as an audit event. Use the events endpoint curl "$API_URL/tasks//events" -H "Authorization: $TOKEN" ``` -Events include: `task_created`, `admission_rejected`, `preflight_failed`, `hydration_started`, `hydration_complete`, `session_started`, `task_completed`, `task_failed`, `task_cancelled`, `task_timed_out`. Event records are subject to the same 90-day retention as task records and are automatically deleted after that period. 
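These lifecycle events lend themselves to a simple client-side wait loop, which is essentially what `--wait` does. A minimal sketch, assuming a hypothetical `fetch_events` callable standing in for a real call to the events endpoint (the polling cadence and stub data are illustrative, not the CLI's actual implementation):

```python
# Poll the documented lifecycle events until a terminal event appears.
# `fetch_events` is a hypothetical stand-in for GET /tasks/<id>/events.
import time

TERMINAL_EVENTS = {"task_completed", "task_failed", "task_cancelled", "task_timed_out"}

def wait_for_terminal_event(fetch_events, interval_s=5.0, max_polls=120):
    """Poll an event-list-returning callable until a terminal event shows up."""
    for _ in range(max_polls):
        events = fetch_events()
        terminal = TERMINAL_EVENTS.intersection(events)
        if terminal:
            return terminal.pop()
        time.sleep(interval_s)
    raise TimeoutError("no terminal lifecycle event observed")

# Stubbed usage: the event list grows as the task progresses.
history = [
    ["task_created"],
    ["task_created", "hydration_started", "hydration_complete"],
    ["task_created", "hydration_started", "hydration_complete",
     "session_started", "task_completed"],
]
feed = iter(history)
print(wait_for_terminal_event(lambda: next(feed), interval_s=0))  # task_completed
```

Because event records share the task record's 90-day retention, a client should treat an empty event list for an old task ID as expired rather than missing.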
\ No newline at end of file +Events include: `task_created`, `admission_rejected`, `preflight_failed`, `hydration_started`, `hydration_complete`, `session_started`, `pr_created`, `pr_updated`, `task_completed`, `task_failed`, `task_cancelled`, `task_timed_out`. Event records are subject to the same 90-day retention as task records and are automatically deleted after that period. \ No newline at end of file diff --git a/docs/src/content/docs/user-guide/Using-the-cli.md b/docs/src/content/docs/user-guide/Using-the-cli.md index 7d8f117..d721420 100644 --- a/docs/src/content/docs/user-guide/Using-the-cli.md +++ b/docs/src/content/docs/user-guide/Using-the-cli.md @@ -32,6 +32,18 @@ node lib/bin/bgagent.js submit --repo owner/repo --issue 42 # From a text description node lib/bin/bgagent.js submit --repo owner/repo --task "Add input validation to the /users POST endpoint" +# Iterate on an existing pull request (address review feedback) +node lib/bin/bgagent.js submit --repo owner/repo --pr 42 + +# Iterate on a PR with additional instructions +node lib/bin/bgagent.js submit --repo owner/repo --pr 42 --task "Focus on the null check Alice flagged" + +# Review an existing pull request (read-only — posts structured review comments) +node lib/bin/bgagent.js submit --repo owner/repo --review-pr 55 + +# Review a PR with a specific focus area +node lib/bin/bgagent.js submit --repo owner/repo --review-pr 55 --task "Focus on security and error handling" + # Submit and wait for completion node lib/bin/bgagent.js submit --repo owner/repo --issue 42 --wait ``` @@ -58,13 +70,15 @@ Created: 2026-04-01T00:39:51.271Z | `--repo` | GitHub repository (`owner/repo`). Required. | | `--issue` | GitHub issue number. | | `--task` | Task description text. | +| `--pr` | PR number to iterate on. Sets task type to `pr_iteration`. The agent checks out the PR's branch, reads review feedback, and pushes updates. | +| `--review-pr` | PR number to review. Sets task type to `pr_review`. 
The agent checks out the PR's branch, analyzes changes read-only, and posts structured review comments. | | `--max-turns` | Maximum agent turns (1–500). Overrides per-repo Blueprint default. Platform default: 100. | | `--max-budget` | Maximum cost budget in USD (0.01–100). Overrides per-repo Blueprint default. No default limit. | | `--idempotency-key` | Idempotency key for deduplication. | | `--wait` | Poll until the task reaches a terminal status. | | `--output` | Output format: `text` (default) or `json`. | -At least one of `--issue` or `--task` is required. +At least one of `--issue`, `--task`, `--pr`, or `--review-pr` is required. The `--pr` and `--review-pr` flags are mutually exclusive. ### Checking task status diff --git a/docs/src/content/docs/user-guide/Using-the-rest-api.md b/docs/src/content/docs/user-guide/Using-the-rest-api.md index dd21f34..526c7b6 100644 --- a/docs/src/content/docs/user-guide/Using-the-rest-api.md +++ b/docs/src/content/docs/user-guide/Using-the-rest-api.md @@ -4,6 +4,16 @@ title: Using the REST API The Task API exposes 5 endpoints under the base URL from the `ApiUrl` stack output. +### Task types + +The platform supports three task types: + +| Type | Description | Outcome | +|---|---|---| +| `new_task` (default) | Create a new branch, implement changes, and open a new PR. | New pull request | +| `pr_iteration` | Check out an existing PR's branch, read review feedback, address it, and push updates. | Updated pull request | +| `pr_review` | Check out an existing PR's branch, analyze the changes read-only, and post a structured review. 
| Review comments on the PR | + ### Create a task ```bash @@ -38,6 +48,42 @@ curl -X POST "$API_URL/tasks" \ -d '{"repo": "owner/repo", "issue_number": 42}' ``` +To iterate on an existing pull request (address review feedback): + +```bash +curl -X POST "$API_URL/tasks" \ + -H "Authorization: $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"repo": "owner/repo", "task_type": "pr_iteration", "pr_number": 42}' +``` + +You can optionally include `task_description` with `pr_iteration` to provide additional instructions alongside the review feedback: + +```bash +curl -X POST "$API_URL/tasks" \ + -H "Authorization: $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"repo": "owner/repo", "task_type": "pr_iteration", "pr_number": 42, "task_description": "Focus on the null check Alice flagged in the auth module"}' +``` + +To request a read-only review of an existing pull request: + +```bash +curl -X POST "$API_URL/tasks" \ + -H "Authorization: $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"repo": "owner/repo", "task_type": "pr_review", "pr_number": 55}' +``` + +You can optionally include `task_description` with `pr_review` to focus the review on specific areas: + +```bash +curl -X POST "$API_URL/tasks" \ + -H "Authorization: $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"repo": "owner/repo", "task_type": "pr_review", "pr_number": 55, "task_description": "Focus on security implications and error handling"}' +``` + **Request body fields:** | Field | Type | Required | Description | @@ -45,6 +91,8 @@ curl -X POST "$API_URL/tasks" \ | `repo` | string | Yes | GitHub repository in `owner/repo` format | | `issue_number` | number | One of these | GitHub issue number | | `task_description` | string | is required | Free-text task description | +| `pr_number` | number | | PR number to iterate on or review (required for `pr_iteration` and `pr_review`) | +| `task_type` | string | No | `new_task` (default), `pr_iteration`, or `pr_review`. 
| | `max_turns` | number | No | Maximum agent turns (1–500). Overrides the per-repo Blueprint default. Platform default: 100. | | `max_budget_usd` | number | No | Maximum cost budget in USD (0.01–100). When reached, the agent stops regardless of remaining turns. Overrides the per-repo Blueprint default. If omitted, no budget limit is applied. | diff --git a/docs/src/content/docs/user-guide/Webhook-integration.md b/docs/src/content/docs/user-guide/Webhook-integration.md index 0fdf181..cd898d7 100644 --- a/docs/src/content/docs/user-guide/Webhook-integration.md +++ b/docs/src/content/docs/user-guide/Webhook-integration.md @@ -73,7 +73,7 @@ curl -X POST "$API_URL/webhooks/tasks" \ -d "$BODY" ``` -The request body is identical to `POST /v1/tasks` (same `repo`, `issue_number`, `task_description`, `max_turns`, `max_budget_usd` fields). The `Idempotency-Key` header is also supported. +The request body is identical to `POST /v1/tasks` (same `repo`, `issue_number`, `task_description`, `task_type`, `pr_number`, `max_turns`, `max_budget_usd` fields). The `Idempotency-Key` header is also supported. You can submit `pr_iteration` tasks via webhook to automate PR feedback loops, or `pr_review` tasks to trigger automated code reviews. **Example response** (same shape as a successful `POST /tasks` — `status` is `SUBMITTED`; session, PR, and cost fields are `null` until the run progresses): diff --git a/docs/src/content/docs/user-guide/What-the-agent-does.md b/docs/src/content/docs/user-guide/What-the-agent-does.md index b86341f..02f564b 100644 --- a/docs/src/content/docs/user-guide/What-the-agent-does.md +++ b/docs/src/content/docs/user-guide/What-the-agent-does.md @@ -2,7 +2,9 @@ title: What the agent does --- -When a task is submitted, the agent: +### New task (`new_task`) + +When a `new_task` is submitted, the agent: 1. Clones the repository into an isolated workspace 2. Creates a branch named `bgagent//` @@ -14,4 +16,36 @@ When a task is submitted, the agent: 8. 
Commits and pushes incrementally throughout 9. Creates a pull request with a summary of changes, build/test results, and decisions made -The PR title follows conventional commit format (e.g., `feat(auth): add OAuth2 login flow`). \ No newline at end of file +The PR title follows conventional commit format (e.g., `feat(auth): add OAuth2 login flow`). + +### PR iteration (`pr_iteration`) + +When a `pr_iteration` task is submitted, the agent: + +1. Clones the repository into an isolated workspace +2. Checks out the existing PR branch (fetched from the remote) +3. Installs dependencies via `mise install` and runs an initial build +4. Loads repo-level project configuration if present +5. Reads the review feedback (inline comments, conversation comments, and the PR diff) +6. Addresses the feedback with focused changes +7. Runs the build and tests (`mise run build`) +8. Commits and pushes to the existing PR branch +9. Posts a summary comment on the PR describing what was addressed + +The agent does **not** create a new PR — it updates the existing one in place. The PR's branch, title, and description remain unchanged; the agent adds commits and a comment summarizing its work. + +### PR review (`pr_review`) + +When a `pr_review` task is submitted, the agent: + +1. Clones the repository into an isolated workspace +2. Checks out the existing PR branch (fetched from the remote) +3. Installs dependencies via `mise install` and runs an initial build (informational only — build failures do not block the review) +4. Loads repo-level project configuration if present +5. Reads the PR context (diff, description, existing comments) and analyzes the changes +6. Leverages repository memory context (codebase patterns, past episodes) when available +7. 
Composes structured findings using a defined comment format: type (comment / question / issue / good_point), severity for issues (minor / medium / major / critical), title, description, proposed fix, and a ready-to-use AI prompt for addressing each finding +8. Posts the review via the GitHub Reviews API (`gh api repos/{repo}/pulls/{pr_number}/reviews`) as a single batch review +9. Posts a summary conversation comment on the PR + +The agent operates in **read-only mode** — it does not modify any files, create commits, or push changes. The `Write` and `Edit` tools are not available during `pr_review` tasks. \ No newline at end of file diff --git a/mise.toml b/mise.toml index 5ae316f..c42b88f 100644 --- a/mise.toml +++ b/mise.toml @@ -53,7 +53,7 @@ run = "gitleaks detect --source . --no-banner" [tasks."security:sast"] description = "SAST scan with semgrep (auto + OWASP top 10)" -run = "semgrep scan --config auto --config p/python --config p/owasp-top-ten --config p/security-audit --error --quiet ." +run = "semgrep scan --config auto --config p/python --config p/typescript --config p/owasp-top-ten --config p/security-audit --error --quiet ." 
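The structured review-comment format described for `pr_review` can be checked mechanically before posting. A minimal sketch, assuming hypothetical dict field names (the authoritative format lives in the agent's `pr_review` prompt, not here):

```python
# Validate one review finding against the documented structure: type
# (comment/question/issue/good_point), severity for issues
# (minor/medium/major/critical), title, description, proposed fix, AI prompt.
# Field names are assumptions for illustration.
FINDING_TYPES = {"comment", "question", "issue", "good_point"}
ISSUE_SEVERITIES = {"minor", "medium", "major", "critical"}

def validate_finding(finding: dict) -> list[str]:
    """Return a list of problems; an empty list means the finding is well-formed."""
    problems = []
    if finding.get("type") not in FINDING_TYPES:
        problems.append(f"type must be one of {sorted(FINDING_TYPES)}")
    if finding.get("type") == "issue" and finding.get("severity") not in ISSUE_SEVERITIES:
        problems.append("issue findings require a severity (minor/medium/major/critical)")
    for required in ("title", "description", "proposed_fix", "ai_prompt"):
        if not finding.get(required):
            problems.append(f"missing field: {required}")
    return problems

ok = {
    "type": "issue",
    "severity": "major",
    "title": "Unvalidated input",
    "description": "User input reaches the query without validation.",
    "proposed_fix": "Validate and parameterize the input before use.",
    "ai_prompt": "Add input validation to the affected query path.",
}
print(validate_finding(ok))  # []
```

Batching findings that pass validation into a single Reviews API call, as the agent does, keeps the PR timeline to one review instead of one notification per comment.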
[tasks."security:deps"] description = "Audit dependencies for known vulnerabilities (osv-scanner)" diff --git a/yarn.lock b/yarn.lock index 67fe6df..fd218fe 100644 --- a/yarn.lock +++ b/yarn.lock @@ -3939,9 +3939,9 @@ baseline-browser-mapping@^2.10.12: integrity sha512-BL2sTuHOdy0YT1lYieUxTw/QMtPBC3pmlJC6xk8BBYVv6vcw3SGdKemQ+Xsx9ik2F/lYDO9tqsFQH1r9PFuHKw== basic-ftp@^5.0.2: - version "5.2.0" - resolved "https://registry.yarnpkg.com/basic-ftp/-/basic-ftp-5.2.0.tgz#7c2dff63c918bde60e6bad1f2ff93dcf5137a40a" - integrity sha512-VoMINM2rqJwJgfdHq6RiUudKt2BV+FY5ZFezP/ypmwayk68+NzzAQy4XXLlqsGD4MCzq3DrmNFD/uUmBJuGoXw== + version "5.2.1" + resolved "https://registry.yarnpkg.com/basic-ftp/-/basic-ftp-5.2.1.tgz#818ba176e0e52a9e746e8576331f7e9474b94668" + integrity sha512-0yaL8JdxTknKDILitVpfYfV2Ob6yb3udX/hK97M7I3jOeznBNxQPtVvTUtnhUkyHlxFWyr5Lvknmgzoc7jf+1Q== bcp-47-match@^2.0.0: version "2.0.3"