feat(compile): add execution-context plugin with PR contributor (#860)#865
Conversation
🔍 Rust PR ReviewSummary: Looks good overall — the trust boundary design is solid and the test coverage is thorough. One stale doc/comment inconsistency in the security-critical section warrants a fix before merge; two minor latent hazards noted below. Findings🔒 Security Concerns
|
🔍 Rust PR ReviewSummary: Well-engineered feature with a solid security design — a few correctness and documentation gaps worth addressing before merge. Findings🐛 Bugs / Logic Issues
🔒 Security Concerns
|
🔍 Rust PR ReviewSummary: Looks good overall — the trust-boundary design is solid and the test suite is thorough. Two items worth addressing before merge. Findings
|
🔍 Rust PR ReviewSummary: Looks good — well-designed, with excellent security posture and thorough test coverage. One defense-in-depth gap in a regex and a cosmetic EOF issue worth fixing before merge. Findings
|
🔍 Rust PR ReviewSummary: Well-designed with strong trust-boundary controls and excellent test coverage; one low-severity prompt injection vector worth addressing. Findings🔒 Security Concerns
🐛 Bugs / Logic Issues
|
Adds an always-on ExecContextExtension that materialises `aw-context/pr/*` on disk for PR-triggered runs, so reviewer/triage agents stop reinventing `git fetch` / `git diff` / merge-base resolution in every workflow body. Bearer is mapped only into the precompute step's env and injected into git via `GIT_CONFIG_COUNT/KEY_0/VALUE_0` (never argv); no `persistCredentials: true` and no checkout override, so the AWF-sandboxed agent never sees `SYSTEM_ACCESSTOKEN`. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Aligns the execution-context plugin with gh-aw's model: the agent
does its own `git diff` against precomputed SHAs, the compiler only
handles the parts the sandbox can't (fetch+merge-base resolution,
bearer-protected).
Artefacts collapse from ~6 files to 2 (`base.sha`, `head.sha`) plus
one failure file (`error.txt`). Short identifiers (PR id / project /
repo) are interpolated directly into the prompt fragment via
`printf` calls appended to `/tmp/awf-tools/agent-prompt.md` from the
prepare step (same mechanism gh-aw uses for built-in prompt sections).
The prompt fragment lists the right `git` commands and example ADO
MCP tool calls (`repo_get_pull_request_by_id` etc.) with PR id /
project / repo pre-filled. When the precompute fails, a failure
fragment tells the agent to surface the error rather than produce an
empty review (ADO MCP calls still possible).
Auto-extends the agent's bash allow-list with 7 read-only git
commands (`git`, `git diff`, `git log`, `git show`,
`git status`, `git rev-parse`, `git symbolic-ref`) when the PR
contributor activates, so a restricted-bash agent can still inspect
the diff. Opt-out via `execution-context.pr.enabled: false`.
Dropped from v1: `status.txt` protocol, `metadata.txt`,
`changed-files*.txt`, `diff.patch` (with scope/unified/max-bytes
knobs), and `head-files/` / `base-files/` snapshots. The
`execution-context.pr` block collapses to `{ enabled }` only.
Trust boundary unchanged: `SYSTEM_ACCESSTOKEN` scoped to the
prepare step's env, `GIT_CONFIG_*` bearer injection (no argv leak),
no `persistCredentials`, no checkout override, condition gate on
`Build.Reason == 'PullRequest'`.
Issue: #860
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Four review findings from the automated Rust PR reviewer
(workflow run 27037700172).
1. Bug — inconsistent `base.sha` semantics across paths. The
synthetic-merge-commit path used `HEAD^1` (target tip), while the
progressive-deepening path used `git merge-base` (common ancestor).
`git diff $BASE..$HEAD` therefore produced different change sets
depending on ADO's checkout mode. Fixed: synthetic path now also
computes `git merge-base HEAD^1 HEAD^2`, so `BASE_SHA` is always
the common ancestor ("PR diff since branch-point").
2. Cleanup — unreachable `|| echo 0`. `wc -w` always exits 0 and
prints "0" on empty input, so the fallback never fired. Replaced
with a clean `${PARENTS:-0}` parameter-expansion default.
3. Security defence-in-depth — missing `PR_TARGET_BRANCH` validation.
`PR_ID`/`PROJECT`/`REPO` were strictly regex-validated; only
`PR_TARGET_BRANCH` was just checked for emptiness despite being
interpolated into a git refspec. Added `^[A-Za-z0-9._/-]+$` regex
guard with the same posture as the other identifiers.
4. Footgun — explicit `pr.enabled: true` widened agent bash on
non-PR agents. Without `on.pr` configured, the prepare step is dead
code (runtime `Build.Reason == 'PullRequest'` gate), but the 7 git
commands were still appended to the agent's bash allow-list at
compile time. Now `on.pr` is REQUIRED — `pr.enabled: true` without
it is treated the same as the default (inactive). Aggregate in
`ExecContextExtension::new` updated to match.
Tests:
- `test_execution_context_pr_synthetic_merge_uses_merge_base` (new)
— regression guard against the `HEAD^1`-direct-assignment shape.
- `test_execution_context_pr_target_branch_validated` (new)
— asserts the new regex guard is emitted.
- `test_execution_context_pr_enabled_true_without_on_pr_is_inactive`
(new) — prepare step and bash allow-list both suppressed when
on.pr is absent.
- All 9 `test_execution_context_pr_*` tests pass; full `cargo test`
green (1739 unit + 127 compiler + bash_lint with shellcheck).
Docs updated: synthetic-path behaviour, `PR_TARGET_BRANCH` validation,
and `on.pr`-required activation rule in `docs/execution-context.md`.
Note: the first review (run 27020920083) was against the pre-v6.2
implementation; its concerns about `git -c http.extraheader` in
docs/comments were already resolved by the v6.2 rewrite (both now
describe `GIT_CONFIG_*` env-var injection).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two additional suggestions from the latest review pass.
1. Divergence-trap unit tests for `any_contributor_active`.
The pre-computation in `ExecContextExtension::new` duplicates each
contributor's `should_activate` logic so the aggregate flag can gate
`required_bash_commands` (which receives no `CompileContext`). The
existing tests exercise this only via fixture-compile paths through
`prepare_steps`, so a future contributor that wires up
`should_activate` but forgets to OR-in the aggregate could silently
suppress its bash allow-list contributions and only be caught at
E2E time.
Added 6 focused unit tests in `src/compile/extensions/exec_context/
mod.rs::tests` that construct `ExecContextExtension::new` directly
and assert `required_bash_commands` matches `should_activate` for
every combination of:
- on.pr configured / absent
- pr.enabled default / explicit true / explicit false
- master switch on / off
A future divergence trips here at unit-test time rather than at E2E
time.
2. Hard-fail on infra failures BEFORE the soft `fail()` machinery.
Without `set -e`, a failed `mkdir -p "$AW_PR_DIR"` (e.g. read-only
workspace) would silently continue, `fail()` would itself fail to
write `error.txt`, and the step would exit 0 with nothing staged
AND no failure signal in the agent prompt.
Added an explicit `mkdir -p "$AW_PR_DIR" || { echo ...; exit 1; }`
hard-fail at the top of the prepare step, with a message that
points the operator at `BUILD_SOURCESDIRECTORY` permissions.
Unlike the merge-base soft-fail path (normal operational case),
a missing output directory always indicates a broken pipeline
configuration and warrants a hard build break.
Tests: 1745 unit (+6) + 127 compiler + bash_lint with shellcheck;
clippy clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ript bundle The PR contributor's prepare step used to embed ~190 lines of bash heredoc into the emitted YAML, with only end-to-end shellcheck for coverage. Port that work to a new `exec-context-pr.js` ado-script bundle alongside the existing `gate.js` and `import.js` bundles. What's in the bundle (scripts/ado-script/src/exec-context-pr/): - validate.ts - strict allowlist regex guards for PR id, project, repo, target branch. - git.ts - execFile wrappers + bearerEnv helper that injects GIT_CONFIG_* into the spawned git child env only. - merge-base.ts - synthetic-merge detection (parents >= 3) + progressive-deepening fetch (--depth=200/500/2000/--unshallow). - prompt.ts - success/failure prompt-fragment writers. - index.ts - orchestrator; hard-fails (exit 1) on mkdir failure, soft-fails (exit 0 + error.txt) on validation/merge-base failures. - 32 vitest unit tests + 3 e2e smoke tests against a synthetic-merge git repo. Rust side: - pr.rs shrinks from ~320 lines to ~145 lines; the emitted bash is 4 lines wrapping a single `node /tmp/ado-aw-scripts/ado-script/exec-context-pr.js` invocation. - AdoScriptExtension gains exec_context_pr_active: bool so the Agent-job install/download fires whenever either import.js or exec-context-pr.js is needed. - Shared pr_contributor_will_activate predicate keeps the new flag and ExecContextExtension::new in lockstep; cfg-aware variant lets unit tests pass a custom config without divergence. - Tests in compiler_tests.rs updated to assert the new node-invocation shape; two tests that asserted bash literals (synthetic-merge, target-branch regex) deleted - their logic is now covered by vitest. - dedupe_gate_only fixture pins execution-context.pr.enabled: false to keep the gate-only download-placement test narrow. Trust boundary improves: the bearer (SYSTEM_ACCESSTOKEN) is mapped into the Node process's env and is only propagated into the spawned git child via GIT_CONFIG_COUNT/KEY_0/VALUE_0 - never into the wrapping bash shell, never on argv, never written to .git/config. Release packaging (.github/workflows/release.yml) zips the new bundle into ado-script.zip. Docs (docs/ado-script.md, docs/execution-context.md, AGENTS.md) updated to reflect three bundles. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Five findings from the latest PR review on commit 7bcf107: 1. Security: sanitize unvalidated env values before embedding in the failure prompt fragment. A PR author with push access could craft a branch name containing CR/LF + markdown headers to inject content into the agent prompt via the validation-failure path. validate.ts now passes raw values through sanitizeForPrompt (CR/LF -> space, truncate to 80 chars with ellipsis) before building the reason string. Covers PR_ID, PROJECT, REPO, and PR_TARGET_BRANCH symmetrically. 2 new vitest cases guard the sanitization invariant. 2. Naming clarity: rename countParents -> countParentTokens in merge-base.ts. The function returns `1 + parentCount` (rev-list --parents output includes the commit SHA itself), so a normal 2-parent merge yields 3 tokens. The previous name made debug output (`parents=3`) misleading. Failure message updated to `parentTokens=N`. 3. Stale docstring: git.ts::bearerEnv mentioned the v6.2 bash `git_fetch` wrapper that no longer exists. Rewritten to drop the obsolete reference. 4. Smoke-test gap: the synthetic-merge success smoke test now cross-checks the bundle's emitted base.sha / head.sha against git's own `git rev-parse HEAD^2` / `git merge-base HEAD^1 HEAD^2` answer. Guards against silent SHA-transposition regressions. 5. Dedupe-fixture coverage gap: `dedupe_gate_only.md` pins `execution-context.pr.enabled: false` to keep its narrow contract, which left the `inlined-imports: true + on.pr + exec-context PR active` combination uncovered. Added `tests/fixtures/dedupe_exec_context_pr_only.md` and `test_exec_context_pr_only_downloads_bundle_in_agent_job_not_setup` to assert the Agent-job download fires for exec-context-pr.js alone (no gate, imports inlined). Verified: cargo build/test (1746 unit + 126 compiler_tests + all integration suites green), cargo clippy --all-targets --all-features clean, `npm run typecheck/test/build/test:smoke` (245 vitest + 6 smoke tests green). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
7bcf107 to
e5777a0
Compare
🔍 Rust PR ReviewSummary: Well-engineered with good trust-boundary design — one security gap in the TypeScript failure path, one visibility inconsistency in Rust. Findings🔒 Security Concerns
|
Two findings from the latest review on commit e5777a0: 1. Security (failure-path prompt injection guard): prompt.ts::failureFragment now runs each partial identifier (prId/project/repo) through sanitizeForPrompt before interpolating into the agent prompt. Previously only the `reason` field was sanitised; the raw partial values were embedded verbatim. ADO predefined variables are infra-set, so exploitability is low, but consistent defence-in-depth matches the posture already applied to `reason`. 2 new prompt.test.ts cases guard the invariant. 2. Visibility consistency: src/compile/extensions/ado_script.rs::EXEC_CONTEXT_PR_PATH downgraded from `pub` to `pub(crate)` to match the analogous IMPORT_EVAL_PATH constant immediately above. Both are crate-internal and consumed only by sibling extension modules. Verified: cargo build + full cargo test (1745 + 126 + all integration suites green), npm typecheck + test (247 vitest tests, 6 smoke tests green). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Rust PR ReviewSummary: Solid, well-thought-out design — one logic bug introduced in Findings🐛 Bugs / Logic Issues
|
The ado-script.zip bundle is downloaded at the pipeline-host level (curl in a bash step that runs BEFORE the AWF sandbox starts), not from inside the agent container. AdoScriptExtension::required_hosts() therefore must not contribute github.com to the AWF --allow-domains list — doing so would widen the agent's network reach without a legitimate in-sandbox consumer. The bug was masked because github.com is in CORE_ALLOWED_HOSTS, but the misleading 'Only request github.com when the bundle is actually downloaded' invariant would have silently broken any locked-down deployment that trimmed the core list (and would have been wrong for the new exec-context-pr-only download case too). - required_hosts() now returns vec![] unconditionally, with a comment explaining the trust-boundary rationale - has_gate() helper removed (was only used by required_hosts()) - Tests updated: the three required_hosts_* tests now assert the empty invariant across no-consumer / gate-active / runtime-imports-active scenarios Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ationale
Two precision/clarity fixes from the latest reviewer pass:
1. pr.rs module-level Trust-boundary section overstated token isolation. SYSTEM_ACCESSTOKEN does end up in Node's process.env (the ADO env: block exports to the step process; Node inherits it). The accurate properties are: never in the agent step's env, never in argv, never logged, never written to .git/config. The GIT_CONFIG_* wrappers carrying the bearer into git's http.extraheader are what's confined to the spawned git child — clarified that in the comment.
2. PROJECT_RE in validate.ts may reject legitimate ADO project names containing '+', '(', ')', etc. This is intentional fail-closed conservatism (the precompute step is meant to degrade gracefully to the failure fragment rather than widen its escaping surface). Added a comment explaining the deliberate trade-off and pointing future maintainers at the sanitiser / refspec / URL-construction sites to re-check before relaxing it.
No behaviour change. TS typecheck + 247 vitest + cargo build all clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Rust PR ReviewSummary: Solid implementation — trust boundary is well-maintained, correctness guardrails are excellent, and the v7 migration to a TypeScript bundle is a clear improvement. A few minor issues worth addressing. Findings🐛 Bugs / Logic Issues
|
Bugs:
- merge-base.ts::resolveMergeBase now validates baseSha/headSha against /^[0-9a-f]{40}$/i before returning ok. Previously a misconfigured core.abbrev or unusual git wrapper could have silently staged abbreviated/non-hex SHAs; the agent's in-sandbox 'git diff \..\' would have errored confusingly. Now we fail closed with a clear reason. Two new vitest cases pin the abbreviated-SHA and non-hex-character paths.
Suggestions:
- pr_contributor_will_activate no longer clones execution_context — borrows the embedded config (or a stack-local default) instead. Hot path during compile.
- ContextContributor::name and ::agent_env_vars now have default implementations so future contributors don't have to write no-op boilerplate. #[allow(dead_code)] retained on the trait signatures (no caller invokes them today).
- index.ts agentPromptPath: documented AW_AGENT_PROMPT_FILE as a test-only seam, with a security note explaining the threat model (pipeline-variable write access already implied; cannot expand agent read surface) and pointing at the future hardening lever (gate on NODE_ENV).
Existing merge-base tests were updated to use real-shape 40-char hex SHAs via SHA_C/A/B/M constants — historical 'ccc'/'aaa' literals would have tripped the new SHA40 guard.
TS: 249 vitest (+2) + 6 smoke. Rust: 1745 unit + 126 compiler + integration, clippy clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@copilot resolve the merge conflicts in this pull request |
Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Resolved by merging |
|
@copilot more conflicts to resolve! |
Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Resolved the new merge conflicts by merging |
The trust-boundary test test_execution_context_pr_does_not_leak_system_accesstoken (added in #865) walks the compiled YAML and asserts that every step env: block declaring SYSTEM_ACCESSTOKEN is the exec-context PR prepare step. That was correct on #865's PR branch, but #873 ("default executor to System.AccessToken") landed first and added a legitimate SYSTEM_ACCESSTOKEN: $(System.AccessToken) mapping to the Stage 3 SafeOutputs executor step. The two compose into a semantic conflict on main: each PR passes its own CI, but the combined tree fails. Stage 3 runs in its own non-agent job and needs the token to apply safe outputs (PRs, work items, etc.) — this is the same allow-list documented by the sibling test test_agent_job_steps_do_not_map_system_access_token, which scopes its check to the Agent job. The cross-stage trust boundary (no SYSTEM_ACCESSTOKEN inside the AWF-sandboxed Agent step) is unaffected. Replace the single-allowed-displayName whitelist with an explicit, audited ALLOWED_DISPLAY_NAMES list covering: 1. "Stage PR execution context (aw-context/pr/*)" — owned here. 2. "Execute safe outputs (Stage 3)" — per #873. Also keep an explicit assertion that the exec-context prepare step is present so we don't silently regress to a state where the PR contributor fails to activate but the test still passes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds an always-on
ExecContextExtensionthat precomputes focused PR execution-context signals for the agent, appends a tailored prompt fragment, and avoids per-workflow reinvention of merge-base/shallow-clone handling.The contributor remains PR-gated, trust-boundary-safe (
SYSTEM_ACCESSTOKENonly in the precompute step), and stagesaw-context/pr/base.sha+aw-context/pr/head.shaso agents can rungit diff $BASE..$HEADdirectly in AWF.As part of resolving merge conflicts with
origin/main, extension collection was reconciled so both always-on extensions are preserved together:ExecContextExtensionAzureCliExtensionRelated
collect_extensionstests were updated to reflect the combined always-on extension set and expected counts.A follow-up merge from
origin/mainwas also applied to resolve additional conflicts, includingAGENTS.md; this follow-up did not change runtime behavior of the feature.Test plan
cargo build— clean.cargo clippy --all-targets --all-features— clean.cargo test— fails ontest_execution_context_pr_does_not_leak_system_accesstoken(pre-existing failure observed during conflict-resolution validation).cargo test test_collect_extensions_ -- --nocapture— targeted extension-order/count tests pass.parallel_validationrun: