Commit 6d9bc84

jamesadevine and Copilot committed
feat(compile): add execution-context plugin with PR contributor (#860)
Adds an always-on ExecContextExtension that materialises `aw-context/pr/*` on disk for PR-triggered runs, so reviewer/triage agents stop reinventing `git fetch` / `git diff` / merge-base resolution in every workflow body. Bearer is mapped only into the precompute step's env and injected into git via `GIT_CONFIG_COUNT/KEY_0/VALUE_0` (never argv); no `persistCredentials: true` and no checkout override, so the AWF-sandboxed agent never sees `SYSTEM_ACCESSTOKEN`. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent ff2de84 commit 6d9bc84

13 files changed

Lines changed: 1314 additions & 8 deletions

AGENTS.md

Lines changed: 8 additions & 0 deletions
@@ -63,6 +63,10 @@ Every compiled pipeline runs as three sequential jobs:
6363
│ │ │ ├── github.rs # Always-on GitHub MCP extension
6464
│ │ │ ├── safe_outputs.rs # Always-on SafeOutputs MCP extension
6565
│ │ │ ├── ado_script.rs # Always-on ado-script extension (gate evaluator + runtime-import resolver, per-job downloads)
66+
│ │ │ ├── exec_context/ # Always-on execution-context extension (issue #860)
67+
│ │ │ │ ├── mod.rs # ExecContextExtension; CompilerExtension impl; contributor fan-out
68+
│ │ │ │ ├── contributor.rs # Internal ContextContributor trait + Contributor enum
69+
│ │ │ │ └── pr.rs # PrContextContributor — stages aw-context/pr/* for PR builds
6670
│ │ │ └── tests.rs # Extension integration tests
6771
│ │ ├── codemods/ # Front-matter codemods (one file per transformation)
6872
│ │ │ ├── mod.rs # Codemod struct, CODEMODS registry, runner
@@ -235,6 +239,10 @@ index to jump to the right page.
235239
Python, Node.js, .NET).
236240
- [`docs/targets.md`](docs/targets.md) — target platforms: `standalone`,
237241
`1es`, `job`, and `stage`.
242+
- [`docs/execution-context.md`](docs/execution-context.md) — built-in
243+
`aw-context/` precompute (issue #860): PR target-branch fetch,
244+
unified diff, file snapshots, base/head SHAs, configured via the
245+
`execution-context:` front-matter block.
238246
- [`docs/safe-outputs.md`](docs/safe-outputs.md) — full reference for every
239247
safe-output tool agents can use to propose actions (PRs, work items, wiki
240248
pages, comments, etc.) plus their per-agent configuration.

docs/execution-context.md

Lines changed: 227 additions & 0 deletions
@@ -0,0 +1,227 @@
1+
# Execution Context
2+
3+
_Part of the [ado-aw documentation](../AGENTS.md)._
4+
5+
The **execution-context plugin** stages per-run context (changed files,
6+
diffs, base/head SHAs, file snapshots, metadata) on disk in a stable
7+
layout under `aw-context/` *before* the agent starts. The agent then
8+
reads these files instead of running its own `git fetch` / `git diff`
9+
plumbing.
10+
11+
This is an always-on compiler extension. There is no `tools:` entry to
12+
enable it; per-trigger contributors gate themselves based on the
13+
agent's `on:` configuration.
14+
15+
> Background and motivation: this feature was tracked in
16+
> [issue #860](https://github.com/githubnext/ado-aw/issues/860).
17+
18+
## Why this exists
19+
20+
PR-reviewer agents almost always need the same precondition: a fully
21+
fetched target branch, resolved base / head SHAs, a unified diff, and
22+
optionally pre / post snapshots of touched files. ADO's default
23+
`checkout: self` is shallow (`fetchDepth: 1`), doesn't fetch the PR
24+
target branch, and (deliberately) does not persist credentials into
25+
`.git/config` for OAuth bearer reuse. Every PR-reviewer agent has
26+
historically rebuilt the same ~120 lines of bash to work around this.
27+
28+
The execution-context plugin owns that step centrally:
29+
30+
- One canonical implementation that evolves with the framework.
31+
- Driven by ADO's predefined `System.PullRequest.*` variables — no
32+
manual ref discovery.
33+
- Inside the trust boundary: the bearer token used to fetch is
34+
scoped to the precompute step's process env and never reaches the
35+
agent container or `.git/config`.
36+
37+
## v1 contributors
38+
39+
| Contributor | Trigger | Output layout |
40+
|-------------|----------------|--------------------------|
41+
| `pr` | `on.pr` | `aw-context/pr/*` |
42+
43+
Future trigger contributors (pipeline-completion, schedule, manual)
44+
plug in via the same internal `ContextContributor` trait without
45+
breaking the agent-facing layout.
46+
47+
## Front-matter surface
48+
49+
```yaml
50+
execution-context:
51+
enabled: true # master switch; defaults to true
52+
pr: # PR contributor configuration
53+
enabled: true # defaults to true when `on.pr` is configured
54+
scope: # pathspecs scoping diff + snapshots
55+
- "src/**"
56+
- "docs/**"
57+
- ":(top,glob)*.yml"
58+
unified: 3 # `-U` lines of context for diff.patch
59+
max-diff-bytes: 524288 # truncate diff.patch beyond this many bytes
60+
snapshots: true # write head-files/ and base-files/
61+
```
62+
63+
All keys are optional. When the `execution-context:` block is omitted
64+
entirely, defaults are *"on for the triggers configured in `on:`"*.
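
The defaulting rules can be sketched as plain Rust types. This is a minimal illustration only: the struct names match the front-matter types mentioned in this document, but the fields, the `Option<bool>` encoding of "default to true when `on.pr` is configured", and the `pr_contributor_active` helper are assumptions, not the real definitions in `src/compile/types.rs`.

```rust
/// Simplified stand-in for the `pr:` block (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct PrContextConfig {
    /// `None` means "not set": default to true when `on.pr` is configured.
    enabled: Option<bool>,
    scope: Vec<String>,
    unified: u32,
    max_diff_bytes: u64,
    snapshots: bool,
}

impl Default for PrContextConfig {
    fn default() -> Self {
        Self {
            enabled: None,
            scope: Vec::new(),       // empty = all paths
            unified: 3,              // `-U3`
            max_diff_bytes: 524_288, // 512 KiB
            snapshots: true,
        }
    }
}

/// Simplified stand-in for the `execution-context:` block.
#[derive(Debug, Clone, PartialEq)]
struct ExecutionContextConfig {
    enabled: bool,
    pr: PrContextConfig,
}

impl Default for ExecutionContextConfig {
    fn default() -> Self {
        Self { enabled: true, pr: PrContextConfig::default() }
    }
}

/// Hypothetical helper: does the PR contributor run for this agent?
fn pr_contributor_active(cfg: &ExecutionContextConfig, on_pr_configured: bool) -> bool {
    cfg.enabled && on_pr_configured && cfg.pr.enabled.unwrap_or(true)
}
```

With an omitted block (`ExecutionContextConfig::default()`), the helper returns `true` only when `on.pr` is configured, which matches the "on for the triggers configured in `on:`" rule.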
65+
66+
### Fields
67+
68+
- **`enabled`** (`bool`, default `true`) — master switch. When `false`,
69+
no contributor runs and no `aw-context/` is staged.
70+
- **`pr.enabled`** (`bool`, default `true` when `on.pr` is set) —
71+
whether to activate the PR contributor. Set `false` to opt out on
72+
huge monorepos where the targeted fetch + diff cost is unacceptable
73+
(the agent then has to roll its own equivalent).
74+
- **`pr.scope`** (`list[string]`, default `[]` = all paths) — pathspecs
75+
passed to `git diff -- <scope>` for both `changed-files-in-scope.txt`
76+
and `diff.patch`. Sanitised at compile time.
77+
- **`pr.unified`** (`u32`, default `3`) — `-U` lines of context for
78+
`diff.patch`.
79+
- **`pr.max-diff-bytes`** (`u64`, default `524288` / 512 KiB) — cap on
80+
`diff.patch` size. When exceeded, the file ends with a literal
81+
marker line `--- TRUNCATED at <N> bytes; full diff suppressed ---`
82+
so the agent knows it is reading a partial diff.
83+
- **`pr.snapshots`** (`bool`, default `true`) — whether to write per-file
84+
post / pre snapshots under `head-files/` and `base-files/` respectively. Disable on
85+
large changes if you only need the diff.
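
The `pr.max-diff-bytes` cap can be illustrated with a short Rust sketch. The real truncation happens in the generated bash step; the function names here are hypothetical, and two details are assumptions because this document does not specify them: that `<N>` in the marker is the configured cap, and that the marker is appended after the retained bytes on its own line.

```rust
/// Hypothetical helper: keep at most `max_bytes` of the diff and append the
/// truncation marker when anything was cut.
fn cap_diff(diff: &[u8], max_bytes: usize) -> Vec<u8> {
    if diff.len() <= max_bytes {
        return diff.to_vec();
    }
    let mut out = diff[..max_bytes].to_vec();
    // Make sure the marker starts on its own line.
    if !out.is_empty() && !out.ends_with(b"\n") {
        out.push(b'\n');
    }
    let marker = format!("--- TRUNCATED at {} bytes; full diff suppressed ---\n", max_bytes);
    out.extend_from_slice(marker.as_bytes());
    out
}

/// Reader-side check: does the patch end with the truncation marker?
fn is_truncated(patch: &str) -> bool {
    patch.lines().last().map_or(false, |line| {
        line.starts_with("--- TRUNCATED at ") && line.ends_with("full diff suppressed ---")
    })
}
```

A consumer of `diff.patch` would call something like `is_truncated` before treating the diff as complete.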
86+
87+
## Agent-visible layout
88+
89+
For PR-triggered builds, the precompute step stages files under
90+
`$(Build.SourcesDirectory)/aw-context/` (i.e. relative to the agent's
91+
working directory):
92+
93+
```
94+
aw-context/
95+
status.txt # OK | (errors propagate to per-contributor files)
96+
trigger.txt # pr (today; future: pipeline / schedule / manual)
97+
metadata.txt # build_id, build_reason, repository, source_branch
98+
pr/
99+
status.txt # OK | NO_PR_CONTEXT | DIFF_RESOLUTION_FAILED | CONTEXT_GENERATION_FAILED
100+
metadata.txt # pr_id, source_branch, target_branch, base_sha, head_sha
101+
changed-files.txt # full `git diff --name-status`
102+
changed-files-in-scope.txt # name-status restricted to `scope`
103+
diff.patch # unified diff, scoped, capped, may end with TRUNCATED marker
104+
head-files/<path> # post-PR snapshots of A/M/T/R*/C* files in scope
105+
base-files/<path> # pre-PR snapshots of D files in scope
106+
error.txt # only present when pr/status.txt != OK
107+
```
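
The split between `head-files/` and `base-files/` in the layout above follows the status letter of each `git diff --name-status` line. A minimal Rust sketch of that mapping, assuming tab-separated fields and assuming that rename and copy entries (`R*` / `C*`) are snapshotted under their new path (this document does not say which path is used); the `Snapshot` type and `snapshot_for` function are hypothetical:

```rust
/// Which snapshot directory a changed file lands in (illustrative type).
#[derive(Debug, PartialEq, Eq)]
enum Snapshot {
    /// Post-PR content, stored under `head-files/<path>`.
    Head(String),
    /// Pre-PR content, stored under `base-files/<path>`.
    Base(String),
}

/// Classify one `git diff --name-status` line, e.g. "M\tsrc/a.rs" or
/// "R100\told.rs\tnew.rs". Returns `None` for statuses with no snapshot.
fn snapshot_for(line: &str) -> Option<Snapshot> {
    let mut fields = line.split('\t');
    let status = fields.next()?;
    let first = fields.next()?;
    // Rename/copy lines carry two paths; assume the new path is the last one.
    let last = fields.next().unwrap_or(first);
    match status.chars().next()? {
        'A' | 'M' | 'T' | 'R' | 'C' => Some(Snapshot::Head(last.to_string())),
        'D' => Some(Snapshot::Base(first.to_string())),
        _ => None,
    }
}
```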
108+
109+
**Agents MUST read `aw-context/pr/status.txt` first** and act on its
110+
value:
111+
112+
- `OK`: `aw-context/pr/*` is fully populated. Prefer reading those
113+
files over running `git fetch` / `git diff` yourself.
114+
- `NO_PR_CONTEXT` — the build is not a PR (e.g. manual queue of a
115+
PR-triggered pipeline). Skip PR-specific logic.
116+
- `DIFF_RESOLUTION_FAILED` — the precompute step ran but could not
117+
resolve the base / head SHAs. See `aw-context/pr/error.txt` for the
118+
reason. Surface this in your output rather than silently producing
119+
an empty review.
120+
- `CONTEXT_GENERATION_FAILED` — base / head SHAs resolved, but at
121+
least one of the `git diff` commands that populates the staged
122+
files failed. The `metadata.txt` file is still trustworthy, but
123+
`changed-files.txt`, `changed-files-in-scope.txt`, or `diff.patch`
124+
may be empty or partial. See `aw-context/pr/error.txt`.
125+
126+
If `aw-context/pr/status.txt` does not exist at all (e.g. when the
127+
extension is disabled), treat it as `NO_PR_CONTEXT`.
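
For tooling that consumes the staged context, the status contract above can be sketched in Rust. The enum and function names are hypothetical; only the four status strings and the "missing file means `NO_PR_CONTEXT`" rule come from this document. Mapping other read errors and unknown values to a separate variant is an assumption.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Illustrative mirror of the values written to `aw-context/pr/status.txt`.
#[derive(Debug, PartialEq, Eq)]
enum PrStatus {
    Ok,
    NoPrContext,
    DiffResolutionFailed,
    ContextGenerationFailed,
    /// Anything this sketch does not recognise (assumption, not in the docs).
    Unrecognised(String),
}

fn parse_status(raw: &str) -> PrStatus {
    match raw.trim() {
        "OK" => PrStatus::Ok,
        "NO_PR_CONTEXT" => PrStatus::NoPrContext,
        "DIFF_RESOLUTION_FAILED" => PrStatus::DiffResolutionFailed,
        "CONTEXT_GENERATION_FAILED" => PrStatus::ContextGenerationFailed,
        other => PrStatus::Unrecognised(other.to_string()),
    }
}

/// Read the status relative to the working directory. A missing file is
/// treated as `NO_PR_CONTEXT`, as described above.
fn read_pr_status(workdir: &Path) -> PrStatus {
    match fs::read_to_string(workdir.join("aw-context/pr/status.txt")) {
        Ok(raw) => parse_status(&raw),
        Err(e) if e.kind() == ErrorKind::NotFound => PrStatus::NoPrContext,
        Err(e) => PrStatus::Unrecognised(format!("unreadable: {}", e)),
    }
}
```

A caller would branch on the result first and only open `diff.patch` or `changed-files-in-scope.txt` when it is `PrStatus::Ok`.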
128+
129+
## What the precompute step does
130+
131+
The PR contributor's generated bash step:
132+
133+
1. **Reads `System.PullRequest.*` from the environment.** No manual ref
134+
discovery — ADO already populates `SourceBranch`, `TargetBranch`,
135+
and `PullRequestId`. If they are missing, writes `NO_PR_CONTEXT`
136+
and exits 0.
137+
2. **Detects merge-commit shape first.** If `HEAD` has two parents
138+
(the synthetic merge commit ADO checks out for PR builds), uses
139+
`HEAD^1` / `HEAD^2` as base / head and skips the target-branch
140+
fetch entirely. Otherwise:
141+
3. **Fetches the PR target branch with progressive deepening:**
142+
`--depth=200`, then `500`, then `2000`, then finally `--unshallow`.
143+
**After each successful fetch, attempts `git merge-base
144+
origin/<target> HEAD`** and continues to the next depth if it
145+
cannot resolve yet. Bounded bandwidth on the common case; covers
146+
the long-tail PR-against-old-base case. On exhaustion writes
147+
`DIFF_RESOLUTION_FAILED`.
148+
4. **Writes `metadata.txt`, `changed-files.txt`,
149+
`changed-files-in-scope.txt`, `diff.patch`.** The diff is scoped to
150+
`pr.scope` (or all paths if empty) and truncated at `pr.max-diff-bytes`
151+
with a literal marker. If any of these `git diff` invocations fails,
152+
the status becomes `CONTEXT_GENERATION_FAILED` rather than `OK`.
153+
5. **Snapshots** (when `pr.snapshots: true`) — for each in-scope file:
154+
`head-files/<path>` for `A`/`M`/`T`/`R*`/`C*` entries,
155+
`base-files/<path>` for `D` entries.
156+
6. **Writes the final status** to `pr/status.txt` and `status.txt`.
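
The progressive-deepening loop in step 3 can be sketched as follows. The real step is generated bash that shells out to `git`; this Rust version abstracts the fetch and the `git merge-base` call behind closures so the control flow is visible. `FetchDepth`, `LADDER`, and `resolve_merge_base` are hypothetical names, and skipping to the next depth after a failed fetch is an assumption (the steps above only describe what happens after a successful one).

```rust
/// One rung of the fetch ladder described in step 3.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FetchDepth {
    Depth(u32),
    Unshallow,
}

const LADDER: [FetchDepth; 4] = [
    FetchDepth::Depth(200),
    FetchDepth::Depth(500),
    FetchDepth::Depth(2000),
    FetchDepth::Unshallow,
];

/// Try each depth in order. `fetch` returns true on success; `merge_base`
/// returns the resolved SHA, if any. On exhaustion the caller would write
/// `DIFF_RESOLUTION_FAILED`.
fn resolve_merge_base<F, M>(mut fetch: F, mut merge_base: M) -> Result<String, &'static str>
where
    F: FnMut(FetchDepth) -> bool,
    M: FnMut() -> Option<String>,
{
    for depth in LADDER.iter().copied() {
        if !fetch(depth) {
            continue; // assumption: a failed fetch moves on to the next depth
        }
        if let Some(sha) = merge_base() {
            return Ok(sha);
        }
    }
    Err("DIFF_RESOLUTION_FAILED")
}
```

The ladder keeps the common case cheap (one shallow fetch) and only pays for `--unshallow` when every bounded depth fails to reach the merge base.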
157+
158+
The step is gated by `condition: eq(variables['Build.Reason'],
159+
'PullRequest')` so it is a no-op on manual or scheduled queues of a
160+
PR-triggered pipeline.
161+
162+
## Trust boundary
163+
164+
The PR contributor must fetch the PR target branch (which the default
165+
checkout does not), but doing so requires an OAuth bearer. ado-aw
166+
preserves the Stage 1 read-only invariant with these design choices:
167+
168+
| Mechanism | Decision |
169+
|---------------------------------------------|----------|
170+
| Override `checkout: self` with `persistCredentials: true` | **Rejected.** It would write the build identity's bearer into `.git/config` inside the workspace, which is then mounted into the AWF sandbox where the agent could read and exfiltrate it. |
171+
| Override `checkout: self` with `fetchDepth: 0` | **Rejected.** Unnecessary — the precompute fetches exactly the two refs it needs. |
172+
| In-step `SYSTEM_ACCESSTOKEN` + bash bearer wrapper | **Adopted.** `SYSTEM_ACCESSTOKEN` is mapped from `$(System.AccessToken)` only into the precompute step's process env. A `git_fetch` wrapper passes `http.extraheader` (`Authorization: bearer ${SYSTEM_ACCESSTOKEN}`) to `git fetch` through the `GIT_CONFIG_COUNT` / `GIT_CONFIG_KEY_0` / `GIT_CONFIG_VALUE_0` environment variables rather than a `git -c` argument, so the token never appears in argv. The token lives only in the bash step's process memory and is never written to disk. |
173+
174+
After the precompute step exits, the bearer is gone from the runtime
175+
environment the agent inherits, `.git/config` contains no
176+
`http.extraheader` line, and the agent container is started by AWF
177+
with its own (read-only) managed identity (MI) from the ARM service connection.
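
The commit message for this change describes injecting the bearer into git through `GIT_CONFIG_COUNT` / `KEY_0` / `VALUE_0` rather than through argv. The sketch below shows that env-versus-argv distinction using `std::process::Command`. It is an illustration only: the real wrapper is generated bash, and `bearer_env` and `git_fetch` here are hypothetical Rust names. The command is built but not executed.

```rust
use std::process::Command;

/// Environment variables that make git apply `http.extraheader` for one
/// process, without touching `.git/config` or the command line.
fn bearer_env(token: &str) -> Vec<(String, String)> {
    vec![
        ("GIT_CONFIG_COUNT".to_string(), "1".to_string()),
        ("GIT_CONFIG_KEY_0".to_string(), "http.extraheader".to_string()),
        (
            "GIT_CONFIG_VALUE_0".to_string(),
            format!("Authorization: bearer {}", token),
        ),
    ]
}

/// Build (but do not run) a `git fetch` whose argv carries no secret.
fn git_fetch(token: &str, remote: &str, refspec: &str, depth: u32) -> Command {
    let depth_arg = format!("--depth={}", depth);
    let mut cmd = Command::new("git");
    cmd.args(["fetch", depth_arg.as_str(), remote, refspec]);
    // The token travels only in the child process environment.
    cmd.envs(bearer_env(token));
    cmd
}
```

Keeping the header out of argv matters because command lines are commonly visible in process listings and step logs, while the per-process environment is not persisted into the workspace.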
178+
179+
The compile-time test `test_execution_context_pr_does_not_leak_system_accesstoken`
180+
asserts that generated YAML never contains `persistCredentials: true`,
181+
never writes to `.git/config`, and that `SYSTEM_ACCESSTOKEN` appears
182+
only in the execution-context prepare step.
183+
184+
## Migrating from a hand-rolled precompute
185+
186+
If you have an existing PR-reviewer agent with a `steps:` block that
187+
manually fetches the target branch, resolves merge-base, and emits a
188+
diff: delete that block, ensure `on.pr` is configured, and read from
189+
`aw-context/pr/*` in your agent prompt. The prompt supplement is
190+
appended automatically — you do not need to mention the layout in your
191+
own markdown body.
192+
193+
## Notes and edge cases
194+
195+
- **`AW_PR_*` env vars are not surfaced.** ado-aw's agent-env-var
196+
channel rejects ADO `$(...)` expressions for injection-defence
197+
reasons, and bouncing values through pipeline output variables
198+
introduces a second source of truth. Agents read everything from
199+
`aw-context/pr/metadata.txt`.
200+
- **No `git` / `cat` / `ls` is added to the agent's bash allow-list.**
201+
The agent reads `aw-context/*` using its normal file-reading
202+
mechanism (the `edit` tool, native copilot reads, etc.), not via
203+
shell. This avoids silently widening the bash capability surface
204+
when the user has restricted bash.
205+
- **Non-`self` checkouts in `repos:`.** v1 only diffs the `self`
206+
checkout. The PR contributor does not currently produce contexts
207+
for additional repository checkouts.
208+
- **Workspace alias.** When `workspace:` points to a non-`self` alias,
209+
`aw-context/` is still relative to `$(Build.SourcesDirectory)`,
210+
i.e. the pipeline's working directory, not the workspace alias's
211+
directory.
212+
- **Ordering.** The precompute step runs after the standard
213+
`- checkout: self` and before any user `steps:`, so user `steps:`
214+
can also read `aw-context/` if needed.
215+
216+
## Compiler internals
217+
218+
- Always-on `ExecContextExtension` in
219+
`src/compile/extensions/exec_context/mod.rs` (`ExtensionPhase::Tool`).
220+
- Internal `ContextContributor` trait in `contributor.rs`. v1 ships one
221+
contributor: `PrContextContributor` in `pr.rs`.
222+
- Front-matter types: `ExecutionContextConfig` and `PrContextConfig` in
223+
`src/compile/types.rs`.
224+
- Compile tests live in `tests/compiler_tests.rs` (search for
225+
`test_execution_context_pr_*`).
226+
- The generated bash is shellchecked by `tests/bash_lint_tests.rs` via
227+
the `execution-context-agent.md` fixture.
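
The contributor fan-out listed above could look roughly like the following. Everything here is a simplified stand-in written for illustration: `CompileContext` is reduced to two hypothetical flags, the trait is cut down to two methods, the prepare step is a placeholder string, and `fan_out` is an assumed name for logic that lives in `mod.rs`, which is not shown in this diff.

```rust
/// Hypothetical stand-in for the real `CompileContext`.
struct CompileContext {
    on_pr: bool,
    pr_enabled: bool,
}

/// Cut-down version of the internal trait.
trait ContextContributor {
    fn should_activate(&self, ctx: &CompileContext) -> bool;
    fn prepare_step(&self, ctx: &CompileContext) -> String;
}

struct PrContextContributor;

impl ContextContributor for PrContextContributor {
    fn should_activate(&self, ctx: &CompileContext) -> bool {
        ctx.on_pr && ctx.pr_enabled
    }
    fn prepare_step(&self, _ctx: &CompileContext) -> String {
        // Placeholder; the real step is a much longer generated script.
        [
            "- bash: echo \"stage aw-context/pr\"",
            "  condition: eq(variables['Build.Reason'], 'PullRequest')",
        ]
        .join("\n")
    }
}

/// Static-dispatch enum; a future trigger adds one variant and one arm.
enum Contributor {
    Pr(PrContextContributor),
}

impl ContextContributor for Contributor {
    fn should_activate(&self, ctx: &CompileContext) -> bool {
        match self {
            Contributor::Pr(c) => c.should_activate(ctx),
        }
    }
    fn prepare_step(&self, ctx: &CompileContext) -> String {
        match self {
            Contributor::Pr(c) => c.prepare_step(ctx),
        }
    }
}

/// Collect one prepare step per activated contributor, skipping empty ones.
fn fan_out(contributors: &[Contributor], ctx: &CompileContext) -> Vec<String> {
    contributors
        .iter()
        .filter(|c| c.should_activate(ctx))
        .map(|c| c.prepare_step(ctx))
        .filter(|step| !step.is_empty())
        .collect()
}
```

Static dispatch through the enum keeps the set of contributors closed and known at compile time, so a missing `match` arm for a new variant is a compile error instead of a silent gap.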

docs/front-matter.md

Lines changed: 10 additions & 0 deletions
@@ -124,6 +124,16 @@ on: # trigger configuration (unified under on: key)
124124
build-reason:
125125
include: [PullRequest]
126126
expression: "eq(variables['Custom.Flag'], 'true')" # raw ADO condition
127+
execution-context: # optional execution-context plugin (see docs/execution-context.md)
128+
enabled: true # master switch; defaults to true. Set false to disable globally.
129+
pr: # PR-context contributor. Activates on PR-triggered builds when on.pr is set.
130+
enabled: true # defaults to true when on.pr is configured. Set false to opt out.
131+
scope: # pathspecs scoping the diff + snapshots
132+
- "src/**"
133+
- "docs/**"
134+
unified: 3 # `-U` lines of context for diff.patch; default: 3
135+
max-diff-bytes: 524288 # truncate diff.patch beyond this size; default: 524288 (512 KiB)
136+
snapshots: true # whether to write head-files/ and base-files/; default: true
127137
steps: # inline steps before agent runs (same job, generate context)
128138
- bash: echo "Preparing context for agent"
129139
displayName: "Prepare context"

prompts/create-ado-agentic-workflow.md

Lines changed: 2 additions & 0 deletions
@@ -429,6 +429,8 @@ on:
429429

430430
When `on.pr` is set: the native ADO `pr:` trigger block is generated from `branches:` and `paths:`. Runtime `filters:` compile to a gate step in the Setup job that self-cancels the build when they do not match.
431431

432+
**PR-reviewer agents — DO NOT write your own precompute step.** When `on.pr` is set, the compiler automatically stages PR context (changed files, unified diff, base/head SHAs, optional file snapshots) under `aw-context/pr/*` before the agent runs. Tell the agent to read `aw-context/pr/status.txt` first, then consume `aw-context/pr/diff.patch` and `aw-context/pr/changed-files-in-scope.txt` as needed. Customise via the top-level `execution-context:` block (scope, unified context size, max diff bytes, snapshots). Full reference: [`docs/execution-context.md`](../docs/execution-context.md).
433+
432434
#### Pipeline Triggers (`on.pipeline`)
433435

434436
Trigger from another pipeline completing:
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
1+
//! Internal `ContextContributor` trait + `Contributor` enum.
2+
//!
3+
//! The execution-context extension is itself a `CompilerExtension`
4+
//! (always-on, registered in `collect_extensions()`). Internally it
5+
//! delegates to a small set of per-trigger **context contributors**,
6+
//! each responsible for materialising one slice of `aw-context/`.
7+
//!
8+
//! v1 ships one contributor: `PrContextContributor`. Future
9+
//! contributors (pipeline-completion, schedule, manual) slot in via
10+
//! the same trait + enum without touching callers.
11+
//!
12+
//! ## Why a private trait instead of reusing `CompilerExtension`
13+
//!
14+
//! `CompilerExtension` is the public boundary between the compiler
15+
//! and runtimes/tools. Context contributors are a private implementation
16+
//! detail of one extension; they share the same `CompileContext` input
17+
//! but emit a narrower output (a single prepare step + a single prompt
18+
//! supplement + a few env vars). Keeping them behind a small private
19+
//! trait avoids accidentally exposing them as user-facing extensions
20+
//! and lets us evolve the contract freely.
21+
22+
use crate::compile::extensions::CompileContext;
23+
24+
/// A unit of per-trigger execution-context generation.
25+
///
26+
/// Each contributor decides — based on `CompileContext` (front matter,
27+
/// triggers, target) — whether it activates. Activated contributors
28+
/// each emit exactly one prepare bash step (wrapped in an ADO
29+
/// `condition:` so non-matching trigger types skip with zero cost),
30+
/// plus a prompt-supplement fragment and env var declarations.
31+
pub(super) trait ContextContributor {
32+
/// Display name for diagnostics (e.g. `"pr"`).
33+
#[allow(dead_code)]
34+
fn name(&self) -> &str;
35+
36+
/// Whether this contributor activates for the given compile context.
37+
fn should_activate(&self, ctx: &CompileContext) -> bool;
38+
39+
/// Generate the prepare-step YAML (a single `- bash:` block or
40+
/// equivalent). Must include its own ADO `condition:` so the step
41+
/// no-ops on non-matching trigger types. Empty string = no step.
42+
fn prepare_step(&self, ctx: &CompileContext) -> String;
43+
44+
/// Markdown fragment to append to the agent prompt (under the
45+
/// "Execution context" supplement section). Empty = no fragment.
46+
fn prompt_fragment(&self) -> String;
47+
48+
/// Agent env vars this contributor exposes. Currently unused
49+
/// (the ado-aw env-var channel rejects ADO `$(...)` expressions,
50+
/// so all per-trigger metadata flows through files), but kept on
51+
/// the trait so a future contributor can opt in if it only needs
52+
/// literal values.
53+
#[allow(dead_code)]
54+
fn agent_env_vars(&self) -> Vec<(String, String)>;
55+
56+
/// Bash commands the agent must have on its allow-list to read
57+
/// the staged context (e.g. `cat`, `ls`). The agent itself does
58+
/// NOT need `git`, `mkdir`, etc. — those run in the precompute
59+
/// step which is outside the agent sandbox.
60+
fn required_bash_commands(&self) -> Vec<String>;
61+
}
62+
63+
/// Static-dispatch enum over all known contributors.
64+
///
65+
/// Mirrors the `Extension` enum pattern in `extensions/mod.rs`. v1
66+
/// ships `Pr`; adding a future variant requires only a new arm here
67+
/// and a registration in `ExecContextExtension::contributors()`.
68+
pub(super) enum Contributor {
69+
Pr(super::pr::PrContextContributor),
70+
}
71+
72+
impl ContextContributor for Contributor {
73+
fn name(&self) -> &str {
74+
match self {
75+
Contributor::Pr(c) => c.name(),
76+
}
77+
}
78+
fn should_activate(&self, ctx: &CompileContext) -> bool {
79+
match self {
80+
Contributor::Pr(c) => c.should_activate(ctx),
81+
}
82+
}
83+
fn prepare_step(&self, ctx: &CompileContext) -> String {
84+
match self {
85+
Contributor::Pr(c) => c.prepare_step(ctx),
86+
}
87+
}
88+
fn prompt_fragment(&self) -> String {
89+
match self {
90+
Contributor::Pr(c) => c.prompt_fragment(),
91+
}
92+
}
93+
fn agent_env_vars(&self) -> Vec<(String, String)> {
94+
match self {
95+
Contributor::Pr(c) => c.agent_env_vars(),
96+
}
97+
}
98+
fn required_bash_commands(&self) -> Vec<String> {
99+
match self {
100+
Contributor::Pr(c) => c.required_bash_commands(),
101+
}
102+
}
103+
}
