# Design exploration: an `actions/github-script` analog for ADO

> **Mode**: thought experiment. No scope committed. Goal is to map the design
> space, surface trade-offs, and identify the highest-leverage entry point if
> we ever pull the trigger.

## 1. The concern that motivates this

`scripts/gate-eval.py` is already 388 lines. It conflates several
responsibilities:

1. **Spec deserialization** (base64 → JSON → dict)
2. **Fact acquisition** (env vars, REST API for PR metadata, REST API for
   iteration changes, datetime arithmetic), including auth, URL building,
   and retry/timeout semantics
3. **Predicate evaluation** (10 predicate types, evaluated recursively, with
   overnight time-window arithmetic and `_strip_ref_prefix` quirks)
4. **Failure-policy state machine** (`fail_closed` / `fail_open` /
   `skip_dependents`, with transitive propagation through fact dependencies)
5. **ADO logging-command emission** (`##vso[...]`)
6. **Self-cancel** (PATCH to the builds API)
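
To make item 3 concrete, here is a minimal Python sketch of recursive predicate evaluation with an overnight time window and ref-prefix stripping. It is not lifted from `gate-eval.py`. The predicate names (`all`, `any`, `not`, `branch_equals`, `time_window`), the dict shape, and the `strip_ref_prefix` behavior are all illustrative assumptions. The point is how much branching a five-type evaluator already needs before fact acquisition or failure policies enter the picture.

```python
from datetime import time


def strip_ref_prefix(ref: str) -> str:
    # Hypothetical helper: "refs/heads/main" -> "main", so specs can use either form.
    for prefix in ("refs/heads/", "refs/tags/"):
        if ref.startswith(prefix):
            return ref[len(prefix):]
    return ref


def in_time_window(now: time, start: time, end: time) -> bool:
    # Start is inclusive, end exclusive. When start > end the window wraps past
    # midnight (e.g. 22:00-06:00), so "inside" means late evening OR early morning.
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def evaluate(pred: dict, facts: dict) -> bool:
    # Illustrative predicate set; the combinators recurse into their children.
    kind = pred["type"]
    if kind == "all":
        return all(evaluate(p, facts) for p in pred["of"])
    if kind == "any":
        return any(evaluate(p, facts) for p in pred["of"])
    if kind == "not":
        return not evaluate(pred["of"], facts)
    if kind == "branch_equals":
        return strip_ref_prefix(facts["source_branch"]) == strip_ref_prefix(pred["value"])
    if kind == "time_window":
        start = time.fromisoformat(pred["start"])
        end = time.fromisoformat(pred["end"])
        return in_time_window(facts["now"], start, end)
    raise ValueError(f"unknown predicate type: {kind!r}")


# Example: only on main, and never during 22:00-06:00 quiet hours.
spec = {
    "type": "all",
    "of": [
        {"type": "branch_equals", "value": "main"},
        {"type": "not", "of": {"type": "time_window", "start": "22:00", "end": "06:00"}},
    ],
}
print(evaluate(spec, {"source_branch": "refs/heads/main", "now": time(12, 0)}))  # True
```

Every additional predicate type adds another branch here, plus the matching Rust IR and JSON-schema changes.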

Every new filter type forces a coordinated change across
`filter_ir.rs` (Rust IR) → JSON schema → `gate-eval.py` evaluator →
fixtures → docs. The Python file has no static typing and no test harness in
CI (the only tests are Rust-side spec-serialization tests), and it grows with
the IR.
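
One way to contain that drift, whatever the language, is to fail fast at deserialization. If the spec mentions a predicate type the evaluator does not know, reject it there rather than somewhere mid-evaluation. The sketch below covers item 1 (base64 → JSON → dict) with such a guard. `KNOWN_PREDICATE_TYPES` and the `"type"` tag convention are hypothetical and not taken from the real schema.

```python
import base64
import json

# Hypothetical: the predicate types this evaluator build understands.
KNOWN_PREDICATE_TYPES = {"all", "any", "not", "branch_equals", "time_window"}


def _predicate_types(node):
    # Yield every "type" tag in a (possibly nested) predicate tree.
    # Simplification: any dict key named "type" is treated as a predicate tag.
    if isinstance(node, dict):
        if "type" in node:
            yield node["type"]
        for value in node.values():
            yield from _predicate_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from _predicate_types(item)


def decode_spec(spec_b64: str) -> dict:
    # Malformed base64 raises binascii.Error, which is a ValueError subclass.
    spec = json.loads(base64.b64decode(spec_b64, validate=True))
    unknown = sorted(set(_predicate_types(spec)) - KNOWN_PREDICATE_TYPES)
    if unknown:
        raise ValueError(f"spec uses predicate types this evaluator does not know: {unknown}")
    return spec
```

With this guard, a Rust-side IR change that ships ahead of the evaluator fails loudly with a named type instead of producing a confusing evaluation result.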

There are at least two more places where a similar Python/bash blob already
exists or is on the roadmap:

- The Stage-3 safe-output executor, currently a typed Rust binary
  (`src/execute.rs` + `src/safeoutputs/*.rs`). Its story is strong today, but
  every ADO interaction is hand-rolled HTTP via `reqwest`.
- The agent shim and the prepare/setup steps, currently bash interleaved with
  ADO macro expansion.

The user's instinct: rather than letting `gate-eval.py` grow into a
monstrosity (and rather than reinventing it for each new use case), give
ado-aw a single, well-tested primitive, the way `actions/github-script`
gives gh-aw its "drop in JS, get a pre-authed Octokit + context" lever.

## 2. What `actions/github-script` actually is

For grounding, the github-script contract:

```yaml
- uses: actions/github-script@v7
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    script: |
      const { data } = await github.rest.issues.createComment({
        owner: context.repo.owner, repo: context.repo.repo,
        issue_number: context.issue.number, body: 'hi',
      });
      core.setOutput('comment-id', data.id);
```

Mechanics worth copying:

| Property | Detail |
|---|---|
| Language | Node.js (single ecosystem, ncc-bundled, no `npm install` at runtime) |
| Auth | Pre-injected `github` Octokit, token from input |
| Context | Pre-injected `context` (event payload + repo/issue/PR shortcuts) |
| Helpers | `core` (output/secrets/log), `glob`, `io`, `exec`, `fetch` |
| Wrapper | `(async () => { <script> })()`, so top-level `await` works |
| Return | Stringified into a step output |
| TS-aware | `@octokit/rest` types via JSDoc; some IDEs surface them |
| Distribution | Action repo bundles all deps; runner downloads the action tarball once |

Mechanics that **don't** translate cleanly:

- GH Actions runners have Node pre-installed; ADO Microsoft-hosted agents do
  too, but **AWF-isolated 1ES sandboxes do not** by default. Anything we ship
  must either be self-contained or be installed in `prepare` before the
  network is locked down.
- github-script has *no* notion of fail-open / skip-dependents / multi-stage
  trust boundaries. The gate logic isn't just "call an API"; it has a
  bespoke policy DSL.

## 3. Two distinct primitives are being conflated

It pays to separate these up front, because they pull in opposite directions:

### (A) **Internal** primitive: for the compiler to target

> "Stop emitting hand-rolled Python; emit calls to a single bundled binary
> with a typed, declarative spec and a small evaluator surface."

- Audience: ado-aw maintainers
- Surface: minimal, declarative, deterministic
- Driver: maintainability of the *compiler output*
- Examples: gate evaluator, future "wait for X" pollers, agent-stats parser

### (B) **User-facing** primitive: for agent authors

> "Give pipeline authors an `ado-script:` block in their agent file that runs
> arbitrary JS with `ado` (azure-devops-node-api), `context`, `core`."

- Audience: humans writing `.md` agent files
- Surface: rich, ergonomic, escape-hatch-friendly
- Driver: power & extensibility
- Examples: custom triggers, custom safe-output post-processing, ad-hoc
  reporting

These are **separate features** even if they share a runtime. Our concern
about `gate-eval.py` becoming a monstrosity is squarely about (A). The
github-script analogy points at (B). Picking one direction without naming the
other is how scope creep happens.

## 4. Design space: variant matrix

Three orthogonal axes:

### Axis 1: Language

| Option | Pros | Cons |
|---|---|---|
| **Node.js** (mirrors github-script directly) | `azure-devops-node-api` is the most mature SDK; ncc-bundling produces a single file; same mental model as gh-aw | New runtime dependency; AWF sandbox needs Node pre-staged in chroot; bigger binary (~30 MB bundled) |
| **Python** (continues current trajectory) | Already in the chroot for gate-eval; stdlib-only is feasible (current approach); easy to embed | No first-class ADO SDK that's stdlib-only; users have to hand-roll `urllib`; weak typing means the same maintenance pain we have today |
| **Embed in the Rust binary** (`ado-aw script ...` subcommand) | Strongest typing and testability; reuses `reqwest`/`anyhow` already in the binary; no new runtime to ship; same audit surface as the rest of ado-aw | Useless as a (B) user-facing primitive (no inline scripting); for (A), basically just "move gate-eval.py into Rust", which is a viable answer in itself |
| **Deno / Bun** | Single-binary, sandboxed-by-default, TypeScript-first | Not in standard ADO/1ES images; one more thing to vet for OneBranch |

### Axis 2: Distribution

| Option | Pros | Cons |
|---|---|---|
| **Bundled into ado-aw release artifacts** (current `gate-eval.py` model) | Versioned with the compiler; deterministic URL; CI publishes alongside binary | Each script is a separate download (already 2 artifacts; will be N) |
| **Single ncc-bundled JS** (one `ado-script.js` per release) | One artifact regardless of how many internal use sites; deps frozen at build time | If we add a (B) user surface, users can't `npm install` extras |
| **Inline heredoc** (today's Tier-1 inline gate) | No download, no extra failure mode | Caps at "tiny scripts"; ADO macro expansion + heredoc quoting is already painful |
| **Subcommand of the ado-aw binary** | Zero extra artifacts; one auth/sanitization story | Forces (A)-only: no inline user scripts |

### Axis 3: User surface

| Option | Description |
|---|---|
| **None** (internal-only, A) | Compiler emits `ado-aw script eval-gate --spec=…` (or `node ado-script.js gate ...`). User never sees it. Pure refactor of gate-eval.py. |
| **Front-matter `scripts:` block** | Authors declare named scripts that run in stage 1 prepare or stage 3. Tightly typed inputs/outputs. Limited blast radius. |
| **Free-form `ado-script:` step** | Mirrors github-script 1:1. Maximum power, maximum risk. Needs sanitization, prompt-injection review, allow-list of called APIs. |
| **Safe-output kind** (`safe-outputs.run-script`) | The agent itself proposes a script to run; Stage 2 detection reviews it; Stage 3 executes. Symmetrical with existing safe outputs but inverts trust (agent-authored code). |

## 5. What changes for `gate-eval.py` under each scope

### Scope A1: "Move gate-eval.py into the Rust binary"

- Add an `ado-aw eval-gate --spec-base64=…` subcommand
- Reuse `reqwest`, `serde_json`, and the existing `Fact`/`Predicate` types as
  *runtime* types (not just IR)
- The bash shim drops to: `export GATE_SPEC=…; ado-aw eval-gate`
- **Trade-off**: the ado-aw binary is now also a runtime dependency in the
  chroot (it already is, since `prepare` downloads it). The big win is
  testability: predicate evaluation gets unit-tested in Rust, the policy state
  machine becomes a typed `enum`, and the JSON-schema dance goes away.
- **Risk**: every agent pipeline now invokes the ado-aw binary at runtime;
  any panic surfaces as a build failure. (Mitigated by `Result` discipline.)
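
Here is a rough sketch of the policy state machine as explicit enums, written in Python to compare with today's script. It shows what "typed" buys even before any move to Rust. The semantics are assumptions inferred from the policy names. `fail_closed` fails a check whose fact is unavailable, `fail_open` lets it pass, and `skip_dependents` skips it. A fact counts as unavailable if its own acquisition failed or if any fact it depends on is unavailable (the transitive part). The fact names are hypothetical, and the dependency graph is assumed to be acyclic.

```python
from enum import Enum


class Policy(Enum):
    # What to do with a check whose fact could not be acquired (assumed semantics).
    FAIL_CLOSED = "fail_closed"
    FAIL_OPEN = "fail_open"
    SKIP_DEPENDENTS = "skip_dependents"


class Outcome(Enum):
    EVALUATE = "evaluate"  # fact available: run the predicate normally
    PASS = "pass"
    FAIL = "fail"
    SKIP = "skip"


def unavailable_facts(deps: dict[str, list[str]], failed: set[str]) -> set[str]:
    # A fact is unavailable if its own acquisition failed or any dependency is
    # unavailable, so one failed REST call propagates transitively.
    # Assumes the dependency graph is acyclic (a cycle would recurse forever).
    memo: dict[str, bool] = {}

    def is_unavailable(fact: str) -> bool:
        if fact not in memo:
            memo[fact] = fact in failed or any(is_unavailable(d) for d in deps.get(fact, []))
        return memo[fact]

    return {fact for fact in deps if is_unavailable(fact)} | set(failed)


def resolve(fact: str, policy: Policy, unavailable: set[str]) -> Outcome:
    if fact not in unavailable:
        return Outcome.EVALUATE
    if policy is Policy.FAIL_OPEN:
        return Outcome.PASS
    if policy is Policy.SKIP_DEPENDENTS:
        return Outcome.SKIP
    return Outcome.FAIL  # Policy.FAIL_CLOSED


# Hypothetical facts: changed_files needs PR metadata (for the iteration id).
deps = {"pr_metadata": [], "changed_files": ["pr_metadata"], "now": []}
down = unavailable_facts(deps, failed={"pr_metadata"})
print(sorted(down))  # ['changed_files', 'pr_metadata']
print(resolve("changed_files", Policy("skip_dependents"), down))  # Outcome.SKIP
```

Parsing the policy string through `Policy(...)` turns a typo such as `skip_dependants` into an immediate `ValueError`. In Rust, serde deserialization into an `enum` would presumably give the same guarantee when the spec is loaded.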

### Scope A2: "Bundle a Node ado-script and emit `node ado-script.js gate ...`"

- New `scripts/ado-script/` workspace with `azure-devops-node-api`
- ncc-bundle to a single `ado-script.js`
- Compiler emits `node /tmp/ado-aw-scripts/ado-script.js gate <base64-spec>`
- **Trade-off**: best ergonomics for *future user-facing* (B) work; worst
  fit for the immediate problem (we already have the spec types in Rust;
  re-deserializing in JS just moves the pain).
- **Risk**: Node version skew across hosted vs 1ES vs OneBranch images.

### Scope B: "User-facing `ado-script:`"

Independent question. Even if we pick A1 (Rust subcommand), we might still
later add a `.md` front-matter `scripts:` block that runs Node. They're not
mutually exclusive.

## 6. Recommendation framework (no commitment yet)

If the immediate pain is `gate-eval.py` specifically, **Scope A1** has by far
the best cost/benefit ratio:

- Eliminates the JSON-spec round-trip (Rust IR → JSON → Python dict → eval).
  The `FilterCheck` enum *is* the runtime representation.
- Eliminates the dual codebase and the schema-drift class of bugs.
- Removes the `scripts/gate-eval.py` and `scripts/gate-spec.schema.json`
  release artifacts.
- Keeps the door open for a future (B) primitive without prejudging it.

If the longer-term vision is "agent authors should be able to drop in custom
ADO logic", **Scope B with bundled Node** is the right shape, but it should
be approached as a deliberate user-facing feature with its own RFC, not as
a back door from the gate-eval refactor.

The framing the user is reacting against ("embedded Python that grows
forever") is solved by either A1 *or* A2. The github-script-shaped solution
(Node + SDK + inline scripts) only pays off if we commit to (B).

## 7. Open questions to resolve before any implementation

1. **Is the chroot OK with a second invocation of the ado-aw binary at
   runtime?** Today it's only invoked in `prepare` (download) and as the MCP
   server. Promoting it to "the gate evaluator and everything else" changes
   its operational profile.
2. **Can the existing Rust `Fact`/`Predicate`/`Policy` types be the runtime
   types directly, or do they leak compiler concerns (spans, diagnostics)
   that would have to be split out?**
3. **What's the ADO REST client story?** `azure-devops-node-api` for Node,
   nothing canonical for Python, hand-rolled `reqwest` for Rust. If we go
   A1, we should consolidate the ad-hoc HTTP in `safeoutputs/*.rs` against
   the same client.
4. **Self-cancel & `##vso` emission.** These are tiny but pervasive. Worth
   a single `AdoLogger` + `AdoBuildClient` abstraction in whichever language
   we land on.
5. **Failure-policy semantics.** `skip_dependents` + transitive `fail_open`
   propagation is *not* in any off-the-shelf SDK. It's our DSL. Whichever
   language we pick, this lives in our code.
6. **Stage-3 trust boundary.** A user-facing (B) `ado-script` would need to
   live in Stage 3 (not Stage 1) to have write access. That's the same
   pattern as safe outputs: the agent proposes, the executor decides.

## 8. Suggested next step (if and when scope is committed)

Spike A1 on a single throwaway branch:

- Add an `ado-aw eval-gate` subcommand
- Move ~80% of the `gate-eval.py` logic into `src/gate/eval.rs` (predicate
  evaluation + policy state machine), reusing the existing `Fact`/`Predicate`
  types where possible
- Keep the bash shim and the JSON spec format unchanged for the spike;
  only the *evaluator* moves
- Compare LoC, test count, and binary-size delta against today

That spike answers questions 1, 2, and 3 concretely without committing to
any user-facing surface.

---

*This is a design note, not an implementation plan. No todos created. If you
want to move forward on any of these scopes, ask for a fresh planning pass.*