Commit 2b0ae03

jamesadevine and Copilot committed
feat(compile): replace Python gate evaluator with bundled TypeScript gate.js
Migrates the trigger-filter gate evaluator from `scripts/gate-eval.py` to a bundled TypeScript artifact `scripts/gate.js` produced by a new `scripts/ado-script/` workspace. The compiler now emits a `NodeTool@0` install step and `node /tmp/ado-aw-scripts/gate.js` instead of `python3`.

Architecture (variant A2 in the design walkthrough):
- TypeScript workspace at scripts/ado-script/ built with @vercel/ncc.
- shared/ modules (auth, ado-client, env-facts, policy state machine, vso-logger) reusable across future bundles.
- gate/ entry implementing bypass, fact acquisition, 11 predicate evaluators, and self-cancel; full parity with the deleted Python evaluator (45/45 tests ported, +6 parity guards).

Drift-proof codegen:
- New hidden CLI subcommand `ado-aw export-gate-schema` emits a schemars-derived JSON Schema from the Rust IR types.
- `npm run codegen` chains it through json-schema-to-typescript to produce src/shared/types.gen.ts.
- New CI workflow .github/workflows/ado-script.yml runs codegen + `git diff --exit-code` to fail on IR/TS schema drift.

Pipeline integration:
- TriggerFiltersExtension prepends a NodeTool@0 step pinned to 20.x.
- compile_gate_step_external invokes `node` instead of `python3`.
- release.yml builds the bundle (npm ci && npm run build) before zipping scripts/, and excludes node_modules/dist/schema from the zip.
- New tests/gate_e2e.rs (#[ignore]'d) compiles a real agent, extracts GATE_SPEC, runs gate.js end-to-end, and asserts SHOULD_RUN.
- compiler_tests.rs assertions tightened: now check for `node '/tmp/ado-aw-scripts/gate.js'` instead of the loose `python3` match (which falsely passed via base.yml's mcpg-config validation).

Cleanup:
- Deleted scripts/gate-eval.py, scripts/gate-spec.schema.json, tests/gate_eval_tests.py.

Documentation:
- New docs/ado-script.md records the A2 decision, codegen pipeline, bundle-size budget (5 MB; gate.js is ~1.1 MB), and how to add new internal bundles (e.g. poll.js).
- docs/filter-ir.md rewritten: Node evaluator, NodeTool@0 step, scripts.zip distribution.
- AGENTS.md tree + tech stack updated; new entry in docs index.
- ado-script-design.md added at repo root as the design walkthrough that produced the A2 decision.

Validation:
- 173/173 vitest tests pass (45 ports + 6 parity guards + smoke + shared-module units).
- Full cargo test suite green.
- cargo clippy --all-targets --all-features clean.
- E2E: `cd scripts/ado-script && npm run build && cd ../.. && cargo test --test gate_e2e -- --ignored` passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 0e7a162 commit 2b0ae03

58 files changed

Lines changed: 6206 additions & 1407 deletions

.github/workflows/ado-script.yml

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
name: ado-script Workspace
2+
3+
on:
4+
pull_request:
5+
paths:
6+
- "scripts/ado-script/**"
7+
- "src/compile/filter_ir.rs"
8+
- "Cargo.toml"
9+
- "Cargo.lock"
10+
- ".github/workflows/ado-script.yml"
11+
12+
env:
13+
CARGO_TERM_COLOR: always
14+
15+
jobs:
16+
ado-script:
17+
name: Build, Test & Drift-Check
18+
runs-on: ubuntu-latest
19+
steps:
20+
- uses: actions/checkout@v4
21+
22+
- uses: dtolnay/rust-toolchain@stable
23+
24+
- uses: Swatinem/rust-cache@v2
25+
26+
- uses: actions/setup-node@v4
27+
with:
28+
node-version: "20"
29+
cache: "npm"
30+
cache-dependency-path: scripts/ado-script/package-lock.json
31+
32+
- name: Install workspace dependencies
33+
working-directory: scripts/ado-script
34+
run: npm ci
35+
36+
- name: Regenerate types from Rust IR (codegen)
37+
working-directory: scripts/ado-script
38+
run: npm run codegen
39+
40+
- name: Verify generated TypeScript is up to date
41+
run: |
42+
if ! git diff --exit-code -- scripts/ado-script/src/shared/types.gen.ts; then
43+
echo ""
44+
echo "::error::types.gen.ts is out of date with the Rust IR."
45+
echo "Run 'cd scripts/ado-script && npm run codegen' and commit the result."
46+
exit 1
47+
fi
48+
49+
- name: Run TypeScript tests
50+
working-directory: scripts/ado-script
51+
run: npm test
52+
53+
- name: Type-check
54+
working-directory: scripts/ado-script
55+
run: npm run typecheck
56+
57+
- name: Build bundle (gate.js)
58+
working-directory: scripts/ado-script
59+
run: npm run build
60+
61+
- name: Smoke-test bundle
62+
working-directory: scripts/ado-script
63+
run: npx vitest run test/smoke.test.ts
64+
65+
- name: E2E gate test
66+
run: cargo test --test gate_e2e -- --ignored --nocapture

.github/workflows/release.yml

Lines changed: 18 additions & 1 deletion
@@ -56,11 +56,28 @@ jobs:
5656
cd target/release
5757
cp ado-aw ado-aw-linux-x64
5858
59+
- name: Set up Node.js for ado-script bundle
60+
uses: actions/setup-node@v4
61+
with:
62+
node-version: "20"
63+
64+
- name: Build ado-script TypeScript bundle (gate.js)
65+
working-directory: scripts/ado-script
66+
run: |
67+
npm ci
68+
npm run build
69+
# `npm run build` runs codegen + ncc + copies dist/gate/index.js
70+
# to ../gate.js (i.e. scripts/gate.js), which is then included in
71+
# scripts.zip by the next step.
72+
5973
- name: Package scripts bundle
6074
run: |
6175
set -euo pipefail
6276
cd scripts
63-
zip -r ../scripts.zip .
77+
zip -r ../scripts.zip . \
78+
-x "ado-script/node_modules/*" \
79+
-x "ado-script/dist/*" \
80+
-x "ado-script/schema/*"
6481
6582
- name: Upload release assets
6683
env:

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
11
target
22
examples/sample-agent.yml
3+
scripts/gate.js
34
*.pyc
45
__pycache__/

AGENTS.md

Lines changed: 6 additions & 2 deletions
@@ -120,8 +120,8 @@ Every compiled pipeline runs as three sequential jobs:
120120
├── ado-aw-derive/ # Proc-macro crate: #[derive(SanitizeConfig)], #[derive(SanitizeContent)]
121121
├── examples/ # Example agent definitions
122122
├── scripts/ # Supporting scripts shipped as release artifacts
123-
│ ├── gate-eval.py # Python gate evaluator (data-driven filter evaluation)
124-
│ └── gate-spec.schema.json # JSON Schema for gate spec (generated from Rust types)
123+
│ ├── ado-script/ # TypeScript workspace for bundled gate.js (and future bundles)
124+
│ └── gate.js # Bundled gate evaluator (built from scripts/ado-script/, see docs/ado-script.md)
125125
├── tests/ # Integration tests and fixtures
126126
├── docs/ # Per-concept reference documentation (see index below)
127127
├── Cargo.toml # Rust dependencies
@@ -133,6 +133,7 @@ Every compiled pipeline runs as three sequential jobs:
133133
- **Language**: Rust (2024 edition) - Note: Rust 2024 edition exists and is the edition used by this project
134134
- **CLI Framework**: clap v4 with derive macros
135135
- **Error Handling**: anyhow for ergonomic error propagation
136+
- **Bundled scripts**: TypeScript + ncc (`scripts/ado-script/`) — compiled gate evaluator and future internal helpers; see [`docs/ado-script.md`](docs/ado-script.md).
136137
- **Async Runtime**: tokio with full features
137138
- **YAML Parsing**: serde_yaml
138139
- **MCP Server**: rmcp with server and transport-io features
@@ -183,6 +184,9 @@ index to jump to the right page.
183184
- [`docs/filter-ir.md`](docs/filter-ir.md) — filter expression IR
184185
specification: `Fact`/`Predicate` types, three-pass compilation (lower →
185186
validate → codegen), gate step generation, adding new filter types.
187+
- [`docs/ado-script.md`](docs/ado-script.md) — `ado-script` workspace
188+
(`scripts/ado-script/`): the bundled TypeScript runtime helpers (today:
189+
`gate.js`), schemars-driven type codegen, and the A2 design decision.
186190
- [`docs/local-development.md`](docs/local-development.md) — local development
187191
setup notes.
188192

ado-script-design.md

Lines changed: 235 additions & 0 deletions
@@ -0,0 +1,235 @@
1+
# Design exploration: an `actions/github-script` analog for ADO
2+
3+
> **Mode**: thought experiment. No scope committed. Goal is to map the design
4+
> space, surface trade-offs, and identify the highest-leverage entry point if
5+
> we ever pull the trigger.
6+
7+
## 1. The concern that motivates this
8+
9+
`scripts/gate-eval.py` is already 388 lines. It conflates several
10+
responsibilities:
11+
12+
1. **Spec deserialization** (base64 → JSON → dict)
13+
2. **Fact acquisition** (env vars, REST API for PR metadata, REST API for
14+
iteration changes, datetime arithmetic) — including auth, URL building,
15+
retry/timeout semantics
16+
3. **Predicate evaluation** (10 predicate types, recursive, with overnight
17+
time-window arithmetic and `_strip_ref_prefix` quirks)
18+
4. **Failure-policy state machine** (`fail_closed` / `fail_open` /
19+
`skip_dependents`, with transitive propagation through fact dependencies)
20+
5. **ADO logging-command emission** (`##vso[...]`)
21+
6. **Self-cancel** (PATCH to builds API)
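To make responsibilities 1 and 5 concrete, here is a minimal TypeScript sketch of spec deserialization (base64 → JSON) and of formatting an ADO logging command. The `GateSpecSketch` shape and both function names are hypothetical and chosen for illustration only; the real spec shape is whatever the Rust IR defines, and a real evaluator might well use Node's `Buffer` rather than `atob`.

```typescript
// Hypothetical, simplified spec shape (assumption: the real one comes from the Rust IR).
interface GateSpecSketch {
  checks: unknown[];
}

// Decode a base64-encoded JSON spec: base64 -> UTF-8 bytes -> JSON -> validated object.
function decodeSpec(b64: string): GateSpecSketch {
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  const parsed: unknown = JSON.parse(new TextDecoder().decode(bytes));
  const candidate = parsed as { checks?: unknown } | null;
  if (typeof parsed !== "object" || candidate === null || !Array.isArray(candidate.checks)) {
    throw new Error("gate spec must be a JSON object with a `checks` array");
  }
  return { checks: candidate.checks };
}

// Format the ADO logging command that sets an output variable for later steps.
function setOutputVariable(name: string, value: string): string {
  return `##vso[task.setvariable variable=${name};isOutput=true]${value}`;
}
```

Printing the string returned by `setOutputVariable` to stdout is what makes the agent pick it up; keeping the formatting separate from the printing keeps it unit-testable.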
22+
23+
Every new filter type forces a coordinated change across:
24+
`filter_ir.rs` (Rust IR) → JSON schema → `gate-eval.py` evaluator →
25+
fixtures → docs. The Python file has no static typing, no test harness in CI
26+
(only Rust-side spec-serialization tests), and grows with the IR.
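The "overnight time-window arithmetic" and ref-prefix handling mentioned under predicate evaluation are the kind of logic that benefits from static types and unit tests. The sketch below is an assumption about how such helpers could look in TypeScript; the names `inTimeWindow` and `stripRefPrefix` and the exact semantics (half-open window, `refs/heads/` and `refs/tags/` prefixes) are illustrative and are not taken from `gate-eval.py`.

```typescript
// Minutes since midnight for an "HH:MM" string.
function toMinutes(hhmm: string): number {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// True when `now` falls in [start, end). A window whose end is not after its
// start (for example 22:00-06:00) is treated as wrapping past midnight.
function inTimeWindow(now: string, start: string, end: string): boolean {
  const n = toMinutes(now);
  const s = toMinutes(start);
  const e = toMinutes(end);
  return s < e ? n >= s && n < e : n >= s || n < e;
}

// Assumed behaviour: drop a leading refs/heads/ or refs/tags/ so that
// "refs/heads/main" and "main" compare equal.
function stripRefPrefix(ref: string): string {
  return ref.replace(/^refs\/(heads|tags)\//, "");
}
```

Note that under this sketch a window with identical start and end covers the whole day; a real evaluator would need to decide that edge case explicitly.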
27+
28+
There are at least two more places where a similar Python/bash blob is on the
29+
roadmap or already exists:
30+
31+
- The Stage-3 safe-output executor — currently a typed Rust binary
32+
(`src/execute.rs` + `src/safeoutputs/*.rs`). Strong story today, but every
33+
ADO interaction is hand-rolled HTTP via `reqwest`.
34+
- The agent shim & the prepare/setup steps — currently bash interleaved with
35+
ADO macro expansion.
36+
37+
The user's instinct: rather than letting `gate-eval.py` grow into a
38+
monstrosity (and rather than reinventing it for each new use case), give
39+
ado-aw a single, well-tested primitive — the way `actions/github-script`
40+
gives gh-aw its "drop in JS, get a pre-authed Octokit + context" lever.
41+
42+
## 2. What `actions/github-script` actually is
43+
44+
For grounding, the github-script contract:
45+
46+
```yaml
47+
- uses: actions/github-script@v7
48+
with:
49+
github-token: ${{ secrets.GITHUB_TOKEN }}
50+
script: |
51+
const { data } = await github.rest.issues.createComment({
52+
owner: context.repo.owner, repo: context.repo.repo,
53+
issue_number: context.issue.number, body: 'hi',
54+
});
55+
core.setOutput('comment-id', data.id);
56+
```
57+
58+
Mechanics worth copying:
59+
60+
| Property | Detail |
61+
|---|---|
62+
| Language | Node.js (single ecosystem, ncc-bundled, no `npm install` at runtime) |
63+
| Auth | Pre-injected `github` Octokit, token from input |
64+
| Context | Pre-injected `context` (event payload + repo/issue/PR shortcuts) |
65+
| Helpers | `core` (output/secrets/log), `glob`, `io`, `exec`, `fetch` |
66+
| Wrapper | `(async () => { <script> })()` — top-level `await` works |
67+
| Return | Stringified into a step output |
68+
| TS-aware | `@octokit/rest` types via JSDoc; some IDEs surface them |
69+
| Distribution | Action repo bundles all deps; runner downloads the action tarball once |
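The "Wrapper" row is the least obvious mechanic in the table. One common way to run an inline script with pre-injected bindings and working `await` is the `AsyncFunction` constructor; the sketch below assumes that approach, and `runInline` is a hypothetical helper name rather than part of any existing API.

```typescript
// AsyncFunction is not a global; it is reached through an async function's prototype.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor as new (
  ...args: string[]
) => (...args: unknown[]) => Promise<unknown>;

// Run `script` as the body of an async function whose parameters are the
// keys of `injected` (for example `context` or `core`).
function runInline(script: string, injected: Record<string, unknown>): Promise<unknown> {
  const names = Object.keys(injected);
  const fn = new AsyncFunction(...names, script);
  return fn(...names.map((name) => injected[name]));
}

// Usage: the script body can `await` and `return` as if it were inside an async function.
const inlineResult = runInline(
  "const n = await Promise.resolve(context.n); return n + 1;",
  { context: { n: 41 } },
);
```

Because the script body is compiled from a string, this has the same trust implications as `eval`, which is why the free-form user surface in section 4 is flagged as the highest-risk option.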
70+
71+
Mechanics that **don't** translate cleanly:
72+
73+
- GH Actions runners have Node pre-installed; ADO Microsoft-hosted agents do
74+
too, but **AWF-isolated 1ES sandboxes do not** by default. Anything we ship
75+
must either be self-contained or be installed in `prepare` before the
76+
network is locked down.
77+
- github-script has *no* notion of fail-open / skip-dependents / multi-stage
78+
trust boundaries. The gate logic isn't just "call an API"; it has a
79+
bespoke policy DSL.
80+
81+
## 3. Two distinct primitives are being conflated
82+
83+
It pays to separate these up front, because they pull in opposite directions:
84+
85+
### (A) **Internal** primitive — for the compiler to target
86+
87+
> "Stop emitting hand-rolled Python; emit calls to a single bundled binary
88+
> with a typed, declarative spec and a small evaluator surface."
89+
90+
- Audience: ado-aw maintainers
91+
- Surface: minimal, declarative, deterministic
92+
- Driver: maintainability of the *compiler output*
93+
- Examples: gate evaluator, future "wait for X" pollers, agent-stats parser
94+
95+
### (B) **User-facing** primitive — for agent authors
96+
97+
> "Give pipeline authors an `ado-script:` block in their agent file that runs
98+
> arbitrary JS with `ado` (azure-devops-node-api), `context`, `core`."
99+
100+
- Audience: humans writing `.md` agent files
101+
- Surface: rich, ergonomic, escape-hatch-friendly
102+
- Driver: power & extensibility
103+
- Examples: custom triggers, custom safe-output post-processing, ad-hoc
104+
reporting
105+
106+
These are **separate features** even if they share a runtime. Our concern
107+
about gate-eval.py becoming a monstrosity is squarely about (A). The
108+
github-script analogy points at (B). Picking one direction without naming the
109+
other is how scope creep happens.
110+
111+
## 4. Design space — variant matrix
112+
113+
Three orthogonal axes:
114+
115+
### Axis 1 — Language
116+
117+
| Option | Pros | Cons |
118+
|---|---|---|
119+
| **Node.js** (mirrors github-script directly) | `azure-devops-node-api` is the most mature SDK; ncc-bundling produces a single file; same mental model as gh-aw | New runtime dependency; AWF sandbox needs Node pre-staged in chroot; bigger binary (~30 MB bundled) |
120+
| **Python** (continues current trajectory) | Already in the chroot for gate-eval; stdlib-only is feasible (current approach); easy to embed | No first-class ADO SDK that's stdlib-only; users have to hand-roll `urllib`; weak typing means the same maintenance pain we have today |
121+
| **Embed in the Rust binary** (`ado-aw script ...` subcommand) | Strongest typing and testability; reuses `reqwest`/`anyhow` already in the binary; no new runtime to ship; same audit surface as the rest of ado-aw | Useless as a (B) user-facing primitive (no inline scripting); for (A), basically just "move gate-eval.py into Rust", which is a viable answer in itself |
122+
| **Deno / Bun** | Single-binary, sandboxed-by-default, TypeScript-first | Not in standard ADO/1ES images; one more thing to vet for OneBranch |
123+
124+
### Axis 2 — Distribution
125+
126+
| Option | Pros | Cons |
127+
|---|---|---|
128+
| **Bundled into ado-aw release artifacts** (current `gate-eval.py` model) | Versioned with the compiler; deterministic URL; CI publishes alongside binary | Each script is a separate download (already 2 artifacts; will be N) |
129+
| **Single ncc-bundled JS** (one `ado-script.js` per release) | One artifact regardless of how many internal use sites; deps frozen at build time | If we add a (B) user surface, users can't `npm install` extras |
130+
| **Inline heredoc** (today's Tier-1 inline gate) | No download, no extra failure mode | Caps at "tiny scripts"; ADO macro expansion + heredoc quoting is already painful |
131+
| **Subcommand of the ado-aw binary** | Zero extra artifacts; one auth/sanitization story | Forces (A)-only — no inline user scripts |
132+
133+
### Axis 3 — User surface
134+
135+
| Option | Description |
136+
|---|---|
137+
| **None** — internal-only (A) | Compiler emits `ado-aw script eval-gate --spec=…` (or `node ado-script.js gate ...`). User never sees it. Pure refactor of gate-eval.py. |
138+
| **Front-matter `scripts:` block** | Authors declare named scripts that run in stage 1 prepare or stage 3. Tightly typed inputs/outputs. Limited blast radius. |
139+
| **Free-form `ado-script:` step** | Mirrors github-script 1:1. Maximum power, maximum risk. Needs sanitization, prompt-injection review, allow-list of called APIs. |
140+
| **Safe-output kind** (`safe-outputs.run-script`) | The agent itself proposes a script to run; Stage 2 detection reviews it; Stage 3 executes. Symmetrical with existing safe outputs but inverts trust (agent-authored code). |
141+
142+
## 5. What changes for `gate-eval.py` under each scope
143+
144+
### Scope A1 — "Move gate-eval.py into the Rust binary"
145+
146+
- Add `ado-aw eval-gate --spec-base64=…` subcommand
147+
- Reuse `reqwest`, `serde_json`, the existing `Fact`/`Predicate` types as
148+
*runtime* types (not just IR)
149+
- Bash shim drops to: `export GATE_SPEC=…; ado-aw eval-gate`
150+
- **Trade-off**: ado-aw binary is now also a runtime dependency in the chroot
151+
(it already is — `prepare` downloads it). The big win is testability:
152+
predicate evaluation gets unit-tested in Rust, the policy state machine
153+
becomes a typed `enum`, and we lose the JSON-schema-dance.
154+
- **Risk**: every agent pipeline now invokes the ado-aw binary at runtime;
155+
any panic surfaces as a build failure. (Mitigated by `Result` discipline.)
156+
157+
### Scope A2 — "Bundle a Node ado-script and emit `node ado-script.js gate ...`"
158+
159+
- New `scripts/ado-script/` workspace with `azure-devops-node-api`
160+
- ncc-bundle to a single `ado-script.js`
161+
- Compiler emits `node /tmp/ado-aw-scripts/ado-script.js gate <base64-spec>`
162+
- **Trade-off**: best ergonomics for *future user-facing* (B) work; worst
163+
fit for the immediate problem (we already have the spec types in Rust;
164+
re-deserializing in JS just moves the pain).
165+
- **Risk**: Node version skew across hosted vs 1ES vs OneBranch images.
166+
167+
### Scope B — "User-facing ado-script:"
168+
169+
Independent question. Even if we pick A1 (Rust subcommand), we might still
170+
later add a `.md` front-matter `scripts:` block that runs Node. They're not
171+
mutually exclusive.
172+
173+
## 6. Recommendation framework (no commitment yet)
174+
175+
If the immediate pain is gate-eval.py specifically, **Scope A1** has by far
176+
the best cost/benefit ratio:
177+
178+
- Eliminates the JSON-spec round-trip (Rust IR → JSON → Python dict → eval).
179+
The `FilterCheck` enum *is* the runtime representation.
180+
- Eliminates the dual codebase and the schema-drift class of bugs.
181+
- Removes the `scripts/gate-eval.py` and `scripts/gate-spec.schema.json`
182+
release artifacts.
183+
- Keeps the door open for a future (B) primitive without prejudging it.
184+
185+
If the longer-term vision is "agent authors should be able to drop in custom
186+
ADO logic", **Scope B with bundled Node** is the right shape, but it should
187+
be approached as a deliberate user-facing feature with its own RFC — not as
188+
a back-door from the gate-eval refactor.
189+
190+
The framing the user is reacting against — "embedded Python that grows
191+
forever" — is solved by either A1 *or* A2. The github-script-shaped solution
192+
(Node + SDK + inline scripts) only pays off if we commit to (B).
193+
194+
## 7. Open questions to resolve before any implementation
195+
196+
1. **Is the chroot OK with a second invocation of the ado-aw binary at
197+
runtime?** Today it's only invoked in `prepare` (download) and as the MCP
198+
server. Promoting it to "the gate evaluator and everything else" changes
199+
its operational profile.
200+
2. **Can the existing Rust `Fact`/`Predicate`/`Policy` types be the runtime
201+
types directly, or do they leak compiler concerns (spans, diagnostics)
202+
that would have to be split?**
203+
3. **What's the ADO REST client story?** `azure-devops-node-api` for Node,
204+
nothing canonical for Python, hand-rolled `reqwest` for Rust. If we go
205+
A1, we should consolidate the ad-hoc HTTP in `safeoutputs/*.rs` against
206+
the same client.
207+
4. **Self-cancel & `##vso` emission.** These are tiny but pervasive. Worth
208+
a single `AdoLogger` + `AdoBuildClient` abstraction in whichever language
209+
we land on.
210+
5. **Failure-policy semantics.** `skip_dependents` + transitive `fail_open`
211+
propagation is *not* in any off-the-shelf SDK. It's our DSL. Whichever
212+
language we pick, this lives in our code.
213+
6. **Stage-3 trust boundary.** A user-facing (B) `ado-script` would need to
214+
live in Stage 3 (not Stage 1) to have write access. That's the same
215+
pattern as safe outputs — agent proposes, executor decides.
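Question 5 is easier to reason about with a concrete model. The following TypeScript sketch shows one possible reading of the failure policies: a failed fact makes every fact that depends on it unavailable (transitively), and the policy decides what a predicate on an unavailable fact resolves to. The mapping in `outcomeForMissingFact` and the fact names in the usage note are assumptions made for illustration; the actual semantics are defined by the project's own evaluator.

```typescript
type FailurePolicy = "fail_closed" | "fail_open" | "skip_dependents";
type Outcome = "pass" | "fail" | "skip";

// Facts that are unavailable either directly (acquisition failed) or because
// one of their dependencies is unavailable. `deps` maps a fact to the facts it needs.
function unavailableFacts(deps: Record<string, string[]>, failed: string[]): Set<string> {
  const out = new Set(failed);
  let changed = true;
  while (changed) {
    changed = false;
    for (const [fact, needs] of Object.entries(deps)) {
      if (!out.has(fact) && needs.some((d) => out.has(d))) {
        out.add(fact);
        changed = true;
      }
    }
  }
  return out;
}

// Assumed mapping from policy to the outcome of a predicate whose fact is unavailable.
function outcomeForMissingFact(policy: FailurePolicy): Outcome {
  switch (policy) {
    case "fail_closed":
      return "fail";
    case "fail_open":
      return "pass";
    case "skip_dependents":
      return "skip";
  }
}
```

For example, with hypothetical facts where `changed_files` depends on `pr_metadata`, a failure to fetch `pr_metadata` marks both as unavailable while an unrelated `branch` fact stays usable. Modelling the policy as a closed union lets the compiler flag any policy the `switch` forgets to handle.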
216+
217+
## 8. Suggested next step (if and when scope is committed)
218+
219+
Spike A1 on a single throwaway branch:
220+
221+
- Add `ado-aw eval-gate` subcommand
222+
- Move ~80% of `gate-eval.py` logic into `src/gate/eval.rs` (predicate
223+
eval + policy state machine) — re-using existing `Fact`/`Predicate`
224+
types where possible
225+
- Keep the bash shim and the JSON spec format unchanged for the spike;
226+
only the *evaluator* moves
227+
- Compare LoC, test count, and binary-size delta against today
228+
229+
That spike answers questions 1, 2, and 3 concretely without committing to
230+
any user-facing surface.
231+
232+
---
233+
234+
*This is a design note, not an implementation plan. No todos created. If you
235+
want to move forward on any of these scopes, ask for a fresh planning pass.*
