Skip to content

Commit 8f3701b

Browse files
garrytanclaude
andauthored
v1.16.0.0 feat: tunnel allowlist 17→26 + canDispatchOverTunnel pure function (#1253)
* feat: extend tunnel allowlist to 26 commands + extract canDispatchOverTunnel Adds newtab, tabs, back, forward, reload, snapshot, fill, url, closetab to TUNNEL_COMMANDS (matching what cli.ts and REMOTE_BROWSER_ACCESS.md already documented). Each new command is bounded by the existing per-tab ownership check at server.ts:613-624 — scoped tokens default to tabPolicy: 'own-only' so paired agents still can't operate on tabs they don't own. Refactors the inline gate check at server.ts:1771-1783 into a pure exported function canDispatchOverTunnel(command). Same behavior as the inline check; the difference is unit-testability without HTTP. Adds BROWSE_TUNNEL_LOCAL_ONLY=1 test-mode flag that binds the second Bun.serve listener with makeFetchHandler('tunnel') on 127.0.0.1 — no ngrok needed. Production tunnel still requires BROWSE_TUNNEL=1 + valid NGROK_AUTHTOKEN. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: source-level guards + pure-function unit test + dual-listener behavioral eval Three layers of regression coverage for the tunnel allowlist: 1. dual-listener.test.ts: replaces must-include/must-exclude with exact-set equality on the 26-command literal (the prior intersection-only style let new commands sneak into the source without test updates). Adds a regex assertion that the `command !== 'newtab'` ownership exemption at server.ts:613 still exists — catches refactors that re-introduce the catch-22 from the other side. Updates the /command handler test to look for canDispatchOverTunnel(body?.command) instead of the inline check. 2. tunnel-gate-unit.test.ts (new): 53 expects covering all 26 allowed, 20 blocked, null/undefined/empty/non-string defensive handling, and alias canonicalization (e.g. 'set-content' resolves to 'load-html' which is correctly rejected since 'load-html' isn't tunnel-allowed). 3. pair-agent-tunnel-eval.test.ts (new): 4 behavioral tests that spawn the daemon under BROWSE_HEADLESS_SKIP=1 BROWSE_TUNNEL_LOCAL_ONLY=1, bind both listeners on 127.0.0.1, mint a scoped token via /pair → /connect, and assert: (a) newtab over tunnel passes the gate; (b) pair over tunnel 403s with disallowed_command:pair AND writes a denial-log entry; (c) pair over local does NOT trigger the tunnel gate (proves the gate is surface-scoped); (d) regression for the catch-22 — newtab + goto on the resulting tab does not 403 with "Tab not owned by your agent". All four tests run free under bun test (no API spend, no ngrok). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: bump tunnel allowlist count 17 -> 26 in CLAUDE.md and REMOTE_BROWSER_ACCESS.md Both docs already named the 9 new commands as remote-accessible (the operator guide's per-command sections at lines 86-119 and 168, plus cli.ts:546-586's instruction blocks). The allowlist count was the only place the drift was visible. Also corrected REMOTE_BROWSER_ACCESS.md's denied-commands list: 'eval' is in the allowlist, not the denied list — prior doc was wrong. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.21.0.0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: re-version v1.21.0.0 -> v1.16.0.0 (lowest unclaimed slot) The previous bump landed at v1.21.0.0 because gstack-next-version advances past the highest claimed slot (v1.20.0.0 from #1252) rather than picking the lowest unclaimed. v1.16-v1.18 are unclaimed and v1.16.0.0 preserves monotonic version ordering on main once #1234 (v1.17), #1233 (v1.19), and #1252 (v1.20) merge after us. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): version-gate enforces collisions, allows lower-but-unclaimed slots The gate was rejecting any PR VERSION below the util's next-slot recommendation, even when the lower slot was unclaimed. This blocked PRs that legitimately want to land at an unclaimed slot below the queue max — which is what /ship should pick when the goal is monotonic version ordering on main (lower-numbered PRs landing first preserves order; the util's "advance past max claimed" semantics only optimizes for fresh runs picking unique slots, not for queue ordering on merge). New gate logic: 1. Hard-fail if PR VERSION <= base VERSION (no actual bump). 2. Hard-fail if PR VERSION exactly matches another open PR's VERSION (real collision). 3. Pass otherwise. If the PR is below the util's suggestion, emit an informational ::notice:: explaining the slot is unclaimed. The util's output stays informational — it tells fresh /ship runs what the next-up slot should be, but the gate only blocks actual conflicts. This is a strict relaxation: every PR that passed the old gate also passes the new one. Confirmed by dry-run against the current queue (4 open PRs claiming 1.17.0.0, 1.19.0.0, 1.21.1.0, 1.22.0.0): - v1.16.0.0 → pass with informational notice (unclaimed) - v1.17.0.0 → fail (collision with #1234) - v1.15.0.0 → fail (no bump from base) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent dde5510 commit 8f3701b

10 files changed

Lines changed: 489 additions & 35 deletions

CHANGELOG.md

Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,54 @@
11
# Changelog
22

3+
## [1.16.0.0] - 2026-04-28
4+
5+
## **Paired-agent tunnel allowlist now matches what the docs already promised. Catch-22 resolved, gate is unit-testable.**
6+
7+
The visible bug: a paired remote agent over the ngrok tunnel hit 403s on `newtab`, `tabs`, `goto-on-existing-tab`, and a chain of other commands the operator docs claimed worked. The hidden bug: the v1.6.0.0 `TUNNEL_COMMANDS` allowlist was set at 17 entries while `docs/REMOTE_BROWSER_ACCESS.md`, `browse/src/cli.ts:546-586`, and the operator-facing instruction blocks all documented 26. The shipped allowlist drifted from the design intent silently for releases. This release closes the gap: 9 commands added (`newtab`, `tabs`, `back`, `forward`, `reload`, `snapshot`, `fill`, `url`, `closetab`), each bounded by the existing per-tab ownership check at `server.ts:613-624`. Scoped tokens default to `tabPolicy: 'own-only'`, so a paired agent still can't navigate, fill, or close on tabs it doesn't own — same isolation as before, just covering more verbs.
8+
9+
### The numbers that matter
10+
11+
Branch totals come from `git diff --shortstat origin/main..HEAD`. Test counts come from `bun test browse/test/dual-listener.test.ts browse/test/tunnel-gate-unit.test.ts browse/test/pair-agent-tunnel-eval.test.ts browse/test/pair-agent-e2e.test.ts` against the merged tree.
12+
13+
| Metric | Δ |
14+
|---|---|
15+
| Tunnel allowlist size | **17 → 26 commands** (+53%) |
16+
| Catch-22 resolution | `newtab``goto``back` chain works for the first time |
17+
| Gate testability | inline regex check → **pure exported `canDispatchOverTunnel()`** function |
18+
| New unit-test coverage | **53 expects** in `tunnel-gate-unit.test.ts` (allowed, blocked, null/undefined/non-string, alias canonicalization) |
19+
| New behavioral coverage | **4 tests** in `pair-agent-tunnel-eval.test.ts` running BOTH listeners locally (no ngrok) |
20+
| Source-level guard | exact-set equality against the 26-command literal + ownership-exemption regex |
21+
| All free tests | **69 pass / 0 fail** on the four touched test files |
22+
| Codex review passes | **2 outside-voice rounds** during plan mode, 6 of 7 findings incorporated |
23+
24+
### What this means for users running paired agents
25+
26+
Three things change immediately. **First**, paired agents can actually open and drive their own tab without hitting the catch-22 the prior allowlist created. `newtab` succeeds (the ownership-exemption at `server.ts:613` was always there, but the allowlist gated the entry); `goto`, `back`, `forward`, `reload`, `fill`, `closetab` all work on the just-created tab; `snapshot`, `url`, `tabs` give the agent the read-side surface needed to be useful. **Second**, the tunnel-surface gate is unit-testable now — `canDispatchOverTunnel(command)` is pure, exported from `browse/src/server.ts`, and covered by 53 expects. A future refactor that decouples the allowlist literal from the gate logic fails a free test in milliseconds. **Third**, `pair-agent-tunnel-eval.test.ts` exercises the gate end-to-end with BOTH the local and tunnel listeners bound on 127.0.0.1 (no ngrok required) so the routing decision — "this request hit the tunnel listener, run the gate; this one hit the local listener, skip the gate" — is asserted on every PR. The new `BROWSE_TUNNEL_LOCAL_ONLY=1` env var binds the second listener locally without invoking ngrok, gated to no-op outside test mode. Production tunnel still requires `BROWSE_TUNNEL=1` + a valid `NGROK_AUTHTOKEN`.
27+
28+
### Itemized changes
29+
30+
#### Added
31+
32+
- 9 new commands in `browse/src/server.ts:111-120` `TUNNEL_COMMANDS` set: `newtab`, `tabs`, `back`, `forward`, `reload`, `snapshot`, `fill`, `url`, `closetab`. The set is now exported so tests can reference the literal directly.
33+
- `canDispatchOverTunnel(command: string | undefined | null): boolean` in `browse/src/server.ts` — pure exported function. Handles non-string input, runs `canonicalizeCommand` for alias resolution, returns `TUNNEL_COMMANDS.has(canonical)`.
34+
- `BROWSE_TUNNEL_LOCAL_ONLY=1` env var in `browse/src/server.ts:2080-2104`. Test-only sibling branch to `BROWSE_TUNNEL=1` that binds the second `Bun.serve` listener via `makeFetchHandler('tunnel')` without invoking ngrok. Persists `tunnelLocalPort` to the state file for the eval to read.
35+
- `browse/test/tunnel-gate-unit.test.ts`: 53 expects covering all 26 allowed commands, 20 blocked commands (pair, unpair, cookies, setup, launch, restart, stop, tunnel-start, token-mint, etc.), null/undefined/empty/non-string defensive handling, and alias canonicalization (e.g. `set-content` resolves to `load-html` and is correctly rejected since `load-html` isn't tunnel-allowed).
36+
- `browse/test/pair-agent-tunnel-eval.test.ts`: 4 behavioral tests that spawn the daemon under `BROWSE_HEADLESS_SKIP=1 BROWSE_TUNNEL_LOCAL_ONLY=1`, bind both listeners on 127.0.0.1, mint a scoped token via the existing `/pair``/connect` ceremony, and assert: (1) `newtab` over the tunnel passes the gate; (2) `pair` over the tunnel 403s with `disallowed_command:pair` AND writes a fresh denial-log entry to `~/.gstack/security/attempts.jsonl`; (3) `pair` over the local listener does NOT trigger the tunnel gate; (4) regression test for the catch-22 — `newtab` followed by `goto` on the resulting tab does not 403 with `Tab not owned by your agent`.
37+
38+
#### Changed
39+
40+
- `browse/test/dual-listener.test.ts`: must-include + must-exclude assertions replaced with one exact-set-equality test against the 26-command literal. The intersection-only style of the prior tests let new commands sneak into the source without a corresponding test update — the bidirectional check catches it both ways. Added a regex assertion that the `command !== 'newtab'` ownership-exemption clause at `server.ts:613` still exists (catches refactors that re-introduce the catch-22 from the other side).
41+
- `browse/test/dual-listener.test.ts`: `/command` handler test updated to assert the inline `TUNNEL_COMMANDS.has(cmd)` check is now `canDispatchOverTunnel(body?.command)` — proves the gate is delegated to the pure function and not duplicated.
42+
- `docs/REMOTE_BROWSER_ACCESS.md:35,168`: bumped "17-command allowlist" to "26-command allowlist". Corrected the denied-commands list (removed `eval`, which IS in the allowlist; the prior doc was wrong).
43+
- `CLAUDE.md`: bumped the transport-layer security section's "17-command browser-driving allowlist" reference to "26-command".
44+
45+
#### For contributors
46+
47+
- The plan was reviewed under `/plan-eng-review` plus 2 sequential codex outside-voice passes during plan mode. Round-1 codex caught a doc-target mistake (we were going to update `SIDEBAR_MESSAGE_FLOW.md` instead of `REMOTE_BROWSER_ACCESS.md`) and a wrong-layer test design. Round-2 codex caught that the round-1 correction was still wrong (the chosen test harness only binds the local listener) AND that the docs promised 6 more commands than the allowlist had. All 6 of 7 substantive findings landed in the implementation; the 7th (a pre-existing `/pair-agent` `/health` probe mismatch at `cli.ts:656-668`) is logged as out of scope.
48+
- One known accepted risk: `tabs` over the tunnel returns metadata for ALL tabs in the browser, not just tabs the agent owns. The user authored the trust relationship when they paired the agent, the agent already can't read CONTENT of unowned tabs (write commands blocked, the active tab can't be switched without a `tab <id>` command that's NOT in the allowlist), and tab IDs already leak via the 403 `hint` field on disallowed `goto`. Codex noted that tightening this requires touching the ownership gate itself (the gate falls back to `getActiveTabId()` BEFORE dispatch in `server.ts:603-614`), which is materially out of scope for a catch-22 fix. Logged in the plan failure-mode table as accepted.
49+
50+
51+
352
## [1.15.0.0] - 2026-04-26
453

554
## **Real-PTY test harness ships. 11 plan-mode E2E tests, 23 unit tests, and 50K fewer tokens per invocation.**

CLAUDE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -258,7 +258,7 @@ through `POST /pty-session` only.
258258
**Transport-layer security** (v1.6.0.0+). When `pair-agent` starts an ngrok tunnel,
259259
the daemon binds two HTTP listeners: a local listener (127.0.0.1, full command
260260
surface, never forwarded) and a tunnel listener (locked allowlist: `/connect`,
261-
`/command` with a scoped token + 17-command browser-driving allowlist,
261+
`/command` with a scoped token + 26-command browser-driving allowlist,
262262
`/sidebar-chat`). ngrok forwards only the tunnel port. Root tokens over the tunnel
263263
return 403. SSE endpoints use a 30-minute HttpOnly `gstack_sse` cookie minted via
264264
`POST /sse-session` (never valid against `/command`). Tunnel-surface rejections go

VERSION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
1.15.0.0
1+
1.16.0.0

browse/src/server.ts

Lines changed: 43 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -108,13 +108,31 @@ const TUNNEL_PATHS = new Set<string>([
108108
* extension-inspector state. This allowlist maps to the eng-review decision
109109
* logged in the CEO plan for sec-wave v1.6.0.0.
110110
*/
111-
const TUNNEL_COMMANDS = new Set<string>([
111+
export const TUNNEL_COMMANDS = new Set<string>([
112+
// Original 17
112113
'goto', 'click', 'text', 'screenshot',
113114
'html', 'links', 'forms', 'accessibility',
114115
'attrs', 'media', 'data',
115116
'scroll', 'press', 'type', 'select', 'wait', 'eval',
117+
// Tab + navigation primitives operator docs and CLI hints already promised
118+
'newtab', 'tabs', 'back', 'forward', 'reload',
119+
// Read/inspect/write operators paired agents need to be useful
120+
'snapshot', 'fill', 'url', 'closetab',
116121
]);
117122

123+
/**
124+
* Pure gate: returns true iff the command is reachable over the tunnel surface.
125+
* Extracted from the inline /command handler so the gate logic is unit-testable
126+
* without standing up an HTTP listener. Behavior is identical to the inline
127+
* check; the function canonicalizes the command (so aliases hit the same set)
128+
* and returns false for null/undefined input.
129+
*/
130+
export function canDispatchOverTunnel(command: string | undefined | null): boolean {
131+
if (typeof command !== 'string' || command.length === 0) return false;
132+
const cmd = canonicalizeCommand(command);
133+
return TUNNEL_COMMANDS.has(cmd);
134+
}
135+
118136
/**
119137
* Read ngrok authtoken from env var, ~/.gstack/ngrok.env, or ngrok's native
120138
* config files. Returns null if nothing found. Shared between the
@@ -1772,8 +1790,7 @@ async function start() {
17721790
// Paired remote agents drive the browser but cannot configure the
17731791
// daemon, launch new browsers, import cookies, or rotate tokens.
17741792
if (surface === 'tunnel') {
1775-
const cmd = canonicalizeCommand(body?.command);
1776-
if (!cmd || !TUNNEL_COMMANDS.has(cmd)) {
1793+
if (!canDispatchOverTunnel(body?.command)) {
17771794
logTunnelDenial(req, url, `disallowed_command:${body?.command}`);
17781795
return new Response(JSON.stringify({
17791796
error: `Command '${body?.command}' is not allowed over the tunnel surface`,
@@ -2060,6 +2077,29 @@ async function start() {
20602077
tunnelListener = null;
20612078
}
20622079
}
2080+
} else if (process.env.BROWSE_TUNNEL_LOCAL_ONLY === '1') {
2081+
// Test-only: bind the dual-listener tunnel surface on 127.0.0.1 with NO
2082+
// ngrok forwarding. Lets paid evals exercise the surface==='tunnel' gate
2083+
// without an ngrok authtoken or live network. Production tunneling still
2084+
// requires BROWSE_TUNNEL=1 + a valid authtoken above.
2085+
try {
2086+
const boundTunnel = Bun.serve({
2087+
port: 0,
2088+
hostname: '127.0.0.1',
2089+
fetch: makeFetchHandler('tunnel'),
2090+
});
2091+
tunnelServer = boundTunnel;
2092+
tunnelActive = true;
2093+
const tunnelPort = boundTunnel.port;
2094+
console.log(`[browse] Tunnel listener bound (local-only test mode) on 127.0.0.1:${tunnelPort}`);
2095+
const stateContent = JSON.parse(fs.readFileSync(config.stateFile, 'utf-8'));
2096+
stateContent.tunnelLocalPort = tunnelPort;
2097+
const tmpState = config.stateFile + '.tmp';
2098+
fs.writeFileSync(tmpState, JSON.stringify(stateContent, null, 2), { mode: 0o600 });
2099+
fs.renameSync(tmpState, config.stateFile);
2100+
} catch (err: any) {
2101+
console.error(`[browse] BROWSE_TUNNEL_LOCAL_ONLY=1 listener bind failed: ${err.message}`);
2102+
}
20632103
}
20642104
}
20652105

browse/test/dual-listener.test.ts

Lines changed: 40 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -70,17 +70,37 @@ describe('Tunnel path allowlist', () => {
7070
});
7171

7272
describe('Tunnel command allowlist', () => {
73-
test('TUNNEL_COMMANDS is a closed set of browser-driving commands only', () => {
73+
// The full closed set of commands reachable over the tunnel surface. Adding
74+
// or removing a command here means changing the literal in server.ts AND
75+
// updating this list — that double-edit is the point. A single-source
76+
// "include the items in the source" assertion would silently widen the
77+
// surface during a refactor that adds a command to server.ts without test
78+
// review. The exact-set match catches it.
79+
const EXPECTED_TUNNEL_COMMANDS = new Set([
80+
// Original 17
81+
'goto', 'click', 'text', 'screenshot',
82+
'html', 'links', 'forms', 'accessibility',
83+
'attrs', 'media', 'data',
84+
'scroll', 'press', 'type', 'select', 'wait', 'eval',
85+
// Tab + navigation primitives operator docs and CLI hints already promised
86+
'newtab', 'tabs', 'back', 'forward', 'reload',
87+
// Read/inspect/write operators paired agents need to be useful
88+
'snapshot', 'fill', 'url', 'closetab',
89+
]);
90+
91+
test('TUNNEL_COMMANDS literal matches the closed allowlist exactly (catches additions/removals without test update)', () => {
7492
const cmds = extractSetContents(SERVER_SRC, 'TUNNEL_COMMANDS');
75-
// Must include the core browser-driving commands
76-
const required = [
77-
'goto', 'click', 'text', 'screenshot', 'html', 'links',
78-
'forms', 'accessibility', 'attrs', 'media', 'data',
79-
'scroll', 'press', 'type', 'select', 'wait', 'eval',
80-
];
81-
for (const c of required) {
93+
// Both directions: anything in the source must be expected, and anything
94+
// expected must be in the source. The intersection-only style of the old
95+
// must-include / must-exclude tests let new commands sneak into the source
96+
// without a corresponding test update.
97+
for (const c of cmds) {
98+
expect(EXPECTED_TUNNEL_COMMANDS.has(c)).toBe(true);
99+
}
100+
for (const c of EXPECTED_TUNNEL_COMMANDS) {
82101
expect(cmds.has(c)).toBe(true);
83102
}
103+
expect(cmds.size).toBe(EXPECTED_TUNNEL_COMMANDS.size);
84104
});
85105

86106
test('TUNNEL_COMMANDS does NOT include daemon-configuration or bootstrap commands', () => {
@@ -89,12 +109,21 @@ describe('Tunnel command allowlist', () => {
89109
'launch', 'launch-browser', 'connect', 'disconnect',
90110
'restart', 'stop', 'tunnel-start', 'tunnel-stop',
91111
'token-mint', 'token-revoke', 'cookie-picker', 'cookie-import',
92-
'inspector-pick',
112+
'inspector-pick', 'pair', 'unpair', 'cookies', 'setup',
93113
];
94114
for (const c of forbidden) {
95115
expect(cmds.has(c)).toBe(false);
96116
}
97117
});
118+
119+
test('newtab ownership exemption preserved (catches refactors that re-introduce the catch-22)', () => {
120+
// The /command handler must skip the per-tab ownership check when the
121+
// command is `newtab`, otherwise paired agents have no way to create their
122+
// own tab — every other write command requires an owned tab, and you can't
123+
// own a tab you haven't created. The string `command !== 'newtab'` is the
124+
// contract that breaks the catch-22.
125+
expect(SERVER_SRC).toMatch(/command\s*!==\s*['"]newtab['"]/);
126+
});
98127
});
99128

100129
describe('Request handler factory', () => {
@@ -176,14 +205,14 @@ describe('GET /connect alive probe', () => {
176205
});
177206

178207
describe('/command tunnel command allowlist', () => {
179-
test('/command handler checks TUNNEL_COMMANDS when surface is tunnel', () => {
208+
test('/command handler delegates to canDispatchOverTunnel when surface is tunnel', () => {
180209
const commandBlock = sliceBetween(
181210
SERVER_SRC,
182211
"url.pathname === '/command' && req.method === 'POST'",
183212
'return handleCommand(body, tokenInfo)'
184213
);
185214
expect(commandBlock).toContain("surface === 'tunnel'");
186-
expect(commandBlock).toContain('TUNNEL_COMMANDS.has');
215+
expect(commandBlock).toContain('canDispatchOverTunnel(body?.command)');
187216
expect(commandBlock).toContain('disallowed_command');
188217
expect(commandBlock).toContain('is not allowed over the tunnel surface');
189218
expect(commandBlock).toContain('status: 403');

0 commit comments

Comments
 (0)