Merged
4 changes: 4 additions & 0 deletions .gitignore
@@ -20,6 +20,10 @@ bin/gstack-global-discover
.gbrain/
.context/
extension/.auth.json
# xterm assets are vendored from npm at build time; not source-of-truth.
extension/lib/xterm.js
extension/lib/xterm.css
extension/lib/xterm-addon-fit.js
.gstack-worktrees/
/tmp/
*.log
53 changes: 53 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,58 @@
# Changelog

## [1.14.0.0] - 2026-04-25

## **The gstack browser sidebar is now an interactive Claude Code REPL with live tab awareness.**

Open the side panel and Claude Code is right there in a real terminal. Type, watch the agent work, switch browser tabs and Claude sees the change. The old one-shot chat queue is gone. Two-way conversation, slash commands, `/resume`, ANSI colors, all of it. Plus a `$B tab-each` command that fans out a single browse command across every open tab and returns per-tab JSON results.

### The numbers that matter

| Metric | Before | After | Δ |
|---|---|---|---|
| Sidebar surfaces | Chat (one-shot `claude -p`) + 3 debug | Terminal (live PTY) + 3 debug | -1 surface, +interactive |
| Subprocesses spawned per session | Many (one per chat message) | One (PTY claude, lazy-spawned) | -N |
| Lines in `extension/sidepanel.js` | 1969 | 1042 | -47% |
| Total diff | — | 27 files, +2875 / -3885 | -1010 net |
| New unit + integration + regression tests | 0 | 56+ | +56 |
| Live `tabs.json` push latency | n/a (no live state) | <50ms after `chrome.tabs` event | new capability |

### What this means for builders

Open the sidebar, type. Real PTY means slash commands, `/resume`, real ANSI rendering, real claude process lifecycle. Switch browser tabs while Claude is running and `<stateDir>/tabs.json` + `active-tab.json` update in place — Claude reads them, no need to ask `$B tabs`. Need to do the same thing on every tab? `$B tab-each <command>` returns a JSON array, original active tab restored when done, no OS focus stealing.
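
A minimal sketch of how the two state files could be written so that a reader never sees a half-written file. It assumes a write-to-temp-then-rename strategy, which is a common way to get atomic replacement on POSIX filesystems; the PR does not show its actual writer. `TabInfo`, `writeJsonAtomic`, and `writeTabState` are hypothetical names, and the shape of `active-tab.json` (the active tab's record, or `null`) is an assumption.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical tab record; the field names follow the changelog's list.
interface TabInfo {
  id: number;
  url: string;
  title: string;
  active: boolean;
  pinned: boolean;
  audible: boolean;
  windowId: number;
}

// Write JSON to a sibling temp file, then rename it over the target.
// A rename within one directory replaces the target in a single step,
// so a concurrent reader sees either the old file or the new one.
function writeJsonAtomic(target: string, value: unknown): void {
  const tmp = `${target}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(value, null, 2), { mode: 0o600 });
  fs.renameSync(tmp, target);
}

// Assumed layout: tabs.json holds the full list, active-tab.json holds
// the single active record (or null when no tab is marked active).
function writeTabState(stateDir: string, tabs: TabInfo[]): void {
  writeJsonAtomic(path.join(stateDir, "tabs.json"), tabs);
  const active = tabs.find((t) => t.active) ?? null;
  writeJsonAtomic(path.join(stateDir, "active-tab.json"), active);
}

// Usage: write two made-up tabs into a throwaway directory.
const demoStateDir = fs.mkdtempSync(path.join(os.tmpdir(), "gstack-state-"));
writeTabState(demoStateDir, [
  { id: 1, url: "https://example.com/", title: "Example", active: false, pinned: false, audible: false, windowId: 1 },
  { id: 2, url: "https://example.org/", title: "Docs", active: true, pinned: false, audible: false, windowId: 1 },
]);
```

The temp file lives next to the target on purpose: a rename across filesystems is not atomic, so the temp file must not go in a different mount such as `/tmp`.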

The old chat queue is gone. `sidebar-agent.ts`, `/sidebar-command`, `/sidebar-chat`, `/sidebar-agent/event` all deleted. The Cleanup / Screenshot / Cookies toolbar buttons survive in the Terminal pane — Cleanup pipes its prompt straight into the live PTY via `window.gstackInjectToTerminal()` instead of spawning yet another `claude -p`.

### Itemized changes

#### Added
- **Interactive Terminal sidebar tab.** xterm.js + a non-compiled `terminal-agent.ts` Bun process that spawns claude with `Bun.spawn({terminal: {rows, cols, data}})`. Auto-connects when the side panel opens, no keypress needed.
- **`$B tab-each <command>`** — fan-out helper for multi-tab work. Returns `{command, args, total, results: [{tabId, url, title, status, output}]}`. Skips chrome:// pages, scope-checks the inner command before iterating, restores the original active tab in a `finally` block, never pulls focus away from the user's foreground app.
- **Live tab state files.** `<stateDir>/tabs.json` (full list with id, url, title, active, pinned, audible, windowId) and `<stateDir>/active-tab.json` (current active). Updated atomically on every `chrome.tabs` event (activated, created, removed, URL/title change). Claude reads on demand instead of running `$B tabs`.
- **Tab-awareness system prompt** injected via `claude --append-system-prompt` at spawn so the model knows about the state files and the `$B tab-each` command without being told.
- **Always-visible Restart button** in the Terminal toolbar. Force-restart claude any time, not just from the "session ended" state.
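
As a rough illustration of consuming the `$B tab-each` result described in the list above, the sketch below types the documented shape and groups results by status. The field types and the `"ok"` / `"error"` status values are assumptions (the changelog names the fields but not their types or values), and `parseTabEach` and `groupByStatus` are hypothetical helpers, not part of gstack.

```typescript
// Field names come from the changelog entry; the types are assumed.
interface TabEachResult {
  tabId: number;
  url: string;
  title: string;
  status: string; // possible values are not documented in the source
  output: string;
}

interface TabEachResponse {
  command: string;
  args: string[];
  total: number;
  results: TabEachResult[];
}

// Parse the raw JSON and check only the parts this sketch relies on.
function parseTabEach(raw: string): TabEachResponse {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("tab-each output is not a JSON object");
  }
  const resp = data as Partial<TabEachResponse>;
  if (!Array.isArray(resp.results)) {
    throw new Error("tab-each output has no results array");
  }
  return resp as TabEachResponse;
}

// Bucket per-tab results by their status string.
function groupByStatus(resp: TabEachResponse): Map<string, TabEachResult[]> {
  const groups = new Map<string, TabEachResult[]>();
  for (const result of resp.results) {
    const bucket = groups.get(result.status) ?? [];
    bucket.push(result);
    groups.set(result.status, bucket);
  }
  return groups;
}

// Usage with a made-up payload (status values are hypothetical).
const sampleTabEach = JSON.stringify({
  command: "text",
  args: [],
  total: 2,
  results: [
    { tabId: 1, url: "https://example.com/", title: "Example", status: "ok", output: "Hello" },
    { tabId: 2, url: "https://example.org/", title: "Docs", status: "error", output: "timeout" },
  ],
});
const tabEachGroups = groupByStatus(parseTabEach(sampleTabEach));
```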

#### Changed
- **Sidebar is Terminal-only.** No more `Terminal | Chat` primary tab nav. Activity / Refs / Inspector still live behind the `debug` toggle in the footer. Quick-actions (🧹 Cleanup / 📸 Screenshot / 🍪 Cookies) moved into the Terminal toolbar.
- **WebSocket auth uses `Sec-WebSocket-Protocol`** instead of cookies. Browsers can't set `Authorization` on WS upgrades, and `SameSite=Strict` cookies don't survive the cross-port jump from server.ts:34567 to the agent's random port from a chrome-extension origin. The token rides on `new WebSocket(url, [`gstack-pty.<token>`])` and the agent echoes the protocol back (Chromium closes connections that don't pick a protocol).
- **Cleanup button now drives the live PTY.** Clicking "🧹 Cleanup" injects the cleanup prompt straight into claude via `window.gstackInjectToTerminal()`. The Inspector "Send to Code" action uses the same path. No more `/sidebar-command` POSTs.
- **Repaint after debug-tab close.** xterm.js doesn't auto-redraw when its container flips from `display: none` back to `display: flex`. A MutationObserver on `#tab-terminal`'s class attribute now forces a `fitAddon.fit() + term.refresh() + resize` push when the pane becomes visible.
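
The `Sec-WebSocket-Protocol` handshake described in the list above can be sketched as a pure function that picks which offered protocol to echo back. This is an illustration under assumptions, not the agent's actual code: `selectPtyProtocol` is a hypothetical name, and `validTokens` is modeled as a plain `Set`. Only the `gstack-pty.<token>` format and the need to echo the protocol come from the source.

```typescript
const PTY_PROTOCOL_PREFIX = "gstack-pty.";

// Given the raw Sec-WebSocket-Protocol request header (a comma-separated
// list of offered subprotocols), return the entry to echo back in the
// upgrade response, or null when no offered token is valid.
function selectPtyProtocol(
  header: string | null,
  validTokens: ReadonlySet<string>,
): string | null {
  if (!header) return null;
  for (const offered of header.split(",").map((p) => p.trim())) {
    if (!offered.startsWith(PTY_PROTOCOL_PREFIX)) continue;
    const token = offered.slice(PTY_PROTOCOL_PREFIX.length);
    if (token.length > 0 && validTokens.has(token)) {
      return offered; // echo the protocol string exactly as offered
    }
  }
  return null;
}

// Usage: the client side would open the socket with
//   new WebSocket(url, [`gstack-pty.${token}`])
// and the server would set the returned string as the response's
// Sec-WebSocket-Protocol header, rejecting the upgrade when it is null.
const demoValidTokens = new Set(["abc123"]);
const demoChosen = selectPtyProtocol("gstack-pty.abc123", demoValidTokens);
```

Because the token travels as a subprotocol name, it has to consist of characters that are legal in an HTTP token (no spaces, commas, or other separators); a hex or base64url string satisfies that.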

#### Removed
- **`browse/src/sidebar-agent.ts`** — the one-shot `claude -p` queue worker. ~900 lines.
- **Server endpoints**: `/sidebar-command`, `/sidebar-chat[/clear]`, `/sidebar-agent/{event,kill,stop}`, `/sidebar-tabs[/switch]`, `/sidebar-session{,/new,/list}`, `/sidebar-queue/dismiss`. ~600 lines.
- **Chat-related state** in server.ts: `ChatEntry`, `SidebarSession`, `TabAgentState`, `pickSidebarModel`, `addChatEntry`, `processAgentEvent`, `killAgent`, the agent-health watchdog, `chatBuffer`, the per-tab agent map.
- **Chat UI in sidepanel.html**: primary-tab nav, `<main id="tab-chat">`, the chat input bar, the experimental "Browser co-pilot" banner, the security event banner, the `clear-chat` footer button.
- **Five obsolete test files**: `sidebar-agent.test.ts`, `sidebar-agent-roundtrip.test.ts`, `security-e2e-fullstack.test.ts`, `security-review-fullstack.test.ts`, `security-review-sidepanel-e2e.test.ts`. Plus 5 chat-only describe blocks inside surviving security tests (loadSession session-ID validation, switchChatTab DocumentFragment, pollChat reentrancy, sidebar-tabs URL sanitization, agent queue security).

#### For contributors
- **`browse/src/pty-session-cookie.ts`** mirrors `sse-session-cookie.ts`. Same TTL, same opportunistic pruning, separate registry (PTY tokens must never be valid as SSE tokens or vice versa).
- **`docs/designs/SIDEBAR_MESSAGE_FLOW.md`** rewritten around the Terminal flow: WebSocket upgrade, dual-token model (`AUTH_TOKEN` for `/pty-session`, `gstack-pty.<token>` for `/ws`, `INTERNAL_TOKEN` for server↔agent loopback), threat-model boundary (Terminal tab bypasses the prompt-injection stack on purpose; user keystrokes are the trust source).
- **`browse/test/terminal-agent.test.ts`** (16 tests) + `terminal-agent-integration.test.ts` (real `/bin/bash` PTY round-trip, raw `Sec-WebSocket-Protocol` upgrade verification) + `tab-each.test.ts` (10 tests with mock `BrowserManager`) + `sidebar-tabs.test.ts` (27 structural assertions locking the chat-rip invariants).
- **CLAUDE.md** updated with the dual-token model, the cookie-vs-protocol rationale, and the cross-pane injection pattern.
- **`vendor:xterm`** build step copies `xterm@5.x` and `xterm-addon-fit` from `node_modules/` into `extension/lib/` at build time. xterm files are gitignored.
- **TODOS.md** carries three v1.1+ follow-ups: PTY session survival across sidebar reload (Issue 1C deferred), `/health` `AUTH_TOKEN` distribution audit (codex finding, pre-existing soft leak), and dropping the now-dead `security-classifier.ts` ML pipeline.
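
The "same TTL, same opportunistic pruning, separate registry" idea from the list above might look roughly like the sketch below. `TokenRegistry` and its method names are hypothetical and are not taken from `pty-session-cookie.ts`; the injectable clock is an assumption added only to make expiry easy to exercise.

```typescript
import { randomBytes } from "node:crypto";

// One registry instance per token kind. Keeping PTY and SSE tokens in
// separate instances means a token minted for one is never valid for
// the other.
class TokenRegistry {
  private readonly expiries = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Mint a fresh token; prune expired entries on the way
  // ("opportunistic" pruning: no background timer needed).
  mint(): string {
    this.prune();
    const token = randomBytes(32).toString("hex");
    this.expiries.set(token, this.now() + this.ttlMs);
    return token;
  }

  // A token is valid only if this registry minted it and it has not expired.
  validate(token: string): boolean {
    const expiresAt = this.expiries.get(token);
    if (expiresAt === undefined) return false;
    if (expiresAt <= this.now()) {
      this.expiries.delete(token);
      return false;
    }
    return true;
  }

  private prune(): void {
    const current = this.now();
    for (const [token, expiresAt] of this.expiries) {
      if (expiresAt <= current) this.expiries.delete(token);
    }
  }
}

// Usage with a fake clock so expiry can be checked without waiting.
let fakeNowMs = 0;
const ptyTokens = new TokenRegistry(1000, () => fakeNowMs);
const sseTokens = new TokenRegistry(1000, () => fakeNowMs);
const demoPtyToken = ptyTokens.mint();
```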

## [1.13.0.0] - 2026-04-25

## **`/gstack-claude` gives non-Claude hosts a read-only outside voice.**
35 changes: 29 additions & 6 deletions CLAUDE.md
@@ -225,12 +225,35 @@ When you need to interact with a browser (QA, dogfooding, cookie setup), use the
project uses.

**Sidebar architecture:** Before modifying `sidepanel.js`, `background.js`,
-`content.js`, `sidebar-agent.ts`, or sidebar-related server endpoints, read
-`docs/designs/SIDEBAR_MESSAGE_FLOW.md`. It documents the full initialization
-timeline, message flow, auth token chain, tab concurrency model, and known
-failure modes. The sidebar spans 5 files across 2 codebases (extension + server)
-with non-obvious ordering dependencies. The doc exists to prevent the kind of
-silent failures that come from not understanding the cross-component flow.
+`content.js`, `terminal-agent.ts`, or sidebar-related server endpoints,
+read `docs/designs/SIDEBAR_MESSAGE_FLOW.md`. The sidebar has one primary
+surface — the **Terminal** pane (interactive `claude` PTY) — with
+Activity / Refs / Inspector as debug overlays behind the footer's
+`debug` toggle. The chat queue path was ripped once the PTY proved out;
+`sidebar-agent.ts` and the `/sidebar-command` / `/sidebar-chat` /
+`/sidebar-agent/event` endpoints are gone. The doc covers the WS auth
+flow, dual-token model, and threat-model boundary — silent failures
+here usually trace to not understanding the cross-component flow.
+
+**WebSocket auth uses Sec-WebSocket-Protocol, not cookies.** Browsers
+can't set `Authorization` on a WebSocket upgrade, but they CAN set
+`Sec-WebSocket-Protocol` via `new WebSocket(url, [token])`. The agent
+reads it, validates against `validTokens`, and MUST echo the protocol
+back in the upgrade response — without the echo, Chromium closes the
+connection immediately. `Set-Cookie: gstack_pty=...` is kept as a
+fallback for non-browser callers (the cross-port `SameSite=Strict`
+cookie path doesn't survive from a chrome-extension origin).
+
+**Cross-pane PTY injection.** The toolbar's Cleanup button and the
+Inspector's "Send to Code" action both pipe text into the live claude
+PTY via `window.gstackInjectToTerminal(text)`, exposed by
+`sidepanel-terminal.js`. No `/sidebar-command` POST — the live REPL is
+the only execution surface in the sidebar now.
+
+**`/health` MUST NOT surface any shell-grant token.** It already leaks
+`AUTH_TOKEN` to localhost callers in headed mode (a v1.1+ TODO). Don't
+make that worse by adding the PTY session token there. PTY auth flows
+through `POST /pty-session` only.

**Transport-layer security** (v1.6.0.0+). When `pair-agent` starts an ngrok tunnel,
the daemon binds two HTTP listeners: a local listener (127.0.0.1, full command
1 change: 1 addition & 0 deletions SKILL.md
@@ -1035,6 +1035,7 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
| `closetab [id]` | Close tab |
| `newtab [url] [--json]` | Open new tab. With --json, returns {"tabId":N,"url":...} for programmatic use (make-pdf). |
| `tab <id>` | Switch to tab |
| `tab-each <command> [args...]` | Run a command on every open tab. Returns JSON with per-tab results. |
| `tabs` | List open tabs |

### Server
52 changes: 52 additions & 0 deletions TODOS.md
@@ -1,5 +1,57 @@
# TODOS

## Sidebar Terminal (cc-pty-import follow-ups)

### v1.1: PTY session survives sidebar reload

**What:** Today the Terminal tab's PTY dies with the WebSocket — sidebar
reload, side-panel close, even a quick navigate-away in another tab close
the session. v1.1 should key the PTY on a tab/session id so a reload
reattaches to the existing claude process and you keep `/resume` history.

**Why:** Mid-task resilience. When you've been pair-programming with claude
for 20 minutes and an accidental Cmd-R blows it away, the cost is real.

**Pros:** Better UX, fewer interrupted sessions. **Cons:** Session-tracking
state, ghost-process risk, lifecycle bugs (when DOES the PTY actually go
away?). v1 chose the simple "PTY dies with WS" model deliberately.

**Context:** /plan-eng-review Issue 1C decision (cc-pty-import branch,
2026-04-25). v1 ships with phoenix's lifecycle. **Depends on:**
cc-pty-import landed.

**Priority:** P2 (nice-to-have).
**Effort:** M. Likely needs a per-tab session map keyed by chrome.tabs.id
plus a TTL so abandoned PTYs eventually exit.
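
One possible shape for the per-tab session map this TODO describes is sketched below. It only illustrates the idea (key sessions by tab id, reattach on reload, reap after an idle TTL); `PtySessionMap`, `attach`, and `reap` are hypothetical names, and the PTY itself is stood in for by a `kill` callback rather than a real process.

```typescript
// Stand-in for a live PTY: only the pieces the lifecycle logic needs.
interface PtySession {
  tabId: number;
  lastSeenMs: number;
  kill: () => void;
}

class PtySessionMap {
  private readonly sessions = new Map<number, PtySession>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Reattach to the tab's existing session if there is one; otherwise
  // call spawn(), which starts a PTY and returns its kill function.
  attach(tabId: number, spawn: () => () => void): { session: PtySession; reused: boolean } {
    const existing = this.sessions.get(tabId);
    if (existing) {
      existing.lastSeenMs = this.now();
      return { session: existing, reused: true };
    }
    const session: PtySession = { tabId, lastSeenMs: this.now(), kill: spawn() };
    this.sessions.set(tabId, session);
    return { session, reused: false };
  }

  // Kill and drop sessions that have been idle for at least the TTL.
  // Returns the tab ids that were reaped.
  reap(): number[] {
    const reaped: number[] = [];
    for (const [tabId, session] of this.sessions) {
      if (this.now() - session.lastSeenMs >= this.ttlMs) {
        session.kill();
        this.sessions.delete(tabId);
        reaped.push(tabId);
      }
    }
    return reaped;
  }
}

// Usage with a fake clock and a counter in place of a real PTY spawn.
let sessionClockMs = 0;
let demoSpawnCount = 0;
let demoKillCount = 0;
const demoSessions = new PtySessionMap(60_000, () => sessionClockMs);
const demoSpawn = () => {
  demoSpawnCount += 1;
  return () => { demoKillCount += 1; };
};
const firstAttach = demoSessions.attach(7, demoSpawn);
const secondAttach = demoSessions.attach(7, demoSpawn); // simulates a sidebar reload
```

A real implementation would also have to decide what counts as activity (keystrokes, output, WebSocket pings) and who calls `reap`, which is the lifecycle question the TODO raises.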

---

### v1.1+: Audit `/health` token distribution

**What:** Codex's outside-voice review on cc-pty-import flagged that
`/health` already surfaces `AUTH_TOKEN` to any localhost caller in headed
mode (`server.ts:1657`). That's a pre-existing soft leak — anything
running on localhost gets the root token by hitting `/health`.

**Why:** cc-pty-import sidesteps it by NOT putting the PTY token there
(uses an HttpOnly cookie path instead). But the underlying leak is still
shippable surface. A second extension or a localhost web app could
currently scrape `AUTH_TOKEN` and hit any browse-server endpoint.

**Pros:** Closes a real privilege-escalation path on multi-extension
machines. **Cons:** Either we tighten the gate (Origin must be OUR
extension id, not just any chrome-extension://) or we move bootstrap
discovery off `/health` entirely. Either has migration cost for tests
and the existing extension.
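
The tighter gate mentioned above (accept only our own extension id, not any `chrome-extension://` origin) could be sketched as below. `isOwnExtensionOrigin` is a hypothetical helper and the id in the usage lines is made up; the only external fact relied on is that Chrome extension ids are 32 characters drawn from the letters `a` to `p`.

```typescript
// Chrome extension ids are 32 characters from the range a-p.
const EXTENSION_ID_PATTERN = /^[a-p]{32}$/;

// Accept a request only when its Origin header is exactly our own
// extension's origin. Any other chrome-extension:// origin, any web
// origin, and a missing Origin are all rejected.
function isOwnExtensionOrigin(origin: string | null, ownExtensionId: string): boolean {
  if (!origin) return false;
  if (!EXTENSION_ID_PATTERN.test(ownExtensionId)) return false;
  return origin === `chrome-extension://${ownExtensionId}`;
}

// Usage with a made-up extension id.
const demoExtensionId = "abcdefghijklmnopabcdefghijklmnop";
const demoOriginAllowed = isOwnExtensionOrigin(
  `chrome-extension://${demoExtensionId}`,
  demoExtensionId,
);
```

An exact string comparison is deliberate here: prefix or substring checks would reintroduce the "any extension" hole the TODO is trying to close.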

**Context:** codex finding #2 on cc-pty-import plan-eng review. Not in
scope of that PR; deliberately deferred to keep PTY-import small.

**Priority:** P2.
**Effort:** M.

---

## Testing

### Pre-existing test failures surfaced during v1.12.0.0 ship
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-1.13.0.0
+1.14.0.0
1 change: 1 addition & 0 deletions browse/SKILL.md
@@ -959,6 +959,7 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
| `closetab [id]` | Close tab |
| `newtab [url] [--json]` | Open new tab. With --json, returns {"tabId":N,"url":...} for programmatic use (make-pdf). |
| `tab <id>` | Switch to tab |
| `tab-each <command> [args...]` | Run a command on every open tab. Returns JSON with per-tab results. |
| `tabs` | List open tabs |

### Server
79 changes: 32 additions & 47 deletions browse/src/cli.ts
@@ -853,7 +853,7 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
// Delete stale state file
safeUnlinkQuiet(config.stateFile);

-console.log('Launching headed Chromium with extension + sidebar agent...');
+console.log('Launching headed Chromium with extension + terminal agent...');
try {
// Start server in headed mode with extension auto-loaded
// Use a well-known port so the Chrome extension auto-connects
@@ -882,56 +882,41 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
const status = await resp.text();
console.log(`Connected to real Chrome\n${status}`);

-// Auto-start sidebar agent
-// __dirname is inside $bunfs in compiled binaries — resolve from execPath instead
-let agentScript = path.resolve(__dirname, 'sidebar-agent.ts');
-if (!fs.existsSync(agentScript)) {
-agentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'sidebar-agent.ts');
+// sidebar-agent.ts spawn was here. Ripped alongside the chat queue —
+// the Terminal pane runs an interactive PTY now, no more one-shot
+// claude -p subprocesses to multiplex.
+
+// Auto-start terminal agent (non-compiled bun process). Owns the PTY
+// WebSocket for the sidebar Terminal pane.
+let termAgentScript = path.resolve(__dirname, 'terminal-agent.ts');
+if (!fs.existsSync(termAgentScript)) {
+termAgentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'terminal-agent.ts');
 }
 try {
-if (!fs.existsSync(agentScript)) {
-throw new Error(`sidebar-agent.ts not found at ${agentScript}`);
-}
-// Clear old agent queue
-const agentQueue = path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
-try {
-fs.mkdirSync(path.dirname(agentQueue), { recursive: true, mode: 0o700 });
-fs.writeFileSync(agentQueue, '', { mode: 0o600 });
-} catch (err: any) {
-if (err?.code !== 'EACCES') throw err;
-}
-
-// Resolve browse binary path the same way — execPath-relative
-let browseBin = path.resolve(__dirname, '..', 'dist', 'browse');
-if (!fs.existsSync(browseBin)) {
-browseBin = process.execPath; // the compiled binary itself
-}
-
-// Kill any existing sidebar-agent processes before starting a new one.
-// Old agents have stale auth tokens and will silently fail to relay events,
-// causing the server to mark the agent as "hung".
-try {
-const { spawnSync } = require('child_process');
-spawnSync('pkill', ['-f', 'sidebar-agent\\.ts'], { stdio: 'ignore', timeout: 3000 });
-} catch (err: any) {
-if (err?.code !== 'ENOENT') throw err;
+if (fs.existsSync(termAgentScript)) {
+// Kill old terminal-agents so a stale port file can't trick the
+// server into routing /pty-session at a dead listener.
+try {
+const { spawnSync } = require('child_process');
+spawnSync('pkill', ['-f', 'terminal-agent\\.ts'], { stdio: 'ignore', timeout: 3000 });
+} catch (err: any) {
+if (err?.code !== 'ENOENT') throw err;
+}
+const termProc = Bun.spawn(['bun', 'run', termAgentScript], {
+cwd: config.projectDir,
+env: {
+...process.env,
+BROWSE_STATE_FILE: config.stateFile,
+BROWSE_SERVER_PORT: String(newState.port),
+},
+stdio: ['ignore', 'ignore', 'ignore'],
+});
+termProc.unref();
+console.log(`[browse] Terminal agent started (PID: ${termProc.pid})`);
 }
-
-const agentProc = Bun.spawn(['bun', 'run', agentScript], {
-cwd: config.projectDir,
-env: {
-...process.env,
-BROWSE_BIN: browseBin,
-BROWSE_STATE_FILE: config.stateFile,
-BROWSE_SERVER_PORT: String(newState.port),
-},
-stdio: ['ignore', 'ignore', 'ignore'],
-});
-agentProc.unref();
-console.log(`[browse] Sidebar agent started (PID: ${agentProc.pid})`);
 } catch (err: any) {
-console.error(`[browse] Sidebar agent failed to start: ${err.message}`);
-console.error(`[browse] Run manually: bun run ${agentScript}`);
+// Non-fatal: chat still works without the terminal agent.
+console.error(`[browse] Terminal agent failed to start: ${err.message}`);
 }
} catch (err: any) {
console.error(`[browse] Connect failed: ${err.message}`);
3 changes: 2 additions & 1 deletion browse/src/commands.ts
@@ -30,7 +30,7 @@ export const WRITE_COMMANDS = new Set([
]);

export const META_COMMANDS = new Set([
-'tabs', 'tab', 'newtab', 'closetab',
+'tabs', 'tab', 'tab-each', 'newtab', 'closetab',
'status', 'stop', 'restart',
'screenshot', 'pdf', 'responsive',
'chain', 'diff',
@@ -144,6 +144,7 @@ export const COMMAND_DESCRIPTIONS: Record<string, { category: string; descriptio
'tab': { category: 'Tabs', description: 'Switch to tab', usage: 'tab <id>' },
'newtab': { category: 'Tabs', description: 'Open new tab. With --json, returns {"tabId":N,"url":...} for programmatic use (make-pdf).', usage: 'newtab [url] [--json]' },
'closetab':{ category: 'Tabs', description: 'Close tab', usage: 'closetab [id]' },
'tab-each':{ category: 'Tabs', description: 'Run a command on every open tab. Returns JSON with per-tab results.', usage: 'tab-each <command> [args...]' },
// Server
'status': { category: 'Server', description: 'Health check' },
'stop': { category: 'Server', description: 'Shutdown server' },